Rikk03 wrote: ↑15 Nov 2022, 02:36
I find I get more accurate results (on regex101) with the slight modifications below
(^\R?|\R\R)((.(?!\R\R))*\b" . insearch . "\b(?-1)*?\V?)(\.\R\R|\R?$) (using isg modifiers)
but it returns nothing at all when using AHK. So confusing.
UPDATE, on Regex101 I get even better results with the following
(^\R?|\R)((.(?!\R))+\b" . insearch . "\b(?-1)+?\V?)(\.\R\R|\R?$) (also using the isg modifiers)
When using my regex, you need to replace the `) with ). It is an artifact from AHK that is required when doing continuation sections, where a line that starts with a bare ) would end the section.
Code:
(?(DEFINE) # This block defines named subpatterns
(?<eol>\r?+\n|$) # End of line
(?<bline>^\h*+(?&eol)) # Blank line
# A line may be preceded by any number of blank lines
# and it can be followed by any number of blank lines.
(?<line>(?&bline)*+^.*+(?&eol)(?&bline)*+)
)
# Here is the actual regex
(?<before>
(?&line){0,%max_lines%} # lines before the word
.*? # characters before the word on the same line
)
(?<word>\b%word%\b) # the actual word
(?<after>
.*+(?&eol) # characters after the word on the same line
(?&line){0,%max_lines%} # lines after the word
)
and the options you use are gxmi. That would give you this:
https://regex101.com/r/SL7Le9/1
FYI, the `n option is so that the regex recognizes the end of line character. By default, the continuation section EOL marker is a `n, but the regex default is `r`n. Since it would never see a `r`n in the continuation section, it would never know that the EOL has been reached. This isn't a problem except for line comments (which I use liberally). A line comment ends at the EOL, so if the regex never finds one, the comment swallows the rest of the regex. Alternatively, I could have set the continuation section to use `r`n, used (?#...) comments, or had no comments at all.
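The comment problem can be reproduced outside AHK. The sketch below uses Python's re module, which is a different engine from the PCRE that AHK uses, so treat it only as an analogue: Python's verbose mode always ends a # comment at a line feed, so a pattern whose lines are separated by something else (a bare CR here, chosen for illustration) loses everything after the first comment.

```python
import re

# Two-line verbose pattern, each line carrying a '#' comment.
pattern_lf = "foo  # first word\nbar  # second word"

# Same pattern, but with the lines separated by a bare CR, which Python's
# verbose mode does not treat as the end of a comment.
pattern_cr = pattern_lf.replace("\n", "\r")

ok = re.compile(pattern_lf, re.VERBOSE)
broken = re.compile(pattern_cr, re.VERBOSE)

# With LF separators both lines of the pattern are used.
print(bool(ok.fullmatch("foobar")))
# With CR separators the first comment swallows 'bar' as well.
print(bool(broken.fullmatch("foobar")))
# All that is left of the pattern is 'foo'.
print(bool(broken.fullmatch("foo")))
```

In AHK the situation is mirrored: the pattern contains `n but the engine is waiting for `r`n, which is why telling the regex to use `n fixes it.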
I realized that both of our methods were a bit off. I forgot to anchor to the beginning of the line, causing unnecessary repeat searches (an anchor matches before or after something but doesn't actually consume anything). Yours had some other issues, though it kinda worked. One problem I observed was when the word was in the second paragraph. See:
https://regex101.com/r/s4Fodr/1
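As a rough illustration of what "doesn't consume anything" means and why anchoring cuts down on repeat searches, here is a small Python sketch (Python's re rather than AHK's PCRE, and the sample text is made up). It counts the offsets at which a match attempt can get past its first token with and without a line-start anchor.

```python
import re

text = "one fox\ntwo fox"

# '^' in multiline mode matches at each line start and consumes nothing,
# so every match is an empty span.
anchor_spans = [m.span() for m in re.finditer(r"^", text, re.MULTILINE)]
print(anchor_spans)

# An unanchored pattern may begin an attempt at every offset in the text.
unanchored_starts = len(re.findall(r"", text))
# An anchored pattern can only get going at the start of a line.
anchored_starts = len(anchor_spans)
print(unanchored_starts, anchored_starts)
```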
I was actually curious about (?-1)*?. This actually doesn't do anything that I can discern. Basically, it says to match the same text as was matched in (.(?!\R\R)). Was that what you were intending? Because that match is unlikely, and you have it so that it lazily matches 0 or more of them, it chooses 0. In other words, it doesn't do anything.
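The "chooses 0" part comes from the lazy *? quantifier. A minimal Python sketch of that behaviour follows; Python's re has no (?-1), so a plain group stands in for it here, which is an assumption made purely for illustration. A lazy quantifier takes zero repetitions when nothing after it forces more, and only grows when the rest of the pattern cannot match otherwise.

```python
import re

# Nothing follows the lazy group, so it settles for zero repetitions.
lazy_alone = re.match(r"(a*?)", "aaab").group(1)

# The trailing 'b' can only match after three 'a's, so the lazy group
# is forced to expand that far.
lazy_forced = re.match(r"(a*?)b", "aaab").group(1)

# The greedy form takes everything it can straight away.
greedy = re.match(r"(a*)", "aaab").group(1)

print(repr(lazy_alone), repr(lazy_forced), repr(greedy))
```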
Another FYI, when you have a test that you want to share, click on the Save and Share link at the top left corner of the regex101 page and then click on the copy button and paste it here. That way we can see what you are referring to more easily.
So, are you saying that you want to stay within the paragraph? If so, then try this regex:
https://regex101.com/r/QSyBn7/1
Code:
(?(DEFINE) # This block defines named subpatterns
# Anchors to the beginning of the string or is preceded
# by a CR or LF character.
(?<BOL>(?<=^|[\r\n]))
# Either has an \R or is at the end of the string.
(?<EOL>\R|$)
# A blank line must anchor to a BOL and can contain 0 or
# more horizontal whitespace characters and ends with an EOL.
(?<bline>(?&BOL)\h*+(?&EOL))
# A line must anchor to a BOL, is not a blank line, contains
# 1 or more non CR or LF characters and ends with an EOL.
#
# If your paragraph markers will never contain 1 or more
# horizontal whitespace, then you don't actually need the
# (?!(?&bline)). It would then read:
#
# A line must anchor to a BOL and must contain one or more
# non CR or LF characters and ends with an EOL.
(?<line>(?&BOL)(?!(?&bline))[^\r\n]++(?&EOL))
)
# Here is the actual regex
(?<before>
(?&line){0,4} # lines before the word
(?&BOL)[^\r\n]*? # characters before the word on the same line
)
(?<word>\bthrough\b) # the actual word
(?<after>
[^\r\n]*+(?&EOL) # characters after the word on the same line
(?&line){0,4} # lines after the word
)
For complex regexes, I would recommend trying to think of what you are trying to do semantically and use subexpressions to mirror those semantics as it makes the regex easier to read and reason about.
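To show the same idea outside of PCRE, here is a rough Python approximation of the paragraph-bound regex, built from named pieces. It is a sketch under several assumptions, not a drop-in replacement: Python's re has no (?(DEFINE)...), no (?&name) calls and no \h, so the pieces are ordinary strings, [ \t] stands in for \h, only LF and CRLF line endings are handled, and the names context_pattern, eol, not_blank and line are made up for this example.

```python
import re

def context_pattern(word, max_lines=4):
    """Match `word` plus up to `max_lines` non-blank lines on either side."""
    # End of line: LF, CRLF, or the end of the string.
    eol = r"(?:\r?\n|\Z)"
    # Guard that fails on a blank (whitespace-only) line.
    not_blank = rf"(?![ \t]*{eol})"
    # A non-blank line, anchored to the start of a line.
    line = rf"(?:^{not_blank}[^\r\n]+{eol})"
    return re.compile(
        rf"(?P<before>{line}{{0,{max_lines}}}^[^\r\n]*?)"
        rf"(?P<word>\b{re.escape(word)}\b)"
        rf"(?P<after>[^\r\n]*{eol}{line}{{0,{max_lines}}})",
        re.MULTILINE | re.IGNORECASE,
    )

sample = "alpha\nbeta\ngo through here\ndelta\n\nnext para\n"
m = context_pattern("through").search(sample)
print(repr(m.group("before")))
print(repr(m.group("word")))
print(repr(m.group("after")))
```

Because a blank line can never be part of a line here, the match stops at the paragraph boundary and "next para" is left out.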