How to prevent look-ahead and look-behind assertions from affecting the entire string (topic is solved)

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
rj8810
Posts: 31
Joined: 16 Jul 2018, 22:34

how to prevent Look-ahead and look-behind assertions have effect on the entire string

21 Jun 2022, 10:57

Topic: how to prevent look-ahead and look-behind assertions from affecting the entire string;
or, in other words,
how to get ".*?" to work in look-behind assertions, so that RegExMatch looks for a certain word as close as possible to another certain word when scanning from right to left;
or, in other words,
is it possible to find a word before a specific word or letter?

Hello, thanks in advance for any help you can give me, and sorry for my English. I hope you understand me.

I thought this issue would be easy to solve, but I have been reading about RegExMatch for 3 days and have not been able to find the solution.
The situation is this: I have a string, for example:

Code: Select all

xxxxyellowyyyyblueñññññred
With AHK I want to find the words yellow, blue and red, which are always separated by an indeterminate number of unknown characters. I eventually found a partial solution:

Code: Select all

MsgBox % RegExMatch("xxxxyellowyyyyblueñññññred", "(yellow.*blue.*red)", SsubPat)
MsgBox, %SsubPat% ; show yellowyyyyblueñññññred
I was happy until I came across the following string, where yellow and red are repeated:

Code: Select all

MsgBox % RegExMatch("xxxxyellowxxyellowyyyyblueñññññredxxxredxxx", "(yellow.*blue.*red)", SsubPat)
MsgBox, %SsubPat% ; show yellowxxyellowyyyyblueñññññredxxxred
That result is not what I want, which is a single sequence of the 3 words separated by other characters (of the form xxxyellowxxxbluexxxxredxxx), without yellow or red being repeated.
I was finally able to get RegExMatch to discard the last red by using ? after .*:

Code: Select all

MsgBox % RegExMatch("xxxxyellowxxyellowyyyyblueñññññredxxxredxxx", "(yellow.*?blue.*?red)", SsubPat)
MsgBox, %SsubPat% ; show yellowxxyellowyyyyblueñññññred.
But I couldn't get rid of the first yellow. I tried look-ahead and look-behind assertions, which was a success: (https://www.autohotkey.com/docs/misc/RegEx-QuickRef.htm#word)

Code: Select all

MsgBox % RegExMatch("xxxxyellowxxyellowyyyyblueñññredxxxred", "(yellow(?!.*yellow).*?blue.*?red)", SsubPat)

MsgBox, %SsubPat% ; show yellowyyyyblueñññred
Sadly, if the word yellow appears again anywhere in the text after the sequence yellow xxx blue xxx red, even a thousand lines further down, RegExMatch returns 0 (no match):

Code: Select all

MsgBox % RegExMatch("xxxxyellowxxyellowyyyyblueñññredxxxred...long text...yellow", "(yellow(?!.*yellow).*?blue.*?red)", SsubPat)
MsgBox, %SsubPat% ; show 0
So the specific question is:
is there a way to make look-ahead and look-behind assertions have no effect beyond the word blue? Anyone would swear that yellow(?!.*yellow).*?blue means: search for the word yellow, followed by any characters, followed by blue, with no other yellow between yellow and blue. But it does not work that way.

In other words:
why does .*? not work inside look-ahead and look-behind assertions, so that the assertion stops at the nearest yellow?

Code: Select all

MsgBox % RegExMatch("xxxxyellowxxyellowyyyyblueñññredxxxred", "(?<=.*?yellow).*?blue.*?(?=.*red)", %SsubPat%)
Or, in other words:
how do I find the match (yellow) closest to the word blue from right to left? Why doesn't .*? work from right to left?

Please help me.

I know I could use other code such as StringSplit, but I was wondering whether there is a really simple and clean way to do it with RegExMatch in just one line of code.
Maybe I just need to put a dot somewhere.

[Mod edit: [code][/code] tags added to break up the wall of text a little.]
sofista
Posts: 654
Joined: 24 Feb 2020, 13:59
Location: Buenos Aires

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

21 Jun 2022, 11:26

Guess this is what you want, try

Code: Select all

RegExMatch("xxxxyellowxxyellowyyyyblueñññññredxxxredxxx", "(yellow).*\K(yellow.*?blue.*?red)", m)
MsgBox, % m    ; output -> yellowyyyyblueñññññred
Descolada
Posts: 1172
Joined: 23 Dec 2021, 02:30

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

21 Jun 2022, 11:32

Code: Select all

MsgBox % "Match at: " RegExMatch("xxxxyellowxxyellowyyyyblueñññredxxxred", "yellow(?:.(?!yellow))*?blue.*?red", SsubPat) "`nMatch: " SsubPat
why doesn't .*? work from right to left.
I'm not a RegEx expert, but as far as I know it's because it would require implementing the RegEx engine also backwards, which is a lot of work for little reward. Usually the task is achievable with lookaheads, or you can reverse your string and then do a lookahead (which would, in essence, result in a lookbehind).
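To illustrate why the look-ahead in the pattern above can stop at blue while (?!.*yellow) cannot, here is a minimal sketch. The haystacks are made-up ASCII variations of the ones in this thread, and the second needle is a common rewrite of the same idea that puts the look-ahead before the dot; it is shown as an alternative, not as a correction from any poster. The difference only seems to matter when a second yellow follows the first one immediately.

```autohotkey
; Hypothetical ASCII-only haystack modelled on the ones in this thread.
haystack := "xxxxyellowxxyellowyyyyblue---redxxxred...long text...yellow"

; (?!.*yellow) scans to the end of the string, so the trailing "yellow" kills the match.
; (?:(?!yellow).)*? advances one character at a time and refuses to step onto the
; start of another "yellow", so the check never looks past "blue".
needle := "yellow(?:(?!yellow).)*?blue.*?red"
pos := RegExMatch(haystack, needle, match)
MsgBox % "Pos: " pos "`nMatch: " match   ; Pos: 13, Match: yellowyyyyblue---red

; Edge case: a second "yellow" directly after the first one.
tight := "yellowyellowblue-red"
RegExMatch(tight, "yellow(?:.(?!yellow))*?blue.*?red", dotFirst)    ; check after the dot
RegExMatch(tight, "yellow(?:(?!yellow).)*?blue.*?red", checkFirst)  ; check before the dot
MsgBox % dotFirst "`n" checkFirst   ; yellowyellowblue-red, then yellowblue-red
```

With the check placed after the dot, the first character of the second yellow can still be consumed, because the text following that y is "ellow" and not "yellow". Placing the check before the dot avoids this.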
AlphaBravo
Posts: 586
Joined: 29 Sep 2013, 22:59

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

21 Jun 2022, 12:30

Code: Select all

H := "_yellow_blue_yellow_blue_blue_red_Yellow_Blue_Red_red_yellow_blue_red"
RegExMatch(H, "i)yellow(([^ybr]|y(?!ellow)|b(?!lue)|r(?!ed))?)blue(?1)red", m)
MsgBox % m					; Yellow_Blue_Red


Note: it will not catch the proper sequence here, because the "red" in "redraw" is mistaken for the word "red".
Depending on your actual text, you may want to use the "\b" word-boundary anchor.

Code: Select all

H := "_yellow_blue_yellow_blue_blue_red_Yellow_Blue_redraw_Red_yellow_blue_red"
RegExMatch(H, "i)yellow(([^ybr]|y(?!ellow)|b(?!lue)|r(?!ed))?)blue(?1)red", m)
MsgBox % m					; Yellow_Blue_red
teadrinker
Posts: 4368
Joined: 29 Mar 2015, 09:41
Contact:

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

21 Jun 2022, 12:48

Another option:

Code: Select all

RegExMatch("xxxxyellowxxyellowyyyyblueñññññredxxxredxxx", ".*\Kyellow.*?blue.*?red", SsubPat)
MsgBox, %SsubPat%
rj8810
Posts: 31
Joined: 16 Jul 2018, 22:34

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

22 Jul 2022, 17:19

sofista wrote:
21 Jun 2022, 11:26
Guess this is what you want, try

Code: Select all

RegExMatch("xxxxyellowxxyellowyyyyblueñññññredxxxredxxx", "(yellow).*\K(yellow.*?blue.*?red)", m)
MsgBox, % m    ; output -> yellowyyyyblueñññññred
Thank you very much, beautiful AutoHotkey community.
Thanks for your answer, but this code would not work in the following case:

Code: Select all

RegExMatch("xxxyellowyyyyblueñññññredxxxredxxx", "(yellow).*\K(yellow.*?blue.*?red)", m)
MsgBox, % m    ; output -> nothing
And in the following case it would not match yellow111blue111red:

Code: Select all

RegExMatch("111yellow111blue111red111xxxyellowyyyyblueñññññredxxxredxxx", "(yellow).*\K(yellow.*?blue.*?red)", m)
MsgBox, % m    ; output -> nothing
I generally use the following code to see all the matches found by RegExMatch:

Code: Select all

H := "111yellow111blue111red111xxxyellowyyyyblueñññññredxxxredxxx"
K = i)(?:yellow).*\K(yellow.*?blue.*?red)
RegExMatch(H, K, SstubPat)
MsgBox, %SstubPat%
p := 1
array := []
while p:= RegExMatch(H, K, StubPat, p+StrLen(StubPat))
{
Array[A_Index] := StubPat1 ; this only retrieves the subpattern
msgbox % "Element number " . A_Index . " is " . Array[A_Index]
Count := Array.Count()
}
msgbox, % "total match =" Count
return
Any other ideas, please?
AlphaBravo
Posts: 586
Joined: 29 Sep 2013, 22:59

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

22 Jul 2022, 17:44

made a small adjustment

Code: Select all

H := "
(
xxxyellowyyyyblueñññññredxxxredxxx
111yellow111blue111red111xxxyellowyyyyblueñññññredxxxredxxx
_yellow_blue_yellow_blue_blue_red_Yellow_Blue_Red_red_yellow_blue_red
)"

while pos := RegExMatch(H, "i)yellow(([^ybr]+|y+(?!ellow)|b+(?!lue)|r+(?!ed))?)blue(?1)red", m, A_Index=1?1:pos+StrLen(m))
	res .= m "`n"
MsgBox % res
rj8810
Posts: 31
Joined: 16 Jul 2018, 22:34

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

22 Jul 2022, 17:49

AlphaBravo wrote:
21 Jun 2022, 12:30

Code: Select all

H := "_yellow_blue_yellow_blue_blue_red_Yellow_Blue_Red_red_yellow_blue_red"
RegExMatch(H, "i)yellow(([^ybr]|y(?!ellow)|b(?!lue)|r(?!ed))?)blue(?1)red", m)
MsgBox % m					; Yellow_Blue_Red


Note: it will not catch the proper sequence here, because the "red" in "redraw" is mistaken for the word "red".
Depending on your actual text, you may want to use the "\b" word-boundary anchor.

Code: Select all

H := "_yellow_blue_yellow_blue_blue_red_Yellow_Blue_redraw_Red_yellow_blue_red"
RegExMatch(H, "i)yellow(([^ybr]|y(?!ellow)|b(?!lue)|r(?!ed))?)blue(?1)red", m)
MsgBox % m					; Yellow_Blue_red
Thank you very much for your answer. This is the one that comes closest to what I want, but regrettably it does not work if there is more than one character between yellow, blue and red, as in the following example:

Code: Select all

H = 111yellow111yellow222blue222red222
RegExMatch(H, "i)yellow(([^ybr]|y(?!ellow)|b(?!lue)|r(?!ed))?)blue(?1)red", m)
MsgBox % m		; nothing
return
I tried to modify your code by adding .*? before blue and red, but that brings back the initial problem: yellow appears twice in the match:

Code: Select all

H = 111yellow111yellow222blue222red222
RegExMatch(H, "i)yellow(([^ybr]|y(?!ellow)|b(?!lue)|r(?!ed))?).*?blue.*?red", m)
MsgBox % m		; yellow111yellow222blue222red
return
Do you know how I can solve this problem, given that there is always an indefinite number of characters between yellow, blue and red?

This is actually important for scraping the source code of a page, where you may need to find only one line containing 3 keywords, for example href="keyword1xxxkeyword2xxxxxxkeyword3xxxx/div".

I know I can do it with other methods, or by searching and re-searching the same line, but I really want to do it with RegExMatch because it lets me search for several words at once, even when they are separated from each other by other characters.
rj8810
Posts: 31
Joined: 16 Jul 2018, 22:34

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

22 Jul 2022, 18:28

In case someone didn't understand me,
I think the clearest and most concise way to express what I want to achieve is the following:
find the word (yellow) nearest to another word (blue) from right to left.
Example:
yellow3xxxyellow2xxxyellow1xxxbluexxxred1xxxred2xxxred3xxxyellow....

Finding the word (red) nearest to another word (blue) from left to right is easy using ".*?". I'm surprised it's so hard from right to left:

Code: Select all

RegExMatch("yellow3xxxyellow2xxxyellow1xxxbluexxxred1xxxred2xxxred3xxxyellow...", "blue.*?red", m)
MsgBox, % m ; shows bluexxxred, but I want it to show yellow1xxxbluexxxred1
AlphaBravo
Posts: 586
Joined: 29 Sep 2013, 22:59

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

22 Jul 2022, 19:53

Did you try the modified needle from my last post?

Code: Select all

H = 111yellow111yellow222blue222red222
RegExMatch(H, "i)yellow(([^ybr]+|y+(?!ellow)|b+(?!lue)|r+(?!ed))?)blue(?1)red", m)
MsgBox % m
rj8810
Posts: 31
Joined: 16 Jul 2018, 22:34

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

24 Jul 2022, 16:10

AlphaBravo wrote:
22 Jul 2022, 19:53
did you try my modified needle from my last post!!

Code: Select all

H = 111yellow111yellow222blue222red222
RegExMatch(H, "i)yellow(([^ybr]+|y+(?!ellow)|b+(?!lue)|r+(?!ed))?)blue(?1)red", m)
MsgBox % m
Hello, thank you.
The code fulfills the purpose, which is impressive; I have never been so close to achieving it. The only problem now is that it doesn't let me capture subpatterns (https://www.autohotkey.com/docs/misc/RegEx-QuickRef.htm#subpat).
I would like to capture the subpattern that starts right after yellow and ends just before red:
Patterns: yellow111blue111red and yellow444blue444red
Subpatterns that I want to capture: 111blue111 and 444blue444
I tried the following, but it does not work:

Code: Select all

H = yellow333yellow222yellow111blue111red111red222red333yellowkkkyellow777yellow555yellow444blue444red444red555red777yellow
K = yellow((([^ybr]+|y+(?!ellow)|b+(?!lue)|r+(?!ed))?)blue(?1))red ; added parentheses around the part I want to capture (after yellow and before red), as is normally done, but it does not work
RegExMatch(H, K, t)
MsgBox % t

p := 1
array := []
while p:= RegExMatch(H, k, m, p+StrLen(m))
{
Array[A_Index] := m2 ; this only retrieves the subpattern
msgbox % "Element number " . A_Index . " is " . Array[A_Index]
Count := Array.Count()
}
msgbox, % "total match =" Count
This is the most I have achieved, which captures only part of the subpattern:

Code: Select all

H = yellow333yellow222yellow111blue111red111red222red333yellowkkkyellow777yellow555yellow444blue444red444red555red777yellow
K = yellow(([^ybr]+|y+(?!ellow)|b+(?!lue)|r+(?!ed))?)(blue(?1))red
RegExMatch(H, K, t)
MsgBox % t

p := 1
array := []
while p:= RegExMatch(H, k, m, p+StrLen(m))
{
Array[A_Index] := m3 ; this only retrieves the subpattern
msgbox % "Element number " . A_Index . " is " . Array[A_Index] ; shows only blue111 and blue444
Count := Array.Count()
}
msgbox, % "total match =" Count

return
rj8810
Posts: 31
Joined: 16 Jul 2018, 22:34

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

24 Jul 2022, 16:14

Does anyone have any idea how to capture subpatterns with AlphaBravo's code?

Code: Select all

H = 111yellow111yellow222blue222red222
RegExMatch(H, "i)yellow(([^ybr]+|y+(?!ellow)|b+(?!lue)|r+(?!ed))?)blue(?1)red", m)
MsgBox % m
For example, the subpattern between yellow and red.
AlphaBravo
Posts: 586
Joined: 29 Sep 2013, 22:59

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

24 Jul 2022, 17:29

rj8810 wrote:
24 Jul 2022, 16:14
Does anyone have any idea how to capture subpatterns with AlphaBravo's code?

Code: Select all

H = 111yellow111yellow222blue222red222
RegExMatch(H, "i)yellow(([^ybr]+|y+(?!ellow)|b+(?!lue)|r+(?!ed))?)blue(?1)red", m)
MsgBox % m
For example, the subpattern between yellow and red.
Keep it simple and run another regex on the result, like so:

Code: Select all

H = 111yellow111yellow222blue222red222
RegExMatch(H, "i)yellow(([^ybr]+|y+(?!ellow)|b+(?!lue)|r+(?!ed))?)blue(?1)red", m)
RegExMatch(m, "i)yellow(.*?)red", n)
MsgBox % n1
Or, if you really have to use a one-liner:

Code: Select all

H = 111yellow111yellow222blue222red222
RegExMatch(H, "i)yellow((([^ybr]+|y+(?!ellow)|b+(?!lue)|r+(?!ed))?)blue(?2))red", m)
MsgBox % m1
rj8810
Posts: 31
Joined: 16 Jul 2018, 22:34

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

24 Jul 2022, 21:46

AlphaBravo wrote:
24 Jul 2022, 17:29

or if you really have to use one liner:

Code: Select all

H = 111yellow111yellow222blue222red222
RegExMatch(H, "i)yellow((([^ybr]+|y+(?!ellow)|b+(?!lue)|r+(?!ed))?)blue(?2))red", m)
MsgBox % m1
Thank you very much, AlphaBravo, now your code works perfectly.

Code: Select all

H = yellow333yellow222yellow111blue111red111red222red333yellowkkkyellow777yellow555yellow444blue444red444red555red777yellow
K = i)yellow((([^ybr]+|y+(?!ellow)|b+(?!lue)|r+(?!ed))?)blue(?2))red

p := 1
array := []
array2 := []
while p:= RegExMatch(H, k, m, p+StrLen(m))
{
Array[A_Index] := m ; this retrieves the full pattern
Array2[A_Index] := m1 ; this only retrieves the subpattern
msgbox % "pattern number " . A_Index . " is " . Array[A_Index]
msgbox % "subpattern number " . A_Index . " is " . Array2[A_Index]
Count := Array.Count()
}
msgbox, % "total match =" Count

return

Surprisingly, after 3 days of testing and investigating, a (?2) solved everything, although I don't remember (?2) being documented in AutoHotkey. I can't fully understand your code, but it works.
Since I could never achieve what I wanted with look-ahead and look-behind assertions, I think it would be a good idea to rename this topic to something like: how to find a specific word that is closest to another word from right to left, as an alternative to look-ahead and look-behind assertions;
since I was able to get the following code to work:

Code: Select all

H = yellow333yellow222yellow111blue111red111red222red333
K = yellow(?!.*yellow).*?blue.*?red
RegExMatch(H, K, m)
MsgBox % m
But never this:

Code: Select all

H = yellow333yellow222yellow111blue111red111red222red333forthisyellownoworkingthecode
K = yellow(?!.*yellow).*?blue.*?red
RegExMatch(H, K, m)
MsgBox % m
AlphaBravo, you have already achieved the purpose, but just out of curiosity: is it possible to make the negative look-ahead assertion (?!.*yellow) take effect only up to the word red? Something like this:

Code: Select all

H = yellow333yellow222yellow111blue111red111red222red333forthisyellownoworkingthecode
K = yellow(?!.*yellow untilred).*?blue.*?red
RegExMatch(H, K, m)
MsgBox % m
Did I miss something, or is that definitely impossible to achieve with a negative look-ahead assertion?
I repeat, this is just out of curiosity, since I am very happy with your code.
Your code is great for literal words, but right now I don't know whether I will be able to use it with variables, numbers and special characters. The words yellow, blue and red were for general purposes; in my real case the keywords are:
var:="anything"
keyword 1: href="
keyword 2: %var%-
keyword 3: id
keyword 4: ">div
I will look for a way to implement your code with these 4 keywords. If I don't succeed, I'm afraid I'll have to ask you for help again.
Last edited by rj8810 on 24 Jul 2022, 22:54, edited 3 times in total.
rj8810
Posts: 31
Joined: 16 Jul 2018, 22:34

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

24 Jul 2022, 21:55

Just out of curiosity, does anyone know how to put something like an anchor on a negative look-ahead assertion?

Code: Select all

H = yellow222yellow111blue111red111red222red333forthisyellownoworkingthecode ; negative look-ahead only until the word red should have effect
K = yellow(?!.*yellow anchorred).*?blue.*?red ; 
RegExMatch(H, K, m)
MsgBox % m
rj8810
Posts: 31
Joined: 16 Jul 2018, 22:34

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

25 Jul 2022, 00:14

AlphaBravo wrote:
24 Jul 2022, 17:29
rj8810 wrote:
24 Jul 2022, 16:14
or if you really have to use one liner:

Code: Select all

H = 111yellow111yellow222blue222red222
RegExMatch(H, "i)yellow((([^ybr]+|y+(?!ellow)|b+(?!lue)|r+(?!ed))?)blue(?2))red", m)
MsgBox % m1
AlphaBravo, unfortunately I can't implement your code properly. This was as far as I got:

Code: Select all

w::
keyword:="anytng"
H = href=222href=111anytng111keywordanytngoroney1id-123>id222id457href=xxxhref=555anytng444keywordanytngoroney1id-321>xxx
K = i)href=((([^hi]+)?)%keyword%(?2))id-(\d)*>

p := 1
array := []
array2 := []
while p:= RegExMatch(H, k, m, p+StrLen(m))
{
Array[A_Index] := m ; this retrieves the full pattern
Array2[A_Index] := m1 ; this only retrieves the subpattern
msgbox % "pattern number " . A_Index . " is " . Array[A_Index]
msgbox % "subpattern number " . A_Index . " is " . Array2[A_Index]
Count := Array.Count()
}
msgbox, % "total match =" Count

return
The biggest problem is that, as you can see, between href= and id the string can't contain any word with the character h or i, such as anything, hi or honey:

Code: Select all

keyword:="anything"
H = href=222href=111anytng111keywordanythingohiorhoney1id-123>id222href=xxxhref=555anytng444keywordanythingorhoney1id-321>xxx
K = i)href=((([^hi]+)?)%keyword%(?2))id-(\d)*>

p := 1
array := []
array2 := []
while p:= RegExMatch(H, k, m, p+StrLen(m))
{
Array[A_Index] := m ; this retrieves the full pattern
Array2[A_Index] := m1 ; this only retrieves the subpattern
msgbox % "pattern number " . A_Index . " is " . Array[A_Index]
msgbox % "subpattern number " . A_Index . " is " . Array2[A_Index]
Count := Array.Count()
}
msgbox, % "total match =" Count

return
The only solution I see is to find a way to write whole words inside [^...], rather than a list of individual characters, such as:

Code: Select all

keyword:="anything"
H = href=222href=111anytng111keywordanythingohiorhoney1id-123>id222href=xxxhref=555anytng444keywordanythingorhoney1id-321>xxx
K = i)href=((([^(href=)]+)?)%keyword%(?2))id-(\d)*>

p := 1
array := []
array2 := []
while p:= RegExMatch(H, k, m, p+StrLen(m))
{
Array[A_Index] := m ; this retrieves the full pattern
Array2[A_Index] := m1 ; this only retrieves the subpattern
msgbox % "pattern number " . A_Index . " is " . Array[A_Index]
msgbox % "subpattern number " . A_Index . " is " . Array2[A_Index]
Count := Array.Count()
}
msgbox, % "total match =" Count

return
Please help, I feel lost again.
Descolada
Posts: 1172
Joined: 23 Dec 2021, 02:30

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

25 Jul 2022, 00:58

@rj8810, could you post a concise description of what you are trying to match and what the rules of matching need to be? Preferably post the real haystack you are using, not the one with reds and yellows :)
boiler
Posts: 17206
Joined: 21 Dec 2014, 02:44

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

25 Jul 2022, 04:32

rj8810 wrote:
24 Jul 2022, 21:46
I don't remember that (?2) is documented in autohotkey
The AHK documentation on RegEx is not meant to cover all of its features, as is stated at the bottom of the RegEx Quick Reference page:
RegEx Quick Reference wrote:Final note: Although this page touches upon most of the commonly-used RegEx features, there are quite a few other features you may want to explore such as conditional subpatterns. The complete PCRE manual is at www.pcre.org/pcre.txt

If there is an element of a RegEx pattern that you don’t understand, a good way of discovering its meaning is to paste the pattern in the “REGULAR EXPRESSION” field at regex101.com and look at the explanation to the right or hover over the token in the pattern itself. There it will explain that “(?2) matches the expression defined in the 2nd capture.”
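As a small illustration of that explanation, the sketch below contrasts a subroutine call such as (?1) with a backreference such as \1. The strings and variable names are made up for the example. The call re-runs the pattern of the group, while the backreference must repeat the exact text the group captured.

```autohotkey
; (?1) re-runs the *pattern* of group 1, so it may match different text.
RegExMatch("ab-cd", "([a-z]+)-(?1)", viaCall)     ; viaCall = "ab-cd"

; \1 must repeat the exact *text* that group 1 captured, so "cd" is rejected.
found := RegExMatch("ab-cd", "([a-z]+)-\1")       ; found = 0

; The backreference only succeeds when the text really repeats.
RegExMatch("ab-ab", "([a-z]+)-\1", viaBackref)    ; viaBackref = "ab-ab"

MsgBox % viaCall "`n" found "`n" viaBackref
```

This is why (?1) or (?2) can be used to say "the same kind of gap again" without requiring both gaps to contain identical characters.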
AlphaBravo
Posts: 586
Joined: 29 Sep 2013, 22:59

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

25 Jul 2022, 10:07

Descolada wrote:
25 Jul 2022, 00:58
@rj8810, could you post a concise description of what you are trying to match and what the rules of matching need to be? Preferably post the real haystack you are using, not the one with reds and yellows :)
+1
rj8810
Posts: 31
Joined: 16 Jul 2018, 22:34

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

25 Jul 2022, 14:20

Descolada wrote:
25 Jul 2022, 00:58
@rj8810, could you post a concise description of what you are trying to match and what the rules of matching need to be? Preferably post the real haystack you are using, not the one with reds and yellows :)
Thanks.
1.- I want to find a pattern composed of 3 keywords:
keyword = href="
keyword2 = %variable%
keyword3 = /div(id:123)"> ; note: 123 can be any number

2.- The keywords are separated by an indeterminate number of unknown characters. The pattern I want to match is something like this:
href="xxxanythingvariablekeyword2valuexxx/div(id:123)">

3.- The document has multiple exact matching patterns and other similar ones which should be discarded, such as:
href="222xxxhref="111xxxanythingvariablekeyword2valuexxx/div(id:123)">yyyy/div(id:222)"> ; note that href=" appears twice here, as does /div(id:...)">. I only need to start from the href=" closest to the value of keyword2 going from right to left, and end at the /div(id:123)"> closest to the value of keyword2 going from left to right.

I know it could be done in many ways with several lines of code, but I would really like a single line of code with RegExMatch and, if possible, look-ahead and look-behind assertions.
This is what I can do:

Code: Select all

q::
document =  href="222xxxhref="111xxxanything-xxx/div(id:123)">yyyy/div(id:222)">
keyword2 = anything
search = i)href="(.*?%keyword2%-.*?\/div\(id:\d.*?)"> ; the i) makes the match case-insensitive

p := 1
array := []
array2 := []
while p:= RegExMatch(document, search, pattern, p+StrLen(pattern)) 	
{
Array[A_Index] := pattern ; this retrieves the full pattern
Array2[A_Index] := pattern1 ; this only retrieves the subpattern
msgbox % "pattern number " . A_Index . " is " . Array[A_Index]
msgbox % "subpattern number " . A_Index . " is " . Array2[A_Index]
Count := Array.Count()
}
MsgBox, % Count
return
The problem I have never been able to solve is that the match does not start from the href=" closest to the value of %keyword2% from right to left. As a result, href=" appears twice in the match.

So I investigated and found a solution with a look-ahead assertion, (?!.*href="), which adds a condition to the match: there must be no other href=" to the right of href=".

Code: Select all

document =  href="222xxxhref="111xxxanything-xxx/div(id:123)">yyyy/div(id:222)">
keyword2 = anything
search = i)href="(?!.*href=")(.*?%keyword2%-.*?\/div\(id:\d.*?)"> ; the i) makes the match case-insensitive

p := 1
array := []
array2 := []
while p:= RegExMatch(document, search, pattern, p+StrLen(pattern)) 
{
Array[A_Index] := pattern ; this retrieves the full pattern
Array2[A_Index] := pattern1 ; this only retrieves the subpattern
msgbox % "pattern number " . A_Index . " is " . Array[A_Index]
msgbox % "subpattern number " . A_Index . " is " . Array2[A_Index]
Count := Array.Count()
}
MsgBox, % Count
return
The problem with this code is that it doesn't work when there are multiple matching patterns in the document.
Apparently (?!.*href=") does not stop at the end of the pattern to be matched (/div(id:123)">). It seems to look for an href=" from left to right all the way to the end of the document, and if it finds one, the match does not occur.

Code: Select all

ñ::
document =  href="222xxxhref="111xxxanything-xxx/div(id:123)">yyyy/div(id:222)">thepatternrepeatshref="222xxxhref="111xxxanything-xxx/div(id:333)">yyyy/div(id:444)">href="...longtext
keyword2 = anything
search = i)href="(?!.*href=")(.*?%keyword2%-.*?\/div\(id:\d.*?)">  ; the i) makes the match case-insensitive

p := 1
array := []
array2 := []
while p:= RegExMatch(document, search, pattern, p+StrLen(pattern)) 	; what is being searched for here is: href="/item/domiciliario-iid-1115341856">   but to escape a literal quote in an expression you must prefix it with another ", or store the literal string in a variable, since variables treat everything as literal except the RegEx special characters, which is a good solution
{
Array[A_Index] := pattern ; this retrieves the full pattern
Array2[A_Index] := pattern1 ; this only retrieves the subpattern
msgbox % "pattern number " . A_Index . " is " . Array[A_Index]
msgbox % "subpattern number " . A_Index . " is " . Array2[A_Index]
Count := Array.Count()
}
MsgBox, % Count
return
The strange thing is that .*? and a positive look-ahead such as abc(?=.*xyz) work perfectly for finding the closest word to another word from left to right, but the look-behind assertions (?<=...) and (?<!...) do not do the job.

Code: Select all

q::
MsgBox % RegExMatch("xxxxyellow222xxxyellow111yyyyblueñññred111xxxred222", "(yellow(?<!.*yellow).*?blue.*?red)", SsubPat)

MsgBox, %SsubPat% ; I expected yellowyyyyblueñññred111 here, but the look-behind contains .* (varying size), so there is no match
return
The AutoHotkey documentation says this about it:
Greed: By default, *, ?, +, and {min,max} are greedy because they consume all characters up through the last possible one that still satisfies the entire pattern. To instead have them stop at the first possible character, follow them with a question mark. For example, the pattern <.+> (which lacks a question mark) means: "search for a <, followed by one or more of any character, followed by a >". To stop this pattern from matching the entire string <em>text</em>, append a question mark to the plus sign: <.+?>. This causes the match to stop at the first '>' and thus it matches only the first tag <em>.

Look-ahead and look-behind assertions: The groups (?=...), (?!...), (?<=...), and (?<!...) are called assertions because they demand a condition to be met but don't consume any characters. For example, abc(?=.*xyz) is a look-ahead assertion that requires the string xyz to exist somewhere to the right of the string abc (if it doesn't, the entire pattern is not considered a match). (?=...) is called a positive look-ahead because it requires that the specified pattern exist. Conversely, (?!...) is a negative look-ahead because it requires that the specified pattern not exist. Similarly, (?<=...) and (?<!...) are positive and negative look-behinds (respectively) because they look to the left of the current position rather than the right. Look-behinds are more limited than look-aheads because they do not support quantifiers of varying size such as *, ?, and +. The escape sequence \K is similar to a look-behind assertion because it causes any previously-matched characters to be omitted from the final matched string. For example, foo\Kbar matches "foobar" but reports that it has matched "bar".
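The limitation quoted above can be seen in a short sketch. The strings and variable names are made up for the example. In AutoHotkey v1, RegExMatch reports a pattern that fails to compile through ErrorLevel, which is why the attempts with .* inside a look-behind never match anything.

```autohotkey
; A fixed-length look-behind compiles and works.
RegExMatch("foobar", "(?<=foo)bar", fixed)        ; fixed = "bar"

; A look-behind containing a quantifier of varying size does not compile.
RegExMatch("foobar", "(?<=fo+)bar", varying)
compileError := ErrorLevel                        ; a "Compile error ..." string

; \K allows a varying-size prefix: everything matched before it is dropped.
RegExMatch("foobar", "fo+\Kbar", kept)            ; kept = "bar"

MsgBox % "fixed: " fixed "`nErrorLevel: " compileError "`nwith \K: " kept
```

Checking ErrorLevel after a RegExMatch call that unexpectedly returns nothing is a quick way to tell a pattern that did not match from a pattern that did not compile.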

And this is where AlphaBravo's code comes in. With it I can also capture subpatterns and it works perfectly, but unfortunately only for the following specific case:

Code: Select all

q::
H = yellow333yellow222yellow111blue111red111red222red333yellowkkkyellow777yellow555yellow444blue444red444red555red777yellow
K = i)yellow((([^ybr]+|y+(?!ellow)|b+(?!lue)|r+(?!ed))?)blue(?2))red

p := 1
array := []
array2 := []
while p:= RegExMatch(H, k, m, p+StrLen(m))
{
Array[A_Index] := m ; this retrieves the full pattern
Array2[A_Index] := m1 ; this only retrieves the subpattern
msgbox % "pattern number " . A_Index . " is " . Array[A_Index]
msgbox % "subpattern number " . A_Index . " is " . Array2[A_Index]
Count := Array.Count()
}
msgbox, % "total match =" Count

return
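For the href= case, one possible direction is the look-ahead style suggested earlier in the thread by Descolada rather than a character class. This is only a sketch under assumptions: the document string is the sample from this post, \d+ replaces \d.*? on the assumption that the id is purely numeric, and \Q...\E is used to treat the keyword literally (it would break if the keyword itself contained \E). It has not been tried against the real page.

```autohotkey
document = href="222xxxhref="111xxxanything-xxx/div(id:123)">yyyy/div(id:222)">thepatternrepeatshref="222xxxhref="111xxxanything-xxx/div(id:333)">yyyy/div(id:444)">href="...longtext
keyword2 = anything

; (?:(?!href=").)*? walks forward one character at a time but never steps onto
; another href=", so the match has to start at the href=" nearest to the keyword.
; \Q...\E makes the keyword literal even if it contains RegEx special characters.
search = i)href="((?:(?!href=").)*?\Q%keyword2%\E-.*?/div\(id:\d+\))">

p := 1, results := ""
while p := RegExMatch(document, search, m, p + StrLen(m))
    results .= m1 "`n"      ; m1 is the part between href=" and the closing ">
MsgBox % results
```

On this sample the loop should collect 111xxxanything-xxx/div(id:123) and 111xxxanything-xxx/div(id:333). The part after the keyword still uses a plain .*?, so if an href=" could appear between the keyword and /div, the same look-ahead trick would be needed there as well.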
