how to prevent Look-ahead and look-behind assertions have effect on the entire string Topic is solved

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
rj8810
Posts: 31
Joined: 16 Jul 2018, 22:34

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

25 Jul 2022, 14:23

AlphaBravo wrote:
25 Jul 2022, 10:07
Descolada wrote:
25 Jul 2022, 00:58
@rj8810, could you post a concise description of what you are trying to match and what the rules of matching need to be? Preferably post the real haystack you are using, not the one with reds and yellows :)
+1
Done, I have posted a real case of what I need. Thanks in advance to everyone.
teadrinker
Posts: 4412
Joined: 29 Mar 2015, 09:41

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

25 Jul 2022, 18:55

Try this:

Code: Select all

document =  href="222xxxhref="111xxxanything-xxx/div(id:123)">yyyy/div(id:222)">
keyword2 = anything

RegExMatch(document, "(href="")(.(?!(?1)|(?3)))*?.?(" . keyword2 . ")(?2)*?/div\(id:\d+\)"">", m)
MsgBox, % m
rj8810
Posts: 31
Joined: 16 Jul 2018, 22:34

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string

25 Jul 2022, 20:52

teadrinker wrote:
25 Jul 2022, 18:55
Try this:

Code: Select all

document =  href="222xxxhref="111xxxanything-xxx/div(id:123)">yyyy/div(id:222)">
keyword2 = anything

RegExMatch(document, "(href="")(.(?!(?1)|(?3)))*?.?(" . keyword2 . ")(?2)*?/div\(id:\d+\)"">", m)
MsgBox, % m
Thank you very much, this code seems to work very well. :bravo: :dance: :D

Code: Select all

q::
document =  href="222xxx href="111xxxanything-xxx/div(id:123)">yyyy/div(id:222)">thepatternrepeatshref="222xxxhref="333xxxanything-xxx/div(id:333)">yyyy/div(id:444)">...longtex...href="xxx
keyword2 = anything

p := 1
array := []
array2 := []
while p:=RegExMatch(document, "(href="")(.(?!(?1)|(?3)))*?.?(" . keyword2 . ")(?2)*?/div\(id:\d+\)"">", m, p+StrLen(m))
{
    Array[A_Index] := m    ; this retrieves only the full match (the whole pattern)
    Array2[A_Index] := m1  ; this retrieves only the first subpattern
    msgbox % "pattern number " . A_Index . " is " . Array[A_Index]
    msgbox % "subpattern number " . A_Index . " is " . Array2[A_Index]
    Count := Array.Count()
}
MsgBox, % Count
return
However, I haven't been able to capture the subpattern that starts just after href=" and ends just before ">.
I tried putting everything in the search variable, since that way I don't have to escape as many characters, and I added the capturing parentheses (https://www.autohotkey.com/docs/misc/RegEx-QuickRef.htm#subpat), but I could not get it to work:

Code: Select all

q::
document =  href="222xxx href="111xxxanything-xxx/div(id:123)">yyyy/div(id:222)">thepatternrepeatshref="222xxxhref="333xxxanything-xxx/div(id:333)">yyyy/div(id:444)">...longtex...href="xxx
keyword2 = anything
search = (href="()(.(?!(?1)|(?3)))*?.?%keyword2%(?2)*?/div\(id:\d+\))">

p := 1
array := []
array2 := []
while p:=RegExMatch(document, search, m, p+StrLen(m))	
{
    Array[A_Index] := m    ; this retrieves only the full match (the whole pattern)
    Array2[A_Index] := m1  ; this retrieves only the first subpattern
    msgbox % "pattern number " . A_Index . " is " . Array[A_Index]
    msgbox % "subpattern number " . A_Index . " is " . Array2[A_Index]
    Count := Array.Count()
}
MsgBox, % Count
return
It did not work this way either: (href=""(subpattern)"">

Code: Select all

q::
document =  href="222xxx href="111xxxanything-xxx/div(id:123)">yyyy/div(id:222)">thepatternrepeatshref="222xxxhref="333xxxanything-xxx/div(id:333)">yyyy/div(id:444)">...longtex...href="xxx
keyword2 = anything

p := 1
array := []
array2 := []
; What is being searched for here is: href="/item/domiciliario-iid-1115341856">
; To match a literal quote inside a quoted regex string, put another " in front of it.
; Alternatively, store the literal string in a variable, since variables take everything literally except the regex special characters, which is a good solution.
while p:=RegExMatch(document, "(href=""()(.(?!(?1)|(?3)))*?.?(" . keyword2 . ")(?2)*?/div\(id:\d+\))"">", m, p+StrLen(m))
{
    Array[A_Index] := m    ; this retrieves only the full match (the whole pattern)
    Array2[A_Index] := m1  ; this retrieves only the first subpattern
    msgbox % "pattern number " . A_Index . " is " . Array[A_Index]
    msgbox % "subpattern number " . A_Index . " is " . Array2[A_Index]
    Count := Array.Count()
}
MsgBox, % Count
return


Any ideas? This would be the last thing I need to achieve the purpose of this post. Thank you very much in advance.
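On the quoting point raised in the code comment above, here is a minimal sketch of the two ways to write the same pattern in AutoHotkey v1. The shortened pattern and the names `searchA` and `searchB` are made up for illustration and are not taken from the scripts above.

```autohotkey
keyword2 := "anything"

; Legacy assignment: everything after "=" is literal text, quotes included,
; and %keyword2% is replaced by the variable's contents.
searchA = href="(%keyword2%-\d+)">

; Expression assignment: the pattern is a quoted string, so every literal "
; must be doubled, and the variable is joined in with the . operator.
searchB := "href=""(" . keyword2 . "-\d+)"">"

; Both variables should now hold the same text: href="(anything-\d+)">
MsgBox % (searchA == searchB) ? "Same pattern: " searchA : "Different patterns"
```

One thing to watch with the legacy form: a misspelled name between percent signs expands to an empty string without any error (unless #Warn is enabled), so the keyword silently disappears from the pattern.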
Descolada
Posts: 1202
Joined: 23 Dec 2021, 02:30

Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string  Topic is solved

25 Jul 2022, 23:25

@rj8810, it seems you are trying to match html, yet your example isn't valid html... With your example I guess something like this would work: href="((?:.(?!href="))*?anything-(?:.(?!">))*?\/div\(id:\d.*?)">, whereas with real HTML this would probably be better: href="((?:.(?!">))*?anything-(?:.(?!">))*?\/div\(id:\d.*?)">
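As a reading aid, here is a minimal sketch of the first pattern built piece by piece in AutoHotkey v1, with one comment per piece. The haystack is the sample from teadrinker's post; the variable names `pattern` and `keyword` are chosen here for illustration, and the keyword is assumed to contain no regex special characters.

```autohotkey
; Sample haystack (legacy assignment, so the quotes are literal).
document = href="222xxxhref="111xxxanything-xxx/div(id:123)">yyyy/div(id:222)">
keyword := "anything"   ; illustrative name; assumed to need no regex escaping

; Inside a quoted string, "" stands for one literal quote character.
pattern := "href="""              ; literal text href="
pattern .= "("                    ; start of capture group 1
pattern .= "(?:.(?!href=""))*?"   ; lazily consume characters, but never one that is directly followed by href="
pattern .= keyword . "-"          ; the keyword followed by a hyphen
pattern .= "(?:.(?!"">))*?"       ; lazily consume characters, but never one that is directly followed by ">
pattern .= "\/div\(id:\d"         ; literal /div(id: and one digit
pattern .= ".*?"                  ; as few further characters as possible
pattern .= ")"                    ; end of capture group 1
pattern .= """>"                  ; the closing ">

if (RegExMatch(document, pattern, m))
    MsgBox % "Whole match: " m "`nGroup 1: " m1
else
    MsgBox % "No match."
```

The negative look-ahead sits after the dot, so it only inspects the text that follows the character just consumed. That is why the match cannot run across a second href=" and has to start at the last href=" before the keyword. On this haystack, group 1 should come out as 111xxxanything-xxx/div(id:123).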
rj8810
Posts: 31
Joined: 16 Jul 2018, 22:34

06 Aug 2022, 23:54

Descolada wrote:
25 Jul 2022, 23:25
@rj8810, it seems you are trying to match html, yet your example isn't valid html... With your example I guess something like this would work: href="((?:.(?!href="))*?anything-(?:.(?!">))*?\/div\(id:\d.*?)">, whereas with real HTML this would probably be better: href="((?:.(?!">))*?anything-(?:.(?!">))*?\/div\(id:\d.*?)">
Thank you very much, I tried your first code in many scenarios and it always works perfectly.

The second one doesn't work for me, because it matches from the first "href" it finds in the HTML document to the last "">" it finds, and so it matches the entire document. It only works on a strictly valid HTML document without errors, not in other scenarios. The first one is more universal, which is just what I was looking for. I couldn't fully understand your code, because I think it goes beyond the AutoHotkey documentation, but when I have time I'll look into it, since I don't like to just copy and paste; I like to understand why the code works:

Code: Select all

Q::
content = notvalifirstdhref="validfirsthref="/item/mykeyword-iid-1111111">second href="/item/mykeyword-iid-2222222">
keyword:="mykeyword"
search = href="((?:.(?!href="))*?%keyword%-(?:.(?!">))*?id-\d.*?)">
;search = href="((?:.(?!">))*?%keyword%-(?:.(?!">))*?id-\d.*?)">
p := 1
array := []
while p:= RegExMatch(content, search, StubPat, p+StrLen(StubPat))
{
    Array[A_Index] := StubPat1 ; this retrieves only the subpattern
    msgbox % "Element number " . A_Index . " is " . Array[A_Index]
    Count := Array.Count()
}
msgbox, % "count =" Count
return
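The difference between the two patterns on this sample can be checked with a short sketch. It assumes the same `content` string as above, with the keyword written out literally; the names `first`, `second`, `a` and `b` are illustrative only.

```autohotkey
content = notvalifirstdhref="validfirsthref="/item/mykeyword-iid-1111111">second href="/item/mykeyword-iid-2222222">

; First pattern: the text before the keyword may not run across another href="
first = href="((?:.(?!href="))*?mykeyword-(?:.(?!">))*?id-\d.*?)">
; Second pattern: the text before the keyword may only not run across ">
second = href="((?:.(?!">))*?mykeyword-(?:.(?!">))*?id-\d.*?)">

RegExMatch(content, first, a)    ; a1 receives capture group 1 of the first pattern
RegExMatch(content, second, b)   ; b1 receives capture group 1 of the second pattern

; Expected on this sample:
;   a1 = /item/mykeyword-iid-1111111
;   b1 = validfirsthref="/item/mykeyword-iid-1111111
MsgBox % "first:  " a1 "`nsecond: " b1
```

With the second pattern, the unterminated href="validfirst fragment is swallowed into the capture, because nothing stops the match from crossing a second href=" as long as no "> appears in between.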
Thanks for sharing your knowledge :bravo: :clap: :dance: :superhappy: I spent many days reading.
Also many thanks to everyone else for their help:
boiler wrote:
25 Jul 2022, 04:32
...
AlphaBravo wrote:
25 Jul 2022, 10:07
...
teadrinker wrote:
25 Jul 2022, 18:55
...

The only way I can think of to thank you is to share in return. In case this is useful to you, or you have better ideas, I invite you to see this post, with which I achieved my complete purpose: web scraping with RegExMatch in the cloud with autohotkey.ahk
viewtopic.php?f=76&t=107135
full topic: "how to make autohotkey work on pc via rdp connections, or work on multiple, different and independent virtual desktops without scripts interfering with each other."

See you next time, friends.
