Done, I have posted a real case of what I need. Thanks in advance to all.
how to prevent Look-ahead and look-behind assertions have effect on the entire string Topic is solved
Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string
Try this:
Code:
document = href="222xxxhref="111xxxanything-xxx/div(id:123)">yyyy/div(id:222)">
keyword2 = anything
RegExMatch(document, "(href="")(.(?!(?1)|(?3)))*?.?(" . keyword2 . ")(?2)*?/div\(id:\d+\)"">", m)
MsgBox, % m
Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string
teadrinker wrote: ↑25 Jul 2022, 18:55
Try this:
Code:
document = href="222xxxhref="111xxxanything-xxx/div(id:123)">yyyy/div(id:222)">
keyword2 = anything
RegExMatch(document, "(href="")(.(?!(?1)|(?3)))*?.?(" . keyword2 . ")(?2)*?/div\(id:\d+\)"">", m)
MsgBox, % m

Thank you very much, this code seems to work very well:
Code:
q::
document = href="222xxx href="111xxxanything-xxx/div(id:123)">yyyy/div(id:222)">thepatternrepeatshref="222xxxhref="333xxxanything-xxx/div(id:333)">yyyy/div(id:444)">...longtex...href="xxx
keyword2 = anything
p := 1
array := []
array2 := []
while p:=RegExMatch(document, "(href="")(.(?!(?1)|(?3)))*?.?(" . keyword2 . ")(?2)*?/div\(id:\d+\)"">", m, p+StrLen(m))
{
    Array[A_Index] := m ; this retrieves only the whole match
    Array2[A_Index] := m1 ; this retrieves only the subpattern
    msgbox % "pattern number " . A_Index . " is " . Array[A_Index]
    msgbox % "subpattern number " . A_Index . " is " . Array2[A_Index]
    Count := Array.Count()
}
MsgBox, % Count
return
I tried putting everything in the search variable, since that way I don't have to escape as many characters, and I added the famous capturing parentheses (https://www.autohotkey.com/docs/misc/RegEx-QuickRef.htm#subpat), but I could not get it to work:
Code:
q::
document = href="222xxx href="111xxxanything-xxx/div(id:123)">yyyy/div(id:222)">thepatternrepeatshref="222xxxhref="333xxxanything-xxx/div(id:333)">yyyy/div(id:444)">...longtex...href="xxx
keyword2 = anything
search = (href="()(.(?!(?1)|(?3)))*?.?%keyword%(?2)*?/div\(id:\d+\))">
p := 1
array := []
array2 := []
while p:=RegExMatch(document, search, m, p+StrLen(m))
{
    Array[A_Index] := m ; this retrieves only the whole match
    Array2[A_Index] := m1 ; this retrieves only the subpattern
    msgbox % "pattern number " . A_Index . " is " . Array[A_Index]
    msgbox % "subpattern number " . A_Index . " is " . Array2[A_Index]
    Count := Array.Count()
}
MsgBox, % Count
return
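One detail in the script above is worth checking: the search line uses %keyword%, while the variable that is assigned is keyword2. The following is only a minimal sketch for AutoHotkey v1 of how the two quoting styles and the variable expansion behave. The names legacy, expr and typo are made up for illustration. It does not address the other change in that script, namely that adding an outer pair of parentheses renumbers the groups that (?1), (?2) and (?3) refer to.

```autohotkey
keyword2 = anything

; Legacy assignment: quotes are literal, %name% is replaced by the variable's contents.
legacy = href="%keyword2%">

; Expression string: a literal quote is written as "", variables are concatenated.
expr := "href=""" . keyword2 . """>"

; Both forms build the same text (== is a case-sensitive comparison).
MsgBox, % (legacy == expr) ? "identical" : "different"

; %keyword% (without the 2) names a variable this script never assigns,
; so it expands to an empty string and the keyword vanishes from the pattern.
typo = href="%keyword%">
MsgBox, % typo
```

If the second message box shows no keyword between the quotes, the same is happening inside the search pattern.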
Code:
q::
document = href="222xxx href="111xxxanything-xxx/div(id:123)">yyyy/div(id:222)">thepatternrepeatshref="222xxxhref="333xxxanything-xxx/div(id:333)">yyyy/div(id:444)">...longtex...href="xxx
keyword2 = anything
p := 1
array := []
array2 := []
while p:=RegExMatch(document, "(href=""()(.(?!(?1)|(?3)))*?.?(" . keyword2 . ")(?2)*?/div\(id:\d+\))"">", m, p+StrLen(m)) ; What we are looking for here is: href="/item/domiciliario-iid-1115341856">. To escape literal quotation marks you have to put another " in front of them in the pattern string, or store the literal string in a variable, since variables take everything literally except the regex special characters, which is a good solution
{
    Array[A_Index] := m ; this retrieves only the whole match
    Array2[A_Index] := m1 ; this retrieves only the subpattern
    msgbox % "pattern number " . A_Index . " is " . Array[A_Index]
    msgbox % "subpattern number " . A_Index . " is " . Array2[A_Index]
    Count := Array.Count()
}
MsgBox, % Count
return
Any ideas? This would be the last thing I need to achieve the purpose of this post. Thank you very much in advance.
Re: how to prevent Look-ahead and look-behind assertions have effect on the entire string Topic is solved
@rj8810, it seems you are trying to match html, yet your example isn't valid html... With your example I guess something like this would work: href="((?:.(?!href="))*?anything-(?:.(?!">))*?\/div\(id:\d.*?)">, whereas with real HTML this would probably be better: href="((?:.(?!">))*?anything-(?:.(?!">))*?\/div\(id:\d.*?)">
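As a rough illustration of the first pattern above (a sketch for AutoHotkey v1, not part of the original answer), it can be tried on the sample string posted earlier in this thread. The variable names needle and m are chosen here for illustration only.

```autohotkey
; Sample text from earlier in the thread. Legacy assignment keeps the quotes literal,
; so neither the text nor the pattern needs "" escaping.
document = href="222xxxhref="111xxxanything-xxx/div(id:123)">yyyy/div(id:222)">

; (?:.(?!href="))*?  takes one character at a time, lazily, and refuses any
;                    character that is immediately followed by another href="
; (?:.(?!">))*?      does the same, refusing any character followed by ">
; The outer ( ... ) captures everything between href=" and the closing ">
needle = href="((?:.(?!href="))*?anything-(?:.(?!">))*?\/div\(id:\d.*?)">

if RegExMatch(document, needle, m)
    MsgBox, % "match: " m "`ncaptured: " m1
else
    MsgBox, no match
```

On this sample the match should start at the second href=", because the first lookahead stops the engine from running across an inner href=" on its way to the keyword.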
Descolada wrote: ↑25 Jul 2022, 23:25
@rj8810, it seems you are trying to match html, yet your example isn't valid html... With your example I guess something like this would work: href="((?:.(?!href="))*?anything-(?:.(?!">))*?\/div\(id:\d.*?)">, whereas with real HTML this would probably be better: href="((?:.(?!">))*?anything-(?:.(?!">))*?\/div\(id:\d.*?)">

Thank you very much, I tried your first code in many scenarios and it always works perfectly.
The second one doesn't work for me because it matches from the first "href" it finds in the HTML document to the last "">" it finds in the document, thus matching the entire document. The second code does work, but strictly on an HTML document without errors; it doesn't work in other scenarios. The first one is more universal, which is just what I was looking for. I couldn't understand your code, because I think it goes beyond the AutoHotkey documentation, but when I have time I'll investigate it, since I don't like to just copy and paste; I like to understand why the code works:
Code:
Q::
content = notvalifirstdhref="validfirsthref="/item/mykeyword-iid-1111111">second href="/item/mykeyword-iid-2222222">
keyword:="mykeyword"
search = href="((?:.(?!href="))*?%keyword%-(?:.(?!">))*?id-\d.*?)">
;search = href="((?:.(?!">))*?%keyword%-(?:.(?!">))*?id-\d.*?)">
p := 1
array := []
while p:= RegExMatch(content, search, StubPat, p+StrLen(StubPat))
{
    Array[A_Index] := StubPat1 ; this retrieves only the subpattern
    msgbox % "Element number " . A_Index . " is " . Array[A_Index]
    Count := Array.Count()
}
msgbox, % "count =" Count
return
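The loop above could also be wrapped in a small function. This is only a minimal sketch, assuming AutoHotkey v1.1 with object support; MatchAll is a hypothetical helper name, not something from this thread or from AutoHotkey itself, and the keyword is written directly into the pattern to keep the example short.

```autohotkey
content = notvalifirstdhref="validfirsthref="/item/mykeyword-iid-1111111">second href="/item/mykeyword-iid-2222222">
search = href="((?:.(?!href="))*?mykeyword-(?:.(?!">))*?id-\d.*?)">

for index, item in MatchAll(content, search)
    MsgBox, % "Element number " index " is " item
return

; Hypothetical helper: returns an array holding capture group 1 of every match.
; Assumes the pattern can never match an empty string; otherwise the start
; position would not advance and the loop would not terminate.
MatchAll(haystack, needle)
{
    results := []
    pos := 1
    match := ""
    ; Resume each search right after the end of the previous match.
    while pos := RegExMatch(haystack, needle, match, pos + StrLen(match))
    {
        results.Push(match1)
    }
    return results
}
```

Keeping the loop in one place means the calling code only deals with the resulting array, and the array's Count() is available afterwards if a total is needed.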
Also many thanks to everyone else for their help:
...
...
...
The only way I can thank you is by sharing in return. If this could be useful to you, or if you have better ideas, I invite you to see this post, with which I fully achieved my goal: web scraping with regexmatch in the cloud with autohotkey.ahk
viewtopic.php?f=76&t=107135
full topic: "how to make autohotkey work on pc via rdp connections, or work on multiple, different and independent virtual desktops without scripts interfering with each other."
See you next time, friends.