grep() - global regular expression match

 
Titan



Joined: 11 Aug 2004
Posts: 5007
Location: imaginationland

Posted: Mon Jan 29, 2007 2:13 am    Post subject: grep() - global regular expression match

Details in the script.

Download v2 or 1.3 final
_________________

RegExReplace("irc.freenode.net/autohotkey", "^(?=(.(?=[\0-r\[]*((?<=\.).))))(?:[c-\x73]{2,8}(\S))+((2)|\b[^\2-]){2}\D++$", "$u3$1$3$4$2")


Last edited by Titan on Fri Sep 28, 2007 9:49 pm; edited 5 times in total
Chris
Site Admin


Joined: 02 Mar 2004
Posts: 10450

Posted: Mon Jan 29, 2007 1:24 pm

I've added a link to this great resource from the "related" section of RegExMatch().

Your code is so short I was tempted to put it into the examples section. But I don't understand it well enough to be comfortable with that -- in part due to not fully understanding Grep. So I just linked to this topic, which provides your full examples and makes it easier for you to update it.

Thanks.
Titan



Joined: 11 Aug 2004
Posts: 5007
Location: imaginationland

Posted: Mon Jan 29, 2007 2:16 pm

I designed them to be as fast as possible, which is why some expressions look a bit cryptic. grep(), which I just updated, returns matches and positions as comma-separated values. That is ideal for use in functions, since StringSplit makes it easy to create global or local arrays from the result. RegExMatchAll is better for general use as it supports subpatterns.
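
As a rough sketch of the StringSplit step mentioned above (the sample string and the names matches and part are made up for illustration; this is not taken from the grep() script itself):

```autohotkey
; Hypothetical sample data standing in for a comma-separated result string.
matches := "alpha,beta,gamma"

; StringSplit creates the pseudo-array part1, part2, ... and stores the item count in part0.
; The comma delimiter has to be escaped with a backtick.
StringSplit, part, matches, `,

Loop, %part0%
    MsgBox % "Match " . A_Index . ": " . part%A_Index%
```

Inside a function the array elements would be local unless declared otherwise, which is presumably the global/local flexibility referred to here.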
Titan



Joined: 11 Aug 2004
Posts: 5007
Location: imaginationland

Posted: Mon Feb 12, 2007 1:11 pm

In version 1.2 I fixed the infinite recursion bug that occurred when UnquotedOutputVar had the same address as Haystack. Performance increased by 20% somehow. Version 1.3 brings a few new options to grep().
Laszlo



Joined: 14 Feb 2005
Posts: 3871
Location: Pittsburgh

Posted: Sat May 26, 2007 6:17 pm

RegExReplace often provides a simpler alternative (putting all matches in a variable):
Code:
t = C:\Windows\system32\systeminfo.exe`n
s = `n                ; separator character between matches, like "|" or ","
p =  (?<=([\\\.]))\w+ ; pattern to search for

t := RegExReplace(t, "(.*?)((" . p . ")|$)", "$2" . s)
StringTrimRight t, t, SubStr(t,-2,1) = s ? 3 : 2

MsgBox [%t%]
If you want multi-character separators, only the StringTrimRight command needs to be adapted.
Titan



Joined: 11 Aug 2004
Posts: 5007
Location: imaginationland

Posted: Sat May 26, 2007 6:49 pm

I use that method for a few things, but it has its limitations: it can't get subpatterns or positions, fails with options and complex expressions, can't recurse or backtrack, breaks backreferences and possibly some anchors and atomic groups, matches surrounding whitespace/delimiter characters, etc. The lazy wildcard is also said to be very slow and is generally discouraged.
Joy2DWorld



Joined: 04 Dec 2006
Posts: 386
Location: Galil, Israel

Posted: Thu Jun 28, 2007 10:46 pm

To do a MATCHALL, there is an (?) easier / FASTER way...

example:

Code:
start := (matched := 0) + 1
loop
   if regexmatch("12345","(?<whole>.*?(?<ANSWER" . a_index . ">\d))",The_,start) and ++matched
      start += strlen(The_whole)
   else
      break

msgbox % matched

loop % matched
   msgbox % The_Answer%A_index%

exitapp



if this is helpful(?)

ie. regexmatch(HAYSTACK, "(?<whole>" ... rest of match block ... ")",Array_Var_Name, PlaceHolder)

the match block is normal except... in place of (?<named>) use "(?<named" . a_index . ">" for each named match.

or:

Code:
if regexmatch(Haystack, "sx)"
   . "(?<whole>.*?"                 ; .*? ie. if there can be optional text between matches
   . "(?<ANSWER" . a_index . ">"    ; name of section + index counter
   . "\d)"                          ; for each section you want to capture
   . ")"                            ; closing paren for entire match section
   , Array_, start)

_________________
Joyce Jamce
Titan



Joined: 11 Aug 2004
Posts: 5007
Location: imaginationland

Posted: Thu Jun 28, 2007 10:55 pm

That's how grep() works, but instead of using a Loop you get the convenience of a single function and a few extra options.
Joy2DWorld



Joined: 04 Dec 2006
Posts: 386
Location: Galil, Israel

Posted: Fri Jun 29, 2007 1:27 am

if we de-obfuscate your function:

Code:



grep(Haystack, Needle, ByRef outputVar, whichmatchB = 0, positionS = 1, charhopefullyNOTinanymatcheD = ",", matchfromlastZ = true) {
   Loop
      If positionS := RegExMatch(Haystack, Needle, X, positionS) {
         ; append the found position, then the match (or one subpattern), each followed by the delimiter
         outPut .= positionS . charhopefullyNOTinanymatcheD
         positionS += matchfromlastZ ? StrLen(X) : 1
         Y .= (whichmatchB ? X%whichmatchB% : X) . charhopefullyNOTinanymatcheD
      } Else {
         ; no more matches: strip the trailing delimiter from both strings
         outputVar := SubStr(Y, 1, -1)
         Return SubStr(outPut, 1, -1)
      }
}


we see your very cool use of the return value (foundposition), but some huge LIMITATIONS with the function:

1) only return one (1) match, not multiple arrayed matches...... [with the simple loop suggested above, can make as many named matches/submatches as desired... ], eg:
Code:
start := (matched := 0) + 1
loop
   if regexmatch("12345","x)(?<whole>.*?(?<ANSWER" . a_index . ">\d)(?<AnswerB" . a_index . ">\d))",The_,start) and ++matched
      start += strlen(The_whole)
   else
      break


2) need a 'pray it isn't in the match result' character to separate the matches!!!

3) don't return an array....

4) are much slower...

and notably,

as pointed out above in thread, to create a conjoined string with matches, simply use regexreplace to kill everything between matches and insert whatever 'between' string you desire......
Titan



Joined: 11 Aug 2004
Posts: 5007
Location: imaginationland

Posted: Fri Jun 29, 2007 1:50 am

Joy2DWorld wrote:
only return one (1) match, not multiple arrayed matches
Which could be any one. I'll try to see if this limitation can be lifted with a new paradigm.

Joy2DWorld wrote:
need a 'pray it isn't in the match result' character to separate the matches
That's true, I usually escape my commas prior to calling grep. Until real arrays/objects are supported AutoHotkey will always have a problem here.

Joy2DWorld wrote:
don't return an array
In most cases this is better. You can transform the string with Sort, parsing loops and not worry about variable scope within functions.
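
A minimal sketch of that workflow, assuming a comma-delimited result string (the sample data and the variable name list are invented for illustration):

```autohotkey
; Hypothetical sample data standing in for a comma-separated list of matches.
list := "pear,apple,fig"

; Sort the items in place; the D option followed by a comma sets the delimiter.
Sort, list, D,

; A parsing loop visits each item without creating any array variables.
Loop, Parse, list, `,
    MsgBox % A_Index . ": " . A_LoopField
```

Nothing here creates extra variables besides list itself, so there is no array scope to manage when this runs inside a function.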

Joy2DWorld wrote:
much slower
I did a few tests, and I found that it was only ever slightly slower (1-2%). Like I said this function is for convenience, looping is not a new concept. If performance is critical you can write a Dll in ASM and call your exported functions.

Joy2DWorld wrote:
use regexreplace to kill everything between matches
In my follow-up I listed a few reasons why it's not a practical solution. Breaking backreferences is a major worry because I use them a lot.
Joy2DWorld



Joined: 04 Dec 2006
Posts: 386
Location: Galil, Israel

Posted: Wed Aug 15, 2007 9:44 am

ok...


so easy, even pre-beginner can use !!!



STARTER:

use it like this,

RegexMall(HAYSTACK, NEEDLE)

and get back a variable for *each* (?<named_match>xxxxxx) in the needle,

ie. "(?<first>\d+)\D+(?<second>\d*?)\s*(?<third>\w+)" is legal.

the matched results are contained in the named matches, numbered by match, eg:

$named_match1 ... $named_match9999

the function returns the # of matches.

also:

$ has all the full matches (each followed by the spacer)

$RegEx1 ... $RegEx999 has the full match for that match #

$0 also has the # of matches...

options:

RegexMall(haystack, needle, "VARIABLE TO USE", "Spacer to use", Position to start in regexmatch)

the variable name replaces the default "$" prefix, and the spacer is added after each match in the TOTAL MATCH return string, eg:

RegexMall(hay,needle,"THIS_") ->

returns

THIS_namedmatch1
THIS_namedmatch2

etc..

ie. in place of the default "$"

Code:
/*
; example
msgbox % RegexMall("test and more ! and test and more and more!! ","(?<happy>test).*?(?<more>more)","Yes_") "- " Yes_happy1 " " Yes_more2 "`n" Yes_ "-" Yes_0
*/


RegExMall(haystack,needle,var = "$",spacer = "", position = 1) {
   global
   local tmps, count
   ; local save$ := %var%
   loop
      if !(position := regexmatch(haystack
                  , regexreplace(needle,"(?<!\\)\(\?\<(\w+)>"
                     , "(?<$1" . a_index . ">")
                  , %var%
                  , position)  + strlen(%var%) )
         break
      else {
         tmps .= %var%Regex%a_index% := %var% . spacer
         ++count
      }
   ;%var% := save$
   %var% := tmps
   %var%0 := count
   return count
 } 
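
For anyone puzzling over the inner regexreplace call: here is a small sketch of what it is meant to do to the needle on each pass, namely append the loop index to every named group so each match gets its own set of variables. The variable names needle, index and numbered are only for illustration, and index stands in for a_index.

```autohotkey
; Same needle as in the commented example above.
needle := "(?<happy>test).*?(?<more>more)"
index := 2   ; stands in for A_Index on the second pass of the loop

; Find each unescaped "(?<name>" and rebuild it with the index after the name.
numbered := RegExReplace(needle, "(?<!\\)\(\?\<(\w+)>", "(?<$1" . index . ">")

; Display the rewritten needle that would be handed to RegExMatch on this pass.
MsgBox % numbered
```

This extra replace on every iteration is the second regex that the timing remarks below refer to.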



hope this helps!


oh... is about 2-3 times slower than directly doing loop:

Code:
pos = 0
loop
   if !( pos := regexmatch(test, "(?<=is)\s*a\s*(?<match" . a_index .  ">[a-z]++)\s*(?<number" . a_index . ">\d+)",$,pos + 1))
      break


(as is doing 2 regex's!!)

but..

regex is FAST...

and larger the haystack, lower the time differential..


"S" option degrades performance...

------------------------------------------

ok and a second, FASTER (3x) version,

works also with unnamed match sections "(xxx)"

but less friendly output style:

$<match #>_<subpattern name or #>

eg:

$1_1
$1_2

$2_1
$2_2

etc..


Code:
RegExMatchG(haystack,needle,var = "$",spacer = "", position = 1) {
   global
   local tmps, count
   ; local save$ := %var%
   loop
      if !(position := regexmatch(haystack, needle
               , %var%%a_index%_
               , position) 
               + strlen(%var%%a_index%_) )
         break
      else {
         tmps .= %var%%a_index%_ . spacer
         ++count
      }
   ;%var% := save$
   %var% := tmps
   %var%0 := count
   return count
 } 
 



for example:

Code:

loop 100
test .= "this is a match 2345,"

RegExMatchG(test,"(?<=is)\s*(a)\s*(?<match>[a-z]++)\s*(?<number>\d+)")
msgbox % benchmark()  "-" $1_1 "-" $1_match "`n" $0 "-" $



Last edited by Joy2DWorld on Sun Sep 30, 2007 5:47 pm; edited 1 time in total
Titan



Joined: 11 Aug 2004
Posts: 5007
Location: imaginationland

Posted: Mon Aug 27, 2007 8:37 pm

Your proposals look good. I will try to update my scripts soon; in the meantime users can copy your version.
px
Guest





Posted: Sun Sep 02, 2007 6:44 am

Code:

RegExMall(haystack,needle,var = "$",spacer = "", position = 1) {
   global
   local tmps, count
   ; local save$ := %var%
   loop


count should also be a local var, else the 2nd time you call the function it keeps accumulating on top of the old value, since the count is never reset.
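
A quick sketch of the difference, using two made-up functions (CountAssumeGlobal, CountWithLocal and hits are hypothetical names, not part of RegExMall):

```autohotkey
; Hypothetical function names, for illustration only.
CountAssumeGlobal() {
    global              ; assume-global mode: undeclared variables are global
    hits += 1           ; "hits" is a global here, so its value survives between calls
    return hits
}

CountWithLocal() {
    global
    local hits          ; declared local, so it starts out blank on every call
    hits += 1
    return hits
}

; Call each function once, then compare what a second call returns.
CountAssumeGlobal()
CountWithLocal()
MsgBox % CountAssumeGlobal() . " vs " . CountWithLocal()
```

The first function should keep counting across calls while the second starts over each time, which is the behaviour you want for a per-call match count.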
Titan



Joined: 11 Aug 2004
Posts: 5007
Location: imaginationland

Posted: Fri Sep 28, 2007 9:47 pm

In version 2.0 RegExMatchAll() has been replaced with grepcsv(). Details are in the script.
Joy2DWorld



Joined: 04 Dec 2006
Posts: 386
Location: Galil, Israel

Posted: Sun Sep 30, 2007 5:46 pm

px wrote:
count should also be a local var, else the 2nd time you call the function it keeps accumulating on top of the old value, since the count is never reset.


thanks.