grep() - global regular expression match
#1
Posted 29 January 2007 - 01:13 AM
#2
Posted 29 January 2007 - 12:24 PM
Your code is so short I was tempted to put it into the examples section. But I don't understand it well enough to be comfortable with that -- in part due to not fully understanding Grep. So I just linked to this topic, which provides your full examples and makes it easier for you to update it.
Thanks.
#3
Posted 29 January 2007 - 01:16 PM
#4
Posted 12 February 2007 - 12:11 PM
#5
Posted 26 May 2007 - 05:17 PM
t = C:\Windows\system32\systeminfo.exe`n
s = `n ; separator character between matches, like "|" or ","
p = (?<=([\\\.]))\w+ ; pattern to search for
t := RegExReplace(t, "(.*?)((" . p . ")|$)", "$2" . s)
StringTrimRight t, t, SubStr(t,-2,1) = s ? 3 : 2
MsgBox [%t%]
If you want multi-char separators, only the StringTrimRight command needs adaptation.
#6
Posted 26 May 2007 - 05:49 PM
#7
Posted 28 June 2007 - 09:46 PM
example:
start := (matched := 0) + 1
loop
    if regexmatch("12345","(?<whole>.*?(?<ANSWER" . a_index . ">\d))",The_,start) and ++matched
        start += strlen(The_whole)
    else
        break
msgbox % matched
loop % matched
    msgbox % The_Answer%A_index%
exitapp
if this is helpful(?)
i.e. regexmatch(HAYSTACK, "(?<whole>" ... rest of match block ... ")", Array_Var_Name, PlaceHolder)
The match block is written as normal, except that in place of (?<named>) you use "(?<named" . a_index . ">" for each named match.
or:
if regexmatch(Haystack,"sx)
    (?<whole>.*?                    # .*? ie. if there can be optional text between matches
    (?<ANSWER" . a_index . ">       # name of section + index counter
    \d)                             # for each section you want to capture
    )"                              ; Closing paren for entire match section
    ,Array_,start)
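The loop above can be tried on a slightly longer haystack. What follows is only a sketch under stated assumptions: the names hay, Hit_ and num are made up for illustration and do not come from the thread. It uses the same two ideas: append a_index to the group name so each pass writes to fresh variables, and advance the start position by the length of the "whole" group.

```autohotkey
; Sketch only: hay, Hit_ and the group name "num" are hypothetical.
hay := "a1 b22 c333"
start := 1
count := 0
Loop
{
    ; The loop index is appended to the group name, so each pass
    ; fills a new variable: Hit_num1, Hit_num2, ...
    needle := "(?<whole>.*?(?<num" . A_Index . ">\d+))"
    if !RegExMatch(hay, needle, Hit_, start)
        break
    start += StrLen(Hit_whole)   ; continue right after this match
    count++
}
Loop % count
    MsgBox % "Match " . A_Index . ": " . Hit_num%A_Index%
```

Because the skipped text is part of the "whole" group, adding its length to start always lands just past the captured digits.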
#8
Posted 28 June 2007 - 09:55 PM
#9
Posted 29 June 2007 - 12:27 AM
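grep(Haystack, Needle, ByRef outputVar, whichmatchB = 0, positionS = 1, charhopefullyNOTinanymatcheD = ",", matchfromlastZ = true) {
    Loop
        If positionS := RegExMatch(Haystack, Needle, X, PositionS) {
            ; collect the found position, followed by the separator
            outPut .= positionS . charhopefullyNOTinanymatcheD
            ; continue after the match, or one character further on
            positionS += matchfromlastZ ? StrLen(X) : 1
            ; collect the full match, or the requested subpattern
            Y .= (whichmatchB ? X%whichmatchB% : X) . charhopefullyNOTinanymatcheD
        } Else {
            ; no more matches: drop the trailing separator from both lists
            outputVar := SubStr(Y, 1, -1)
            Return SubStr(outPut, 1, -1)
        }
}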
we see your very cool use of the return value (foundposition), but some huge LIMITATIONS with the function:
1) only return one (1) match, not multiple arrayed matches...... [with the simple loop suggested above, can make as many named matches/submatches as desired... ], eg:
start := (matched := 0) + 1
loop
    if regexmatch("12345","x)(?<whole>.*?(?<ANSWER" . a_index . ">\d)(?<AnswerB" . a_index . ">\d))",The_,start) and ++matched
        start += strlen(The_whole)
    else
        break
2) need a 'pray it isn't in the match result' character to separate the matches!!!
3) don't return an array....
4) are much slower...
and notably,
as pointed out above in thread, to create a conjoined string with matches, simply use regexreplace to kill everything between matches and insert whatever 'between' string you desire......
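For readers on a later AutoHotkey build, limitations 2) and 3) can be sidestepped by collecting the matches in a real array. This is only a sketch, and it assumes AutoHotkey v1.1 with object support, which did not exist when this thread was written. GrepAll is a hypothetical helper name, not something from the thread.

```autohotkey
; Hypothetical helper, not part of the original grep(). Assumes AutoHotkey v1.1
; with object support (Push needs v1.1.21 or later).
GrepAll(haystack, needle) {
    matches := []
    pos := 1
    while (pos := RegExMatch(haystack, needle, m, pos)) {
        matches.Push(m)                      ; keep the full match text
        pos += StrLen(m) ? StrLen(m) : 1     ; step past it; move 1 on an empty match
    }
    return matches
}

; The matches themselves contain commas here, which is harmless with an array.
for index, text in GrepAll("a,1 b,22", "\w,\d+")
    MsgBox % index . ": " . text
```

Since no separator character is involved, nothing has to be escaped before the call.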
#10
Posted 29 June 2007 - 12:50 AM
"only return one (1) match, not multiple arrayed matches"
Which could be any one. I'll try to see if this limitation can be lifted with a new paradigm.
"need a 'pray it isn't in the match result' character to separate the matches"
That's true, I usually escape my commas prior to calling grep. Until real arrays/objects are supported, AutoHotkey will always have a problem here.
"don't return an array"
In most cases this is better. You can transform the string with Sort, parsing loops and not worry about variable scope within functions.
"much slower"
I did a few tests, and I found that it was only ever slightly slower (1-2%). Like I said this function is for convenience, looping is not a new concept. If performance is critical you can write a DLL in ASM and call your exported functions.
"use regexreplace to kill everything between matches"
In my follow up I listed a few reasons why it's not a practical solution - breaking backreferences is a major worry because I use them a lot.
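As a rough illustration of the Sort and parsing-loop point, the sketch below works on a literal string standing in for grep()'s comma-delimited result. The variable name matches and its contents are assumptions made up for the example.

```autohotkey
; "matches" stands in for the comma-delimited string grep() hands back.
matches := "22,333,1"
Sort, matches, N D,  ; numeric sort, comma as the delimiter
Loop, Parse, matches, `,
    MsgBox % "Match " . A_Index . ": " . A_LoopField
```

Everything stays in one ordinary variable, so it can be passed in and out of functions without touching global pseudo-arrays.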
#11
Posted 15 August 2007 - 08:44 AM
so easy, even pre-beginner can use !!!
STARTER:
use it like this:
RegexMall(HAYSTACK, NEEDLE)
you get back one variable per match
for *each* (?<named_match>xxxxxx) group,
i.e. "(?<first>\d+)\D+(?<second>\d*?)\s*(?<third>\w+)" is legal.
the matched results are stored under the group names with the match number appended, eg:
$named_match1 through $named_match9999
the function returns the number of matches
also:
$ has all the full matches, joined together
$RegEx1 through $RegEx999 have the full match for that match number
$0 also has the number of matches...
options:
RegexMall(haystack,Needle, "VARIABLE TO USE", "Spacer to use", Position to start in regexmatch)
the spacer is appended after each match in the TOTAL MATCH return string, i.e. "$" (if no alternative variable name is given)
RegexMall(hay,needle,"THIS_") ->
returns
THIS_namedmatch1
THIS_namedmatch2
etc..
ie. in place of default "$"
/*
; example
msgbox % RegexMall("test and more ! and test and more and more!! ","(?<happy>test).*?(?<more>more)","Yes_") "- " Yes_happy1 " " Yes_more2 "`n" Yes_ "-" Yes_0
*/
RegExMall(haystack,needle,var = "$",spacer = "", position = 1) {
    global
    local tmps, count
    ; local save$ := %var%
    loop
        if !(position := regexmatch(haystack
            , regexreplace(needle,"(?<!\\)\(\?\<(\w+)>"
            , "(?<$1" . a_index . ">")
            , %var%
            , position) + strlen(%var%) )
            break
        else {
            tmps .= %var%Regex%a_index% := %var% . spacer
            ++count
        }
    ;%var% := save$
    %var% := tmps
    %var%0 := count
    return count
}
hope this helps!
oh... is about 2-3 times slower than directly doing loop:
pos = 0
loop
    if !( pos := regexmatch(test, "(?<=is)\s*a\s*(?<match" . a_index . ">[a-z]++)\s*(?<number" . a_index . ">\d+)",$,pos + 1))
        break
(as is doing 2 regex's!!)
but..
regex is FAST...
and larger the haystack, lower the time differential..
"S" option degrades performance...
------------------------------------------
ok and a second, FASTER (3x) version,
works also with unnamed match sections "(xxx)"
but less friendly output style.
$<match number>_<subpattern number or name>
eg:
$1_1
$1_2
$2_1
$2_2
etc..
RegExMatchG(haystack,needle,var = "$",spacer = "", position = 1) {
    global
    local tmps, count
    ; local save$ := %var%
    loop
        if !(position := regexmatch(haystack, needle
            , %var%%a_index%_
            , position)
            + strlen(%var%%a_index%_) )
            break
        else {
            tmps .= %var%%a_index%_ . spacer
            ++count
        }
    ;%var% := save$
    %var% := tmps
    %var%0 := count
    return count
}
for example:
loop 100
    test .= "this is a match 2345,"
RegExMatchG(test,"(?<=is)\s*(a)\s*(?<match>[a-z]++)\s*(?<number>\d+)")
msgbox % benchmark() "-" $1_1 "-" $1_match "`n" $0 "-" $
#12
Posted 27 August 2007 - 07:37 PM
#13
px
Posted 02 September 2007 - 05:44 AM
RegExMall(haystack,needle,var = "$",spacer = "", position = 1) {
    global
    local tmps, count
    ; local save$ := %var%
    loop
count should also be declared local; otherwise the second time you call the function it keeps incrementing, since the count is never reset.
#14
Posted 28 September 2007 - 08:47 PM
#15
Posted 30 September 2007 - 04:46 PM
count should also be declared local; otherwise the second time you call the function it keeps incrementing, since the count is never reset.
thanks.




