extracting items from a list using RegEx (various methods) (get all matches)

jeeswg
11 Apr 2017, 19:16

I've seen a fair number of posts on this topic. Extracting items from a list using RegEx has proved surprisingly difficult.

Before I add anything to my tutorial, I thought I'd collect some examples, and see if anyone had anything interesting to say on the subject.

RegEx handy examples (RegExMatch, RegExReplace) - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?t=28031

4 methods: 3 that work, and 1 that looks promising but doesn't work:
method 1 (fails) - no clever handling for a single needle that appears more than once in the haystack
method 2 - repeat the needle multiple times
method 3 - while loop
method 4 - RegEx callout

Code: Select all

q:: ;tests on extracting items from a list using RegEx
vText := "red,yellow,green,blue"

;METHOD 1 (fails) - no clever handling for a single needle
;that appears more than once in the haystack
;fails, only gets the first result
RegExMatch(vText, "O)\b(\w+)\b", o)
MsgBox, % "[1] " o.1 " " o.2 " " o.3 " " o.4 ;red

;METHOD 2 - repeat the needle multiple times
;works, but depends on the fact we knew there would be 4 items
RegExMatch(vText, "O)(\w+),(\w+),(\w+),(\w+)", o)
MsgBox, % "[2a] " o.1 " " o.2 " " o.3 " " o.4 ;red yellow green blue

;fails, if the needle expects more items than the list contains (5 groups, 4 items)
RegExMatch(vText, "O)(\w+),(\w+),(\w+),(\w+),(\w+)", o)
MsgBox, % "[2b] " o.1 " " o.2 " " o.3 " " o.4 ;(blank)

;works, if we add in lots of question marks
RegExMatch(vText, "O)(\w+)?,?(\w+)?,?(\w+)?,?(\w+)?,?(\w+)?", o)
MsgBox, % "[2c] " o.1 " " o.2 " " o.3 " " o.4 ;red yellow green blue

;works, if we add in lots of question marks
;(with added StrReplace, so we don't have to look at that needle)
vNeedle := "O)(\w+),(\w+),(\w+),(\w+),(\w+)"
vNeedle := StrReplace(vNeedle, "),(", ")?,?(") "?"
RegExMatch(vText, vNeedle, o)
MsgBox, % "[2d] " o.1 " " o.2 " " o.3 " " o.4 ;red yellow green blue

;METHOD 3 - while loop
vPos := 1, o := []
while (vPos := RegExMatch(vText, "O)(\w+)(,|$)", oTemp, vPos))
	o[A_Index] := oTemp.1, vPos += StrLen(oTemp.1)
MsgBox, % "[3] " o.1 " " o.2 " " o.3 " " o.4 ;red yellow green blue

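;METHOD 4 - RegEx callout
;the callout fires each time the needle up to that point has matched;
;returning 1 makes the match fail there, so the engine carries on and
;tries again further along the haystack, firing the callout once per item
o := []
RegExMatch(vText, "(^|,)(\w+)(,|$)(?CMyCallout)")
MsgBox, % "[4] " o.1 " " o.2 " " o.3 " " o.4 ;red yellow green blue
return

MyCallout(vMatch)
{
	global o
	;no O) option was used, so subpattern 2 arrives as the local variable vMatch2
	o.Push(vMatch2)
	return 1
}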

PS Some of these issues came up for me personally when I was investigating converting:
Cmd, Arg1, Arg2, Arg3, ...
to
Func(Arg1, Arg2, Arg3, ...)
when you can't be sure of the number of items in advance.
Guess what I was working on.
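
For what it's worth, the while loop method copes with that kind of conversion, because it doesn't need to know the number of items in advance. A minimal sketch for AHK v1.1, assuming a plain comma-separated line with no escaped commas, quotes or expressions (the sample line and the variable names are made up for illustration):

```autohotkey
vText := "Cmd, Arg1, Arg2, Arg3"

;collect every comma-separated item, trimming surrounding whitespace
vPos := 1, oItems := []
while (vPos := RegExMatch(vText, "O)\s*([^,]+?)\s*(,|$)", oTemp, vPos))
{
	oItems.Push(oTemp.1)
	vPos += oTemp.Len(0) ;resume just past the whole match
}

;the first item is the command name, the rest are its arguments
vOutput := oItems.RemoveAt(1) "("
for vIndex, vArg in oItems
	vOutput .= (vIndex = 1 ? "" : ", ") vArg
vOutput .= ")"
MsgBox, % vOutput ;Cmd(Arg1, Arg2, Arg3)
```

A real command line would need more care than this (escaped commas, literal text versus expressions), so treat it as an illustration of the loop only.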

(Post 666: better the RegEx you know, than the RegEx you don't.)
Last edited by jeeswg on 08 May 2017, 14:52, edited 1 time in total.
evilC

Re: extracting items from a list using RegEx (various methods)

12 Apr 2017, 07:58

AHK's regex engine does not support a "global" option, which is what would return every match for a capture group in a single call.

This can be done with grep().
Unfortunately, the link may be dead.

Failing that, a function could be written that uses the match position to run the regex multiple times, chopping off the part of the haystack that has already been processed.
evilC

Re: extracting items from a list using RegEx (various methods)

12 Apr 2017, 08:14

Code: Select all

q:: ;tests on extracting items from a list using RegEx
vText := "red,yellow,green,blue"

o := GlobalMatches(vText, "\b(\w+)\b")
MsgBox, % "[1] " o.1 " " o.2 " " o.3 " " o.4 ;red yellow green blue

return

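GlobalMatches(text, regex){
	matches := []
	Loop {
		pos := RegExMatch(text, regex, o)
		if (pos){
			; skip past the whole match (o), not just the first subpattern (o1),
			; so the next search starts after everything already processed
			; (note: assumes the regex never matches an empty string)
			len := StrLen(o)
			text := SubStr(text, pos + len)
			matches.push(o1)
		} else {
			break
		}
	}
	return matches
}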
FanaticGuru

Re: extracting items from a list using RegEx (various methods)

12 Apr 2017, 12:15

jeeswg wrote:method 3 - while loop

Code: Select all

;METHOD 3 - while loop
vPos := 1, o := []
while (vPos := RegExMatch(vText, "O)(\w+)(,|$)", oTemp, vPos))
	o[A_Index] := oTemp.1, vPos += StrLen(oTemp.1)
MsgBox, % "[3] " o.1 " " o.2 " " o.3 " " o.4 ;red yellow green blue
I use the while loop method all the time. You can move the addition to vPos into the RegExMatch call itself, which shortens the code a little and keeps all of the looping work on the one while line.

Code: Select all

Haystack := "stuff here XthereX and every XwhereX or YhereYand don't forget ZthisZ"
X:=1
while (X := RegExMatch(Haystack, "U)(?|X(.*)X|Y(.*)Y|Z(.*)Z)", M, X+StrLen(M)))
    MsgBox % M1
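This example uses a more complex needle, with alternation ("or") to find any text between XX, YY or ZZ, but it still shows the looping technique I prefer. Note that M must be empty before the loop starts, so that the first search begins at position 1.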

FG
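
The looping technique from this thread can also be wrapped in a function, so that the caller gets every match back as an array. A minimal sketch for AHK v1.1: RegExMatchAll and its parameter names are hypothetical (it is not a built-in), and it assumes the needle is passed without options, because it prepends O) itself.

```autohotkey
vText := "red,yellow,green,blue"
vList := ""
for vIndex, vItem in RegExMatchAll(vText, "(\w+)(,|$)")
	vList .= (vIndex = 1 ? "" : " ") vItem
MsgBox, % vList ;red yellow green blue

;hypothetical helper: returns an array holding subpattern vGroup of every match
;assumption: vNeedle has no options of its own, since "O)" is prepended here
RegExMatchAll(vHaystack, vNeedle, vGroup := 1)
{
	oMatches := [], vPos := 1
	while (vPos := RegExMatch(vHaystack, "O)" vNeedle, oMatch, vPos))
	{
		oMatches.Push(oMatch.Value(vGroup))
		;resume just past the whole match; step at least 1 character
		;so that an empty match cannot repeat at the same position
		vPos += oMatch.Len(0) ? oMatch.Len(0) : 1
	}
	return oMatches
}
```

Searching from a start position, rather than chopping the haystack, means anchors and \b still see the text before each match.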