Regex help...

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
drawback
Posts: 34
Joined: 11 Aug 2016, 11:31

20 Oct 2016, 12:09

Hi,

I have the following string:

Code: Select all

EED6-5BA8:\Us;e;rs;D:\Test;Storage:\A;B;C\
I know how to loop over it via RegExMatch but I need a bit of help to define the actual regex pattern...

The string consists of three paths:
1.) EED6-5BA8:\Us;e;rs -> Drive serial number + folder with semicolons in it
2.) D:\Test
3.) Storage:\A;B;C\ -> Volume label + folder with semicolons in it

The paths themselves are also separated by semicolons (I can't do anything about that!)

The regex should be able to capture each of these 3 paths...

I've tried:

Code: Select all

(^|;)(.*?:.*?(?=;.*?:|$))
But this doesn't work as it should :/ (the lazy .*? stops at the first semicolon, so it doesn't capture enough). The matches come out as:

Code: Select all

EED6-5BA8:\Us
;e;rs;D:\Test
;Storage:\A;B;C\
Each path could contain Unicode letters (in both the volume label and the folders)!
evilC
Posts: 4823
Joined: 27 Feb 2014, 12:30

Re: Regex help...

20 Oct 2016, 12:24

Could be quite difficult due to the different ways in which semicolon is used (Both as a field separator and a folder separator - how can it tell the difference?)

Does EED6-5BA8:\Us;e;rs vary? i.e. is there always the same number of semicolons?

Do you not have any control whatsoever over the format of the input string?
drawback
Posts: 34
Joined: 11 Aug 2016, 11:31

Re: Regex help...

20 Oct 2016, 12:35

evilC wrote:Could be quite difficult due to the different ways in which semicolon is used (Both as a field separator and a folder separator - how can it tell the difference?)
The difference should be solvable by a positive lookahead (at least that was my plan...)

evilC wrote:Does EED6-5BA8:\Us;e;rs vary?
Of course. It could be anything: a drive letter, serial number or volume label, followed by :\ and a folder with any number of semicolons.

evilC wrote:Do you not have any control whatsoever over the format of the input string?
Unfortunately no, sorry!
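
The positive-lookahead idea can be sketched roughly as follows. This is only a sketch under one loud assumption that the thread does not confirm: the part in front of :\ (drive letter, serial number or volume label) never contains a semicolon, colon or backslash. Under that assumption a semicolon is a separator only when it is directly followed by such a prefix and :\. The names needle, paths and list are made up for this illustration, and Push() needs AutoHotkey v1.1.21 or later.

```autohotkey
str := "EED6-5BA8:\Us;e;rs;D:\Test;Storage:\A;B;C\"

; Assumption: the prefix before ":\" contains no ";", ":" or "\".
; A path starts at the beginning of the string or right after a ";" and runs
; lazily until the next ";" that is directly followed by another "prefix:\",
; or until the end of the string.
needle := "(?:^|;)([^;:\\]+:\\.*?)(?=;[^;:\\]+:\\|$)"

paths := []
pos := 1
while (pos := RegExMatch(str, needle, m, pos))
{
    paths.Push(m1)      ; m1 is the path without the leading separator
    pos += StrLen(m)    ; continue right after the whole match
}

list := ""
for index, path in paths
    list .= index ": " path "`n"
MsgBox % list
```

When hard-coding a test string that contains a space followed by a semicolon, the semicolon has to be escaped with a backtick in a v1 script, otherwise the rest of the line is treated as a comment.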
evilC
Posts: 4823
Joined: 27 Feb 2014, 12:30

Re: Regex help...

20 Oct 2016, 13:28

Code: Select all

str := "EED6-5BA8:\Us;e;rs;D:\Test;Storage:\A;B;C\"

RegexMatch(str, "^(.+:\\.+);(.+:\\.+);(.+:\\.+)$", out)
msgbox % out1 "`n" out2 "`n" out3
EED6-5BA8:\Us;e;rs
D:\Test
Storage:\A;B;C\

Basically I used :\ to anchor each capture pattern
ahcahc
Posts: 110
Joined: 25 Jul 2014, 23:55

Re: Regex help...

20 Oct 2016, 13:44

try

Code: Select all

text = EED6-5BA8:\Us;e;rs;D:\Test;Storage:\A;B;C\
while pos := regexmatch(text,"m)(?:[~!@#$%^&()_+`\-=\[\]{}'\.,\w]+):\\(?:[~!@#$%^&()_+`;\-=\[\]{}'\., \w]+(?:(?=;)|\\|$))*",m,a_index=1?1:pos+strlen(m))
   MsgBox % m
drawback
Posts: 34
Joined: 11 Aug 2016, 11:31

Re: Regex help...

20 Oct 2016, 13:59

@evilC: Thank you!, but I guess I didn't describe it correctly :(

This string can consist of any combination (and any number!) of serial number / volume label / drive letter :\ [<path with ; in it>]
So these would be all "valid" strings that could occur:
EED6-5BA8:\Us;e;rs;D:\Test;Storage:\A;B;C\
Storage:\
C:\a ; (semicolon) inside me\subfolder with 漢 in it;Windows:\Users\
etc.
Sorry if my description was unclear!


@ahcahc
Thank you! This is very close. It splits all entries correctly unless a non-English character appears, e.g. a Chinese character in a folder name.
Like:
EED6-5BA8:\Us;e;rs;D:\@Chinese-漢字-chars;acter UTF-8 BOM\Test;Storage:\A;B;C\
Where the second entry comes out as 'D:\' only, every other char for that entry was truncated...
The .ahk script is using UTF-8 BOM, but I tried it with UTF-16 BOM / No BOM as well.
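
A possible explanation for the truncation, offered as an assumption rather than something confirmed in this thread: by default \w in AutoHotkey's RegEx covers ASCII word characters only, so the character classes in the pattern above stop at 漢. The AutoHotkey RegEx documentation describes a (*UCP) prefix that makes \w use Unicode properties instead. A tiny check, assuming a Unicode build of AutoHotkey v1.1 and a script saved as UTF-8 with BOM (the variable names are illustrative):

```autohotkey
name := "漢字"

; Default behaviour: \w is limited to ASCII letters, digits and underscore.
asciiPos := RegExMatch(name, "^\w+$")

; With (*UCP) at the very start of the pattern, \w uses Unicode properties.
ucpPos := RegExMatch(name, "(*UCP)^\w+$")

MsgBox % "default \w: " asciiPos "`nwith (*UCP): " ucpPos
```

If the two positions differ, prefixing the original pattern with (*UCP) after its options, as in "m)(*UCP)...", may be enough to keep the \w-based character classes.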
ahcahc
Posts: 110
Joined: 25 Jul 2014, 23:55

Re: Regex help...

20 Oct 2016, 14:33

try

Code: Select all

^[^:\\]+?:\\(?:[^:\\]+?(?:(?=;)|\\|$))*|(?<=;)[^:\\]+?:\\(?:[^:\\]+?(?:(?=;)|\\|$))*
Maybe needs more testing.
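
For anyone who wants to try the pattern above, here is a minimal sketch of driving it in a loop, in the same way as the earlier while-loop in this thread. The test string is the one with Chinese characters from the previous post, so the script is assumed to be saved as UTF-8 with BOM and run on a Unicode build. The variable name needle is made up for this illustration.

```autohotkey
text := "EED6-5BA8:\Us;e;rs;D:\@Chinese-漢字-chars;acter UTF-8 BOM\Test;Storage:\A;B;C\"

; The pattern from the post above, unchanged. [^:\\] accepts every character
; except colon and backslash, so it does not depend on \w.
needle := "^[^:\\]+?:\\(?:[^:\\]+?(?:(?=;)|\\|$))*|(?<=;)[^:\\]+?:\\(?:[^:\\]+?(?:(?=;)|\\|$))*"

pos := 1
while (pos := RegExMatch(text, needle, m, pos))
{
    MsgBox % m          ; one path per message box
    pos += StrLen(m)    ; resume the search right after this match
}
```

Because the match itself never includes the separating semicolon, the next search starts on that semicolon and the lookbehind (?<=;) picks up the following entry.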
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02

Re: Regex help...

20 Oct 2016, 14:36

drawback wrote:@evilC: Thank you!, but I guess I didn't describe it correctly :(
This string can consist of any combination (and numbers!) of serial number / volume label / drive letter :\ [<path with ; in it]
This creates an evilC-style regex of the "correct length":

Code: Select all

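str := "EED6-5BA8:\Us;e;rs;D:\Test;Storage:\A;B;C\"
RegExReplace(str, ":", "", n)    ; n = number of colons = number of paths
regex := "^"
Loop, % n
	regex .= "(.+:\\.*);"        ; .* instead of .+ so that a bare root like Storage:\ also matches
regex := RTrim(regex, ";") "$"   ; drop the trailing separator and anchor at the end
RegExMatch(str, regex, out)      ; out1, out2, ... hold the individual paths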
drawback
Posts: 34
Joined: 11 Aug 2016, 11:31

Re: Regex help...

20 Oct 2016, 15:07

@ahcahc
Thanks a lot, works!

@Helgef
Very... evil! :)
