Regular Expressions (RegEx) for AutoHotkey

Started by Chris , Sep 23 2006 12:44 PM

112 replies to this topic

Chris

Administrators
10727 posts

Last active:
Joined: 02 Mar 2004

where [does a function's] code reside

A function has no machine code at all because AutoHotkey is an interpreted language. So a function just points to some lines of text (which have been partially pre-interpreted so that the interpreter can run them faster).

Because of this, there's no way for the OS to directly call a function inside a script.

#61

- Posted 12 October 2006 - 11:55 AM

Chris

Administrators
10727 posts

Last active:
Joined: 02 Mar 2004

I've added a new poll to help choose the name of the RegEx functions. I think the advantage of InStrRE() or InStrReg() is that they emphasize that the function is like InStr (haystack comes before needle, and the return value is the found position [0 if not found]). However, RegMatch() or RegExMatch() might be more familiar to people used to PHP and other languages (in which case, perhaps the needle parameter should come before haystack to match PHP).

Thanks for voting.

#62

- Posted 12 October 2006 - 12:06 PM

PhiLho

Moderators
6850 posts

Last active: Jan 02 2012 10:09 PM
Joined: 27 Dec 2005

I voted other, because I prefer RegExInStr for reasons explained above.
Otherwise, I lean toward RegExMatch and RegExReplace, as I used in my signature and started to implement. So I still have a strong liking for RE or RegEx as a prefix.

In any case, avoid abbreviating to Reg; it would be confused with RegRead, RegWrite and RegDelete...

And keep the InStr parameter order. I guess there are not that many PHP programmers among AutoHotkey users, so trying to mimic some other language (why not JavaScript, Java, or some other?) might be a mistake and a source of confusion.
PHP's list of functions isn't something one wants to take as a model: everybody contributed to it and nobody tried to give it some consistency, so we have varying naming conventions and parameter orders.
And I usually try to put the object being processed first, not unlike the object-oriented convention: obj.DoSomething(...), which is equivalent to DoSomething(obj, ...), obj being represented as the 'self' auto parameter if I recall my C++ correctly ('this' in Java).

#63

- Posted 12 October 2006 - 12:26 PM

vPhiLho := RegExReplace("Philippe Lhoste", "^(\w{3})\w*\s+\b(\w{3})\w*$", "$1$2")

majkinetor

Moderators
4512 posts

Last active: May 20 2019 07:41 AM
Joined: 24 May 2006

yes you recall correctly.

2 Chris

function has no machine code at all because AutoHotkey is an interpreted language. So a function just points to some lines of text (which have been partially pre-interpreted so that the interpreter can run them faster).

Ah, ok, you once said it is compiled in some way, so I didn't know how exactly. I understand now that this "compilation" is actually a more convenient format for the interpreter, for performance reasons.

So, the only possible solution as I see it is to use PhiLho's suggestion. Of course, parameters to the callback are problematic... some ASM expertise is needed. Not too problematic, I think....

Thanks.

BTW, I voted for the 1st. I like functions to be categorised by their starting letters, like FILExxxx, Regxxxx etc... I often have trouble finding functions that don't follow this naming format. As this holds for most AHK commands, I don't see why we would change that now.

#64

- Posted 12 October 2006 - 12:44 PM

Chris

Administrators
10727 posts

Last active:
Joined: 02 Mar 2004

Thanks for the comments.

I voted other, because I prefer RegExInStr for reasons explained above.
Otherwise, I lean toward RegExMatch and RegExReplace

Thanks, I moved your vote into a new category.

More comments and votes are welcome from anyone.

#65

- Posted 12 October 2006 - 10:28 PM

JSLover

Members
920 posts

Last active: Nov 02 2012 09:54 PM
Joined: 20 Dec 2004

Vote: Reg(Ex)Match and Reg(Ex)Replace...but by that I mean...RegExMatch/RegExReplace not RegMatch/RegReplace...perhaps just...Match/Replace?...then no one gets confused by the prefixes Reg & RegEx...& maybe you could make the new Match/Replace support a non-regex mode...make them the #v2 versions of InStr/StringGetPos/StringReplace...phase out the old names in preference of a new generic Match/Replace for both RegEx's & normal strings...introduce them now (with the old) & not accept the old when #v2 comes...BUT so people aren't confused by 300 StringReplace functions...clearly mark the docs as THIS IS OLD, USE THIS INSTEAD...people are confused when you can do something 2 or 3 ways, they need to be told..."this is the old way, don't use it"...perhaps remove old functions from the index & add a new entry "old functions" that lists all the old supported functions that no one should use anymore...

haystack...needle...

...I've always been confused by the haystack/needle terminology...I'm not sure why tho, because that is the "funny" (or advanced) way to put it, but this might be an example of my confusion...

StringGetPos, OutputVar, InputVar, SearchText [, L#|R#, Offset]
Position := InStr(Haystack, Needle [, CaseSensitive?, StartingPos])

...when you introduced functions, InStr was supposed to be the function equiv of StringGetPos, but the params don't match up, they have different names...I'd say pick one terminology & use it everywhere, even in the docs for the old StringGetPos...

(haystack comes before needle...

...I've never thought about it, but if needle is the regex, then it should be before...JavaScript match is...well object oriented...(I was thinking I always put the regex as the 1st param of match(), but the searched string is not included in the function, because it's object oriented)...

'<haystack/InputVar>'.match('<needle/SearchText>')

...I guess if PHP/Perl/sed all put the regex 1st, then perhaps it should be 1st, cuz non-advanced users aren't going to need to deal with regex's & advanced users could almost copy/paste PHP into AHK & it would work...wait a minute RegExMatch isn't in PHP??? It's preg_match...who said it could be "RegExMatch like in PHP"?

#66

- Posted 14 October 2006 - 11:19 AM

Chris

Administrators
10727 posts

Last active:
Joined: 02 Mar 2004

perhaps just...Match/Replace?...then no one gets confused by the prefixes Reg & RegEx...& maybe you could make the new Match/Replace support a non-regex mode...

I see that you're trying for a "reduced instruction set", which I agree can improve a language. However, overloading a function too much can cause more problems than it's worth. I think that's true in this case because it would probably be cumbersome to somehow flag a string that isn't a RegEx so that it falls back to non-RegEx mode. In addition, InStr is a very nice name and it's easy to use; so I think it's best to keep it.

people are confused when you can do something 2 or 3 ways, they need to be told..."this is the old way, don't use it"...perhaps remove old functions from the index & add a new entry "old functions" that lists all the old supported functions that no one should use anymore...

I think for v2, some commands will be phased out such as the commands StringGetPos and StringLen. If so, there will probably be a translator program that converts old scripts to the new format (automatically replacing obsolete commands with function equivalents). However, most commands would not be removed; it will just be the ones that are obviously more friendly as functions.

...when you introduced functions, InStr was supposed to be the function equiv of StringGetPos, but the params don't match up, they have different names...I'd say pick one terminology & use it everywhere, even in the docs for the old StringGetPos...

Yeah I should probably make time to go through all of that. But maybe it's best to work on v2 first so that it doesn't get delayed.

...I've never thought about it, but if needle is the regex, then it should be before [haystack]

That's a good point, but I think I slightly prefer it with Haystack first. Microsoft .NET also does it that way in the non-object-oriented variants of its RegEx functions.

wait a minute RegExMatch isn't in PHP??? It's preg_match...who said it could be "RegExMatch like in PHP"?

Well, except for the underscore, the names are quite similar.

#67

- Posted 14 October 2006 - 06:47 PM

Chris

Administrators
10727 posts

Last active:
Joined: 02 Mar 2004

If anyone has spare time to try it out and give feedback, here's a beta-test version of the RegEx functions: http://www.autohotke...eyRegExTest.exe
You can rename it to overwrite your existing AutoHotkey.exe if you want, since it's only the RegEx functions that are beta.

Here's the syntax for the two functions:

FoundPos := RegExMatch(Haystack, NeedleRegEx, OutputVarOrArray = "", StartingPos = 1)

NewStr := RegExReplace(Haystack, NeedleRegEx, Replacement = "", OutputVarCount = "", Limit = -1, StartingPos = 1)

Upon failure (other than "no match"):
- Both functions return "".
- ErrorLevel is set to something non-zero (a message or code, some of which might change).
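
A minimal sketch of checking for that failure case, assuming the beta behaviour described above (blank return value plus a non-zero ErrorLevel). The unclosed parenthesis is just one way to provoke a pattern error; the exact ErrorLevel text is not specified here and may change:

```autohotkey
; Hypothetical example: "(abc" has an unclosed group, so the pattern cannot be compiled.
FoundPos := RegExMatch("abc", "(abc", Match)
if ErrorLevel  ; Non-zero (a message or code) on failure other than "no match".
	MsgBox RegExMatch failed.`nErrorLevel = "%ErrorLevel%"`nFoundPos = "%FoundPos%"
else if (FoundPos = 0)
	MsgBox No match (this is not an error).
```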

OutputVar/Array should be unquoted to work (or pass "" to explicitly indicate no output). RegExMatch() creates an array only when there are subpatterns (i.e. parts in parentheses) in the RegEx. Regardless of whether an array is created, the function also stores the substring that matched the entire pattern in ArrayName itself.
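
As a rough illustration of the output variable and array described above (variable names are only examples, and the behaviour is that of the beta as described in this post):

```autohotkey
Haystack := "abc123xyz"
; Match is passed unquoted; it receives the substring that matched the whole pattern.
FoundPos := RegExMatch(Haystack, "([a-z]+)([0-9]+)", Match)
; Because the pattern has two subpatterns, Match1 and Match2 are created as well:
;   Match  = abc123
;   Match1 = abc
;   Match2 = 123
MsgBox FoundPos = %FoundPos%`nMatch = %Match%`nMatch1 = %Match1%`nMatch2 = %Match2%
```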

The RegEx options (if any) are included at the start of the RegEx/Needle parameter, followed by a close-parenthesis. For example, the RegEx "i)abc" would search for abc in caseless mode. Options consist of zero or more of the letters from http://php.net/manua... ... ifiers.php
In addition, the option `n switches from the default end-of-line character (`r`n) to `n.
In addition, RegExMatch() supports the letter P, which causes the output array (if specified) to be split into two arrays: ArrayNamePos and ArrayNameLen (but if there are no subpatterns, no arrays are created). Regardless of whether arrays are created, the function also stores the length of the substring that matched the entire pattern in ArrayName itself.
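
A short sketch of the options prefix and of position mode, using the same haystack as the test script below. The variable names are illustrative; the Pos/Len suffixes follow the description above:

```autohotkey
Haystack := "123aBc789"

; "i)" = caseless: finds aBc at position 4.
FoundA := RegExMatch(Haystack, "i)abc", Match)

; "iP)" = caseless plus position mode: M holds the LENGTH of the overall match (3),
; while MPos1 and MLen1 hold the position (5) and length (1) of the first subpattern.
FoundB := RegExMatch(Haystack, "iP)a(b)c", M)

MsgBox FoundA = %FoundA% (%Match%)`nFoundB = %FoundB%`nM = %M%`nMPos1 = %MPos1%`nMLen1 = %MLen1%
```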

RegExReplace()'s Replacement parameter supports backreferences in the form $1, ${11}, or ${named}. To use a literal $, specify $$.
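
The three backreference forms and the literal $ could be exercised like this (a sketch only; the first two patterns are adapted from the test script below, the third is a made-up example):

```autohotkey
; Numbered backreferences, with and without braces: swaps the outer groups.
Swapped := RegExReplace("abc123xyz", "([a-z]+)([0-9]+)([a-z]+)", "$3${2}$1")  ; xyz123abc

; Named backreferences mixed with numbered ones.
Named := RegExReplace("123abcd89", "(a)(?P<x>b)(c)(?P<y>d)", "${x}$1${y}$3")  ; 123badc89

; $$ produces a literal dollar sign, followed here by subpattern 1.
Price := RegExReplace("cost: 5", "(\d+)", "$$$1")  ; cost: $5

MsgBox %Swapped%`n%Named%`n%Price%
```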

Naming: The function names could still be changed to use a prefix of RE_ or Reg instead of RegEx (might make them easier to type). However, as PhiLho pointed out, those are more ambiguous (such as confusion with the registry commands). There's a new poll for this here.

There's still a lot of testing to do. In addition, the performance will be improved via a cache (currently there's no caching).

Finally, here's a test script that runs through some simple RegEx's. It's by no means comprehensive, and many more tests will be done before the final release (probably in 3 to 5 days).

Comments and bug reports are welcome. Thanks to everyone for your advice.

VarSetCapacity(bigstr, 200000)  ; For performance
Loop 10000
	bigstr = %bigstr%0123`r`nabc`r`n789`r`n
StringReplace, bigstr_repl, bigstr, abc`r`n, XYZ456`r`n, UseErrorLevel
ReplCount := ErrorLevel  ; Used later below.

newstr := RegExReplace(bigstr, "^a.c$", "XYZ456", count)  ; Not found because multiline option is absent.
if count <> 0
	MsgBox Count %count% should have been 0.
if (newstr <> bigstr)
	MsgBox newstr should have been the same as bigstr.
newstr := RegExReplace(bigstr, "m)^a.c$", "XYZ456", count)  ; Match found due to 'm' option.
if (count <> ReplCount)
	MsgBox Count %count% should have been %ReplCount%.
if (newstr <> bigstr_repl)
	MsgBox newstr should have been the same as bigstr_repl.
newstr := RegExReplace(bigstr, "m)^[0-3]*`r`na.c`r`n[7-9]*`r`n", "", count) 
if (newstr <> "")
	msgbox newstr was supposed to be empty.
if count <> 10000
	msgbox count was supposed to be 10000

; TEST REPLACE():
testR(1, "", "", "", "")  ; Pretty obscure, but 1 does seem to be the correct number of replacements.
testR(3, "xxx", "abc", ".", "x")
testR(2, "xx", "abc", ".*", "x")  ; Confirmed correct by http://www.regextester.com. Explanation? Replaces abc by x, then the empty string at the end with x. 
testR(1, "x", "abc", ".*", "x", 1)
testR(5, "bbbbbbbbbb", "aaaaa", "a", "bb")  ; Replace small with larger.
testR(5, "aaaaa", "bbbbbbbbbb", "bb", "a")  ; Converse.
testR(5, "bbbbb", "aaaaa", "a", "b")
testR(3, "bbbaa", "aaaaa", "a", "b", 3)  ; Limit the number of replacements.
testR(0, "aaaaa", "aaaaa", "a", "b", 0)
testR(4, "aaabaca", "abc", "", "a")  ; Confirmed correct by http://www.regextester.com
testR(1, "azc", "abc", "b", "z")
	; TEST PCRE_NEWLINE_LF, PCRE_MULTILINE, and related
testR(0, "123`r`nabc`r`n789", "123`r`nabc`r`n789", "^[0-9]*$", "xxx")   ; Not found due to anchoring.
testR(2, "xxx`r`nabc`r`nxxx", "123`r`nabc`r`n789", "m)^[0-9]*$", "xxx") ; Found because anchoring now sees the newlines.
testR(2, "xxx`nabc`nxxx", "123`nabc`n789", "m`n)^[0-9]*$", "xxx") ; Same but with LF vs. CRLF.
testR(2, "xxx`rabc`rxxx", "123`rabc`r789", "m`r)^[0-9]*$", "xxx") ; Same but with CR vs. CRLF.
	; TEST THINGS THAT AREN'T QUITE BACKREFERENCES
testR(1, "abc$", "abc", "abc", "abc$")
testR(1, "abc$", "abc", "abc", "abc$$")
testR(1, "abc${}${5}${", "abc", "abc", "abc$${}${$}$${5}${")
testR(1, "a$xbc${xx", "abc", "abc", "a$xbc${xx")  ; Unclosed braces are transcribed literally.
testR(1, "abc", "abc", "abc", "abc${-5}")  ; Negative or out of bounds treated as blank.
testR(1, "abc", "abc", "abc", "abc${99}")  ; Same.
	; TEST NUMBERED BACKREFERENCES
testR(1, "abcabc", "abc", "abc", "abc${0}")
testR(1, "xyz123abcx", "abc123xyz", "([a-z]+)([0-9]+)([a-z]+)", "$3${2}$1$9${77}x")
testR(1, "PhiLho", "Philippe Lhoste", "^(\w{3})\w*\s+\b(\w{3})\w*$", "$1$2")
	; TEST NAMED BACKREFERENCES
testR(1, "123|badc|89", "123abcd89", "(a)(?P<x>b)(c)(?P<y>d)", "|${x}$1${y}$3${bogus}|")
	; TEST OPTION 's' (PCRE_DOTALL) (note: CRLF requires two dots to match when dotall is in effect).
testR(1, "abc123x", "abc123`r`ndef", "s)..def", "x")  ; With dot-all
testR(0, "abc123`r`ndef", "abc123`r`ndef", "..def", "x")  ; Without it (not found, so no replacement).


; TEST MATCH()
	; TEST BORDERLINE/SPECIAL CASES
testM(1, "", "", "")  ; Empty string found in itself at pos 1.
testM(1, "", "abc", "")  ; Empty string found at pos 1.
testM(3, "c", "abc", "[a-z]+", 3)
testM(0, "", "abc", "[a-z]+", 5)   ; Test StartingPos greater than length of string.
testM(4, "", "abc", "", 5)  ; Finds empty string though?
testM(1, 0, "abc", "P)")  ; Position mode, which should yield 0 for length of main pattern.
	; TEST OPTIONS PARSING, BORDERLINE CASES
testM(1, "abc", "abc", ")abc")  ; Empty options list before the ")" delimiter.
testM(4, "i)abc", "abci)abc", "i[)]abc")  ; Literal ")" inside a character class, so "i" is not parsed as an option.
testM(4, "i)abc", "abci)abc", "i\)abc")  ; Same, but with the ")" escaped by a backslash.
	; TEST OPTION 'i' (PCRE_CASELESS)
testM(4, "aBc", "123aBc789", "i)abc")
testM(0, "", "123aBc789", "abc")  ; Counterpoint to above.
testM(4, "3", "123aBc789", "iP)abc")  ; Position mode.
	; TEST OPTION 'm' (PCRE_MULTILINE)
testM(6, "abc", "123`r`nabc`r`n789", "m)^abc$")
testM(0, "", "123`nabc`r`n789", "^abc$")  ; Counterpoint to above (i.e. no multiline)
testM(6, "abc", "123`r`nabc`r`n789", "m)^abc$", 6)
testM(0, "", "123`r`nabc`r`n789", "m)^abc$", 7)  ; Not found if StartingPos a little too far right.
testM(6, 3, "123`r`nabc`r`n789", "mP)^abc$")  ; Position mode
	; TEST OPTION 's' (PCRE_DOTALL) (note: CRLF requires two dots to match when dotall is in effect).
testM(0, "", "123`r`nabc`r`n789", "123..abc")  ; First with no dot-all.
testM(1, "123`r`nabc", "123`r`nabc`r`n789", "s)123..abc")  ; Same but with dot-all.
testM(1, "123`nabc", "123`nabc`n789", "123.abc")  ; Now with dot-all & LF (works with or without the s) because `n isn't a valid newline char.
testM(1, "7", "123`nabc`n789", "sP)123.abc")  ; Position mode.
	; TEST OPTION 'A' Anchored.
testM(0, "", "123aBc789", "A)aBc")
testM(1, "123", "123aBc789", "A)123")
	; TEST OPTION `n (PCRE_NEWLINE_LF) and related:
testM(1, "123", "123`r`nabc`r`n789", "m)^123$")
testM(1, "123", "123`nabc`n789", "m`n)^123$")
testM(0, "", "123`nabc`n789", "m)^123$")  ; Not found because wrong NEWLINE chars.
testM(0, "", "123`r`nabc`r`n789", "`nm)^123$") ; Same.
testM(1, "123", "123`r`nabc`r`n789", "m)^123$")  ; `r`n now in haystack too
testM(1, "123`t`r", "123`t`r`nabc`r`n789", "m`n)^123`t`r$")  ; Variation.
	; TEST OPTION 'x' (PCRE_EXTENDED) (NO CURRENT TESTS FOR THIS AND OTHER OPTIONS)
	; TEST GENERAL STUFF:
testM(7, "abc`t`r`n789", "123`t`r`nabc`t`r`n789", "abc`t`r`n.*$")
testM(7, "abc`t`r`n789", "123`t`r`nabc`t`r`n789", "abc\t\r\n.*$") ; Same as above but let PCRE escape needle via backslash.
testM(0, "", "123aBc789", "xyz")
testM(10, "aBc", "123aBc789aBc", "aBc$")
testM(6, "c789", "123aBc789", "(xyz)|([a-z]+)7(.)(x*)9", 1, "", "c", "8", "")
testM(6, "4", "123aBc789", "P)(xyz)|([a-z]+)7(.)(x*)9", 1, "0", "0", "6", "1")  ; Position mode.
MsgBox Done


testR(aExpectedCount, aExpectedResult, aHaystack, aNeedle, aRepl, aLimit = -1)
{
	static test_number
	++test_number

	ErrorLevel = Not Initialized  ; To catch bugs where it wasn't properly set by the command.
	actual_result := RegExReplace(aHaystack, aNeedle, aRepl, actual_count, aLimit)
	if ErrorLevel
	{
		MsgBox Replace() Test #%test_number%`nErrorLevel = "%ErrorLevel%"`nHaystack = "%aHaystack%"`nNeedle = "%aNeedle%"`nReplacement = "%aRepl%"
		return  ; Show just one error per test.
	}
	if (actual_result <> aExpectedResult)
	{
		MsgBox Replace() Test #%test_number%`nActual result (%actual_result%) <> expected (%aExpectedResult%).`nHaystack = "%aHaystack%"`nNeedle = "%aNeedle%"`nReplacement = "%aRepl%"
		return  ; Show just one error per test.
	}
	if (actual_count <> aExpectedCount)
	{
		MsgBox Replace() Test #%test_number%`nActual replacement count (%actual_count%) <> expected (%aExpectedCount%).`nHaystack = "%aHaystack%"`nNeedle = "%aNeedle%"`nReplacement = "%aRepl%"
		return  ; Show just one error per test.
	}
	if (strlen(actual_result) <> strlen(aExpectedResult))  ; THIS CHECKS INTERNALLY-STORED LENGTH FOR CORRUPTION (but make the above test take precedence in case the length discrepancy is due merely to the two strings not being equal).
	{
		MsgBox Replace() Test #%test_number%`nActual length <> expected length.`nHaystack = "%aHaystack%"`nNeedle = "%aNeedle%"`nReplacement = "%aRepl%"
		return  ; Show just one error per test.
	}
}


testM(aExpectedPos, aExpectedFoundStr, aHaystack, aNeedle, aOffset = 1
	, aSub1 = -1, aSub2 = -1, aSub3 = -1, aSub4 = -1)
{
	static test_number
	++test_number

	ErrorLevel = Not Initialized  ; To catch bugs where it wasn't properly set by the command.
	FoundPos := RegExMatch(aHaystack, aNeedle, match, aOffset)
	if ErrorLevel
	{
		MsgBox Test #%test_number%`nErrorLevel = "%ErrorLevel%"`nHaystack = "%aHaystack%"`nNeedle = "%aNeedle%"
		return  ; Show just one error per test.
	}
	if (FoundPos <> aExpectedPos)
	{
		MsgBox Test #%test_number%`nFoundPos actual (%FoundPos%) <> expected (%aExpectedPos%).`nHaystack = "%aHaystack%"`nNeedle = "%aNeedle%"
		return  ; Show just one error per test.
	}
	if not (aExpectedFoundStr == match)
	{
		MsgBox Test #%test_number%`nFoundStr actual (%match%) <> expected (%aExpectedFoundStr%).`nHaystack = "%aHaystack%"`nNeedle = "%aNeedle%"
		return  ; Show just one error per test.
	}
	if RegExMatch(aNeedle, "^[a-zA-Z`r`n]*P[a-zA-Z`r`n]*\)")  ; Options include P: verify the SubN items as though they contain positions.
	{
		v = 1
		Loop 4
		{
			expected := aSub%A_Index%
			if (expected = -1)
				continue
			if mod(A_Index, 2)
				actual := matchPos%v%
			else
			{
				actual := matchLen%v%
				++v
			}
			if (actual <> expected)
				MsgBox Test #%test_number%`nSubstring #%A_Index% actual (%actual%) <> expected (%expected%).`nHaystack = "%aHaystack%"`nNeedle = "%aNeedle%"
		}
	}
	else  ; Verify the SubN items as though they contain substrings that matched the subpatterns.
	{
		Loop 4
		{
			expected := aSub%A_Index%
			if (expected = -1)
				continue
			actual := match%A_Index%
			if (actual <> expected)
				MsgBox Test #%test_number%`nSubstring #%A_Index% actual (%actual%) <> expected (%expected%).`nHaystack = "%aHaystack%"`nNeedle = "%aNeedle%"
		}
	}
}

#68

- Posted 14 October 2006 - 07:04 PM

JSLover

Members
920 posts

Last active: Nov 02 2012 09:54 PM
Joined: 20 Dec 2004

Naming: The function names could still be changed to use a prefix of RE_ or Reg instead of RegEx (might make them easier to type).

...nooo...I like RegExMatch/RegExReplace...they say what they mean & anyone can create wrapper functions for the rest of the possible names, the official name could be long & in the function library, add the shorter names...I will probably wrap with match() & replace()...
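
Such wrappers might look roughly like this. It is only a sketch: the names match() and replace() are the ones mentioned above, the parameter lists are trimmed for brevity, and it assumes the beta's rule that "" can be passed to mean "no output variable":

```autohotkey
; Short-named wrapper that returns only the found position (no output variable).
match(Haystack, Needle, StartingPos = 1)
{
	return RegExMatch(Haystack, Needle, "", StartingPos)
}

; Short-named wrapper for the common "replace everything" case.
replace(Haystack, Needle, Replacement = "")
{
	return RegExReplace(Haystack, Needle, Replacement)
}

MsgBox % match("123aBc789", "i)abc") . "`n" . replace("abc", "b", "z")
```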

#69

- Posted 14 October 2006 - 07:19 PM

SKAN

Administrators
9115 posts

Last active:
Joined: 26 Dec 2005

To anybody nOOb like me in RegEx :!:

At last I was able to test PhiLho's Signature 8)

vPhiLho := RegExReplace("Philippe Lhoste", "^(\w{3})\w*\s+\b(\w{3})\w*$", "$1$2")
MsgBox, % vPhiLho

It works!

#70

- Posted 15 October 2006 - 04:35 PM

polyethene

Members
5519 posts

Last active: May 17 2015 06:39 AM
Joined: 26 Oct 2012

I voted RegEx over Reg as it eliminates confusion with the registry commands (RegRead, RegWrite, etc.). I would have even preferred RegExp like javascript because the 'Ex' suffix is used a lot in the WinAPI functions to stand for 'an extended version' (if I'm not mistaken). Like JSLover said, anyone can make wrapper functions, and I'll be using replace()/match() so I don't mind.

#71

- Posted 15 October 2006 - 05:01 PM

SKAN

Administrators
9115 posts

Last active:
Joined: 26 Dec 2005

I voted for RegExMatch & RegExReplace
A google search for "RegEx" returns 17,600,000 hits :shock:

...

#72

- Posted 15 October 2006 - 05:49 PM

Chris

Administrators
10727 posts

Last active:
Joined: 02 Mar 2004

Thanks for the votes. Although I admire the specificity of "RegExp", I don't think it would be a serious contender in a poll. Even Microsoft opted to use "RegEx" in their Dot NET class "RegEx".

I wonder about merging Options into the RegEx itself, like PHP etc. Although I don't prefer that syntax, we could have another poll if anyone thinks it's worthwhile.

#73

- Posted 15 October 2006 - 06:21 PM

JSLover

Members
920 posts

Last active: Nov 02 2012 09:54 PM
Joined: 20 Dec 2004

I wonder about merging Options into the RegEx itself...

...yes...or both...regex's include the flags in them...JavaScript match supports...

'blah'.match(/blah/i)
'blah'.match('blah', 'i')

...it's natural for regexers to expect flags to work there...it's also natural to be able to use some other chars (not in JavaScript but)...for example sed supports s/wow/wee/ or s@wow@wee@...there is a very specific description of what chars can be the delim...but I don't know what it is or where to link...at the very least supporting // & @@ would help, I use @'s when parsing urls to avoid leaning toothpick syndrome...

#74

- Posted 15 October 2006 - 08:19 PM

Chris

Administrators
10727 posts

Last active:
Joined: 02 Mar 2004

...it's natural for regexers to expect flags to work [inside the RegEx itself, via delimiters]...it's also natural to be able to use some other chars [as delimiters]

But without the need for delimiters, there's no need to ever escape them, and thus no need to support alternate/custom delimiters. Therefore, I think I'm with PhiLho that they should be avoided. However: Could there be some way to put the options inside the RegEx some other way, such as using a syntax that's normally illegal in a RegEx (and never likely to become legal)? For example, a mismatched closing parenthesis near the beginning could delimit the options:

FoundPos := RegExMatch(Haystack, "i)pattern")

This seems like it would be much nicer than using PERL-style delimiters, while also eliminating the options parameter. If this sounds good to you, is there a better syntax or symbol than using an unmatched closing parenthesis?

Also, I've tentatively chosen linefeed (`n) as the default end-of-line character rather than CRLF (`r`n) because `n seems to provide partial functionality even for lines that end in `r`n (while the converse isn't true). For example, a line ending in `r`n is seen in LF-mode as having an extra `r at the end, which can often be managed by the script by specifying it explicitly such as ^.*`r$. In spite of this advantage, it might still be better to default to CRLF than LF if CRLF is more often used. What do you think?
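
To make the "extra `r" point concrete, here is a small sketch. It assumes the `n option from the beta syntax posted earlier in this thread to force LF mode explicitly, so it does not depend on which default is finally chosen; the sample text and variable names are made up:

```autohotkey
Text := "line one`r`nline two"

; In LF mode a CRLF-terminated line is seen as ending in `r, so the `r is matched explicitly.
FoundPos := RegExMatch(Text, "m`n)^.*`r$", Line)

; Line now holds "line one" plus the trailing `r; trim it if it isn't wanted.
StringTrimRight, Clean, Line, 1
MsgBox % "FoundPos = " . FoundPos . "`nLength with CR = " . StrLen(Line) . "`nClean = " . Clean
```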

Thanks.

#75

- Posted 16 October 2006 - 01:47 AM

Poll: What should the names of the RegEx functions be (if you HAD to pick one of these)? (42 member(s) have cast votes)

What should the names of the RegEx functions be (if you HAD to pick one of these)?