It can still be of interest for educational purposes... ;-)
OK, I see half of the readers (uh? two-thirds? ninety percent?) asking "What are regular expressions?".
Well, I won't provide a full explanation here: I started to hand-write a tutorial, but I still have to type it up and finish it...
But in a few words, regular expressions (or regexp, regex, or RE) are a powerful (but a bit geeky) way to manipulate text.
With them, you can see if a generic string (e.g. "5 letters followed by 2 digits") is inside a text, you can extract this string (e.g. getting the current version number from the AutoHotkey download page), check if a string meets some criteria (has the user typed a date in the right format?), transform a text (morph a list of C's #defines into a list of AHK's variable assignments), split a string with complex requirements (e.g. get all the words of a natural text, separated by spaces or punctuation signs), etc.
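To make these five uses concrete, here is a small sketch. It is written in Python rather than AutoHotkey, only because Python's standard `re` module runs without any external DLL; the sample strings and the date format (YYYY-MM-DD) are invented for the illustration.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "Order ABCDE42 shipped, see release notes for version 1.0.40.11."

# 1. Is a generic string ("5 letters followed by 2 digits") inside the text?
found = re.search(r"[A-Za-z]{5}\d{2}", text) is not None

# 2. Extract a string, here a dotted version number.
version_match = re.search(r"\d+(?:\.\d+)+", text)
version = version_match.group(0) if version_match else None

# 3. Check a criterion: does the input look like a YYYY-MM-DD date?
#    (This checks the format only, not whether the date exists.)
def looks_like_date(s):
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", s) is not None

# 4. Transform: turn C #defines into AHK-style variable assignments.
defines = "#define MAX_LEN 260\n#define FLAG_A 0x01"
assignments = re.sub(r"(?m)^#define\s+(\w+)\s+(\S+)$", r"\1 := \2", defines)

# 5. Split a natural text into words separated by spaces or punctuation.
words = [w for w in re.split(r"[\s.,;:!?]+", "Hello, world! How are you?") if w]
```

Each numbered step uses one short expression, which matches the remark that most everyday tasks need only simple patterns.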
The drawback is the syntax, a bit cryptic for the uninitiated (and sometimes for the initiated...), but with practice, it turns out that most tasks use rather simple expressions.
Currently, AutoHotkey doesn't support regular expressions, so we have to rely on some external DLL. One of the most used is PCRE (Perl Compatible Regular Expressions), which is powerful and can be compiled to a rather small DLL.
Thomas Lauer already provided a wrapper DLL for PCRE 5.0.
It has the advantage of being small and of implementing a replace algorithm, since PCRE itself only does searches.
It has the drawbacks of relying on an old version of this library (but the latest ones are big!), of using only the POSIX interface of the library, of needing a supplementary wrapper (in AHK) around this DLL, of being rather inefficient because it compiles an RE on every use, of being difficult to change (you need a C compiler), etc.
So, I tried to make my own implementation in pure AutoHotkey using only the official DLL.
Thus, if a new version comes out, you can use it. Or, possibly with some changes, you can use an older, smaller version. You can customize the wrapper to your tastes, since it is pure script.
The replace algorithm might be a bit slower, because I had to write it all in AHK, but in compensation you can add extra power by hacking these routines.
I provide no split function, because it is inconvenient to write in AutoHotkey: since functions cannot return arrays, the result would either have to be global or be hard to fetch. But implementing a split with the provided functions should be quite trivial.
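As an illustration of how a split can be built on top of "find first match, then find next match" primitives, here is a minimal sketch in Python. It is not the library's code: `split_by_matches` is a hypothetical helper, and Python's `search` with a start offset stands in for the match and next-match calls.

```python
import re

def split_by_matches(compiled_re, text):
    """Split text on every match of compiled_re, walking match by match."""
    pieces = []
    start = 0   # where the current piece begins
    pos = 0     # where the next search starts
    while pos <= len(text):
        m = compiled_re.search(text, pos)
        if m is None:
            break
        if m.end() == m.start():
            # Empty match: step over one character to avoid looping forever.
            pos = m.end() + 1
            continue
        pieces.append(text[start:m.start()])
        start = pos = m.end()
    pieces.append(text[start:])   # whatever follows the last separator
    return pieces

separators = re.compile(r"[\s.,;:!?]+")
parts = split_by_matches(separators, "one, two;three  four")
```

In a language without returnable arrays, the `pieces.append` calls would be replaced by whatever storage is at hand (numbered variables, a delimited string), which is the inconvenient part mentioned above.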
The version I release today is a bit geeky, in the sense that you don't use the RE strings directly: you have to compile them before using them.
The advantage is performance: you compile a regular expression once, then reuse it as many times as you like, and the library won't need to compile it again.
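The compile-once, reuse-many-times pattern looks like this in a short Python sketch (Python's `re.compile` plays the role of the registration step here; the pattern and sample lines are invented):

```python
import re

# Compile the expression once...
word_then_digits = re.compile(r"([A-Za-z]+)(\d+)")

# ...then reuse the compiled object for as many strings as needed.
lines = ["abc123", "no digits here", "x9 and y10"]
first_matches = []
for line in lines:
    m = word_then_digits.search(line)
    first_matches.append(m.group(0) if m else None)
```

The cost of parsing the pattern is paid once, outside the loop, which is where the gain comes from on large inputs.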
The disadvantage is that it's not very intuitive, and not in the spirit of AutoHotkey.
So I am planning to do another version more in the spirit of my signature... or of AHK.
The trade-off will be lower performance, but it probably won't be noticeable except when parsing a very big file line by line... And it will be perfect, for example, for a quick validation of a formatted edit field.
Plus, this version might serve as a prototype for a future integration of regular expressions into AHK... Note that such an implementation could perform better, perhaps by caching the compiled expressions: if looking up a cached expression (by hashing) is much faster than compiling it, there is an advantage. That's the way Perl manages REs too: the only expressions it doesn't cache are dynamic ones (i.e. those resulting from concatenation, variable expansion, etc.).
It should also implement friendlier options (letters instead of big constant names).
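One possible shape for letter options, sketched in Python: a small table maps each letter to the corresponding constant. The letters below are borrowed from Perl's modifiers and are only an assumption about what such an interface could accept.

```python
import re

# Hypothetical letter-to-option table (letters borrowed from Perl's modifiers).
OPTION_LETTERS = {
    "i": re.IGNORECASE,   # case-insensitive matching
    "m": re.MULTILINE,    # ^ and $ match at line boundaries
    "s": re.DOTALL,       # . also matches a newline
    "x": re.VERBOSE,      # allow whitespace and comments in the pattern
}

def parse_options(letters):
    """Combine option letters such as "im" into a single flags value."""
    flags = 0
    for letter in letters:
        if letter not in OPTION_LETTERS:
            raise ValueError("unknown option letter: " + letter)
        flags |= OPTION_LETTERS[letter]
    return flags

greeting = re.compile("hello", parse_options("i"))
```

Rejecting unknown letters, rather than ignoring them, keeps a typo in the options from silently changing how the expression matches.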
Now it is time to take a look:
PCRE_DLL.ahk
TestPCRE_DLL.ahk
PCRE-6.4.zip, (only) the DLL. You can get other compiled DLLs at GnuWin32 or on Psyon's site (untested yet, may be smaller).
As you can see, the test script is becoming big, but it only touches the surface of the library, with simple expressions, no options, no offsets.
So there can be bugs there. If you find any, please report them here.
An overview of the usage of the library:
stringToSearch = You can do /Regular Expressions/ in AutoHotkey too!

; Compile regular expression and get a reference to the result
hRE := PCRE_RegisterRegExp("R(A|H)(u|o)**")
; There is an error, the handle is null, we can use the provided mini-GUI
; that points out where the error is in the expression (if single line).
if (hRE = 0)
    PCRE_ShowLastError()

; Compile a correct RE
hRE := PCRE_RegisterRegExp("([A-Z])([a-z])")

; Get the position of a match on the given string.
pos := PCRE_GetMatch(hRE, stringToSearch)
; Get both position and length of the match, in a string, separated by a pipe (|)
pos@len := PCRE_GetMatch(hRE, stringToSearch, 0, #PCRE_GETLENGTH)
; Get the matched string
match := PCRE_GetMatch(hRE, stringToSearch, 0, #PCRE_GETSTRING)

; Get the first match of this RE on the given string, as a reference for use in further calls
hMatch := PCRE_Match(hRE, stringToSearch)
If (ErrorLevel = #PCRE_ERROR_NOMATCH)
{
    MsgBox No match!
    ExitApp
}

; Get how many captured strings there were in this match:
; number of matched captures, plus the implicit capture of the whole match.
n := PCRE_GetMatchedCaptureNumber(hMatch)

; Get position and length of the captures
PCRE_GetMatchVals(hMatch, 0, pos0, len0)	; Whole match
PCRE_GetMatchVals(hMatch, 1, pos1, len1)	; First capture
PCRE_GetMatchVals(hMatch, 2, pos2, len2)	; Second capture

; Get strings of captures
s0 := PCRE_GetMatchStr(hMatch, 0)	; Whole match
s1 := PCRE_GetMatchStr(hMatch, 1)	; First capture
s2 := PCRE_GetMatchStr(hMatch, 2)	; Second capture

; Find next match and update the reference
PCRE_MatchNext(hRE, hMatch)
; Similar to:
;   hMatch := PCRE_Match(hRE, stringToSearch, pos0 + len0)
; but the latter is less efficient, creating another reference instead of reusing it.

; Replace the whole match(es) by the given string,
; with $n replaced by the nth capture.
hRS1 := PCRE_RegisterReplaceString("$2-$1!")
; Idem with user-defined symbol.
hRS2 := PCRE_RegisterReplaceString("\_2-\_1!", "\_")
; Idem, two-parts symbol, to avoid ambiguity
hRS3 := PCRE_RegisterReplaceStringEx("${2}1-${1}0!")
; Idem, with user-defined symbols
hRS4 := PCRE_RegisterReplaceStringEx("\_2_/-\_1_/!", "\_", "_/")
; Note that unlike Perl, you cannot mix both notations. See the test file for more explanations.

; "A" is to replace all occurrences (default), can be a maximum number of replacements.
resultString := PCRE_Replace(hRE, hRS1, stringToReplace, "A")
resultString := PCRE_Replace(hRE, hRS3, stringToReplace, 1)

; This call is optional, it will unload the library (automatically loaded on first use)
; and free the data allocated by the DLL.
; If not called, Windows will free all this on script exit.
PCRE_End()
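To show what the two replacement notations do, and why the two-part `${n}` form avoids ambiguity when a digit follows the reference, here is a sketch in Python. It is not the wrapper's implementation: `expand_template` and `replace_all` are hypothetical helpers, the one-part form is assumed to take a single digit, and as in the overview a template uses one notation or the other, not both.

```python
import re

def expand_template(template, match, braced=False):
    """Replace $n (or ${n} when braced is true) by the nth capture of match."""
    ref_pattern = r"\$\{(\d+)\}" if braced else r"\$(\d)"
    return re.sub(
        ref_pattern,
        lambda ref: match.group(int(ref.group(1))) or "",
        template,
    )

def replace_all(pattern, template, text, count=0, braced=False):
    """Replace matches of pattern in text; count=0 means every occurrence."""
    return re.sub(
        pattern,
        lambda m: expand_template(template, m, braced),
        text,
        count=count,
    )

swapped = replace_all(r"([A-Z])([a-z])", "$2-$1!", "Regular Expressions")
first_only = replace_all(r"([A-Z])([a-z])", "${2}1-${1}0!",
                         "Regular Expressions", count=1, braced=True)
```

With the braced form, `${2}1` is unambiguously "capture 2 followed by the digit 1", whereas `$21` would leave the reader guessing between capture 2 and capture 21.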