Regular expressions (RegEx): library and wrapper
The zip contains four files. readme is exactly what you think; it also doubles as the C source for the interface functions. You won't be able to do anything with it other than stare at it, though, as the C code must be compiled with the rest of PCRE. Anyone who wants to go into that can contact me via the forum; I can supply the sources, makefiles etc. so that everything works with the Microsoft C compiler.
pcreahk.dll is the actual DLL with the PCRE code and my functions. It should be copied to a location where AHK can find it. I have it in the AHK directory.
match.ahk and replace.ahk are two examples that demonstrate how to call the DLL functions. They should be used as guidance and not as the final word on how to use PCRE.
Comments are welcome. Have fun.
The zip is here: http://www.autohotke...isc/PCREAHK.ZIP
If you are new to regular expressions, a not half-bad documentation for PCRE is here: http://mushclient.com/pcre/
There are also some tutorials online; this one seems to be okay: http://www.regular-expressions.info
toralf
I have played around with the functions quite a bit now.
However, I ran into trouble when p and/or s contain quotation marks.
In the following example AHK gives this error:
"Error: This variable or function name contains an illegal character." (p and s)
Is there any way to use " in p and/or s?
RE_Replace(ByRef string,offset,pattern,options,replace) {
  ; IMPORTANT: reserve enough space for the complete replacement string!
  VarSetCapacity(tempres,1024)
  tempres:=string
  ; call the DLL function
  i:=DllCall("pcreahk.dll\pcre_replace","str",tempres,"int",offset,"str",pattern, "int",options,"str",replace)
  If (i>0) ; at least one replacement was done, copy result back
    string:=tempres
  VarSetCapacity(tempres,0)
  Return i
}

hModule:=DllCall("LoadLibrary","str","pcreahk.dll","UInt")
p:="left(\s+:\s+)"(.*)"(\s+right)"
s:="left : "xyz" right"
r:="$2"
e:=RE_Replace(s,0,p,0,r)
if (e>0)
  MsgBox result of RE_Replace: "%s%"
Else
  MsgBox Error: %e%
DllCall("FreeLibrary","UInt",hModule)
Return
To indicate a literal double-quote character inside an expression, specify two consecutive quotes inside a literal quoted string. For example:
p:="""" ; Assigns a single character (") to p.
So instead of
p:="left(\s+:\s+)"(.*)"(\s+right)"
write
p:="left(\s+:\s+)""(.*)""(\s+right)"
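The same doubling applies to s from the question, since it also contains quotation marks. A minimal sketch, assuming AutoHotkey v1 expression syntax (the `:=` assignments used throughout this thread):

```autohotkey
; Inside a quoted string in an expression, "" stands for one literal ".
p := "left(\s+:\s+)""(.*)""(\s+right)"   ; pattern text: left(\s+:\s+)"(.*)"(\s+right)
s := "left : ""xyz"" right"              ; subject text: left : "xyz" right

; Show both strings to confirm that each doubled quote became a single one.
MsgBox % "p = " . p . "`ns = " . s
```

Only the script source needs the doubled quotes; the strings passed on to the DLL contain single quotation marks.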
Chris has already answered that one; however, here's another thing that may be important for some people. The C functions used to handle the expressions are string functions, meaning that a literal 0 indicates "end of string". The PCRE functions should correctly handle all values other than 0.
There are workarounds (I could use memcpy instead of strcpy etc.) but that would mean that all functions would need further parameters for the lengths of the memory blocks.
The PCREAHK.DLL is also not UTF-8 compatible, mainly because AHK is not. It would be easy to add this, though. It would just add another 5 KB or so to the code.
I know that UTF-8 is a means of packing Unicode characters into an 8-bit domain for the purpose of: 1) manipulating them as C strings; 2) saving memory; 3) other uses (?) for which Unicode (16-bit characters) would be unsuitable.
But I'm a little cloudy on the concept of being "UTF-8 compatible" and what level of effort would be required to achieve it. I did find some info at wikipedia and the following quote from UTF-8 and Unicode FAQ for Unix/Linux:
Python got Unicode support added in version 1.6. Perl offers proper Unicode and UTF-8 support starting with version 5.8. Strings are now tagged in memory as either byte strings or character strings, and the latter are stored internally as UTF-8 but appear to the programmer just as sequences of UCS characters. There is now also comprehensive support for encoding conversion and normalization included. Read “man perluniintro” for details.
Please let me know if you have any other good links or info about UTF-8 compatibility.
See the section "What is UTF-8?"
I personally think UTF-8 is not a pressing matter. But then I do not use many characters outside the Windows charset.
"Perhaps someday UTF-8 will be integrated as an alternative to the plan of having a separate Unicode version of AutoHotkey. However, if that impacts performance or code size too much, separate versions might be better."
IMHO UTF-8 is more of a stopgap. If I were you, I would not bother and go for two different EXEs with full Unicode support. The source would be the same of course: Unicode can be done by simple conditional compilation.
Thank you for making such a useful implementation for AHK; having PCRE support has really created possibilities for me in solving more code problems with AHK.
I do have a question, however, about the replace routine. More specifically a Search and Replace ALL routine. Input VAR1, search & replace all wildcard matches, Output to VAR2. I've been futzing with this for a while now and haven't been able to figure this out... Can someone please show me an example???
Thanks,
JGpo
replace.ahk is not meant as the definitive solution anyway, more as an example to show how to call the DLL functions.
There is no way to use the PCRE library without knowing the internal workings of regexes. One thing to know is that no routine in the PCRE library does a global search and replace. If you want to do this, you must keep track of things yourself by incrementally advancing through the string in question.
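The idea of advancing through the string by hand can be sketched as follows. This is only an illustration and not the DLL's method: it uses AutoHotkey's built-in RegExMatch (available in newer AHK versions) in place of the pcreahk.dll calls, because the built-in returns the match position directly. ReplaceAll is a hypothetical helper name, the replacement is inserted as literal text (no $1-style references), and the pattern is assumed never to match an empty string.

```autohotkey
; Hypothetical helper: replaces every match of pattern in haystack with literal text.
ReplaceAll(haystack, pattern, replacement) {
    result := ""
    pos := 1                                   ; position where the next search starts
    while (found := RegExMatch(haystack, pattern, m, pos)) {
        len := StrLen(m)                       ; m holds the text of the current match
        if (len = 0)                           ; assumption violated: stop rather than loop forever
            break
        ; keep the unmatched text before the match, then add the replacement
        result .= SubStr(haystack, pos, found - pos) . replacement
        pos := found + len                     ; continue right after this match
    }
    return result . SubStr(haystack, pos)      ; append whatever follows the last match
}

; Usage: blank out every "digit, three characters, colon" group.
text := "0000: aa 0010: bb"
MsgBox % ReplaceAll(text, "\d...:", " ")
```

The same bookkeeping (search from an offset, copy the text in between, move the offset past the match) is what a loop around the DLL functions would have to do.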
What I mean by using a VAR instead of a string, is that I'm using FileRead to have the PCRE routine search through a variable instead of making s:="A string of text that I hard coded"
Well, here's a code example of what I'm trying to accomplish, based upon the PCRE wrapper example called "REPLACE.AHK". Note: Credits to the person who wrote the .ahk sample, which I have altered below.
This code reads an --ascii -trace file from CURL and finds the first bit of hex (which I am trying to FIND and REPLACE with a \s space.)
RE_Replace(ByRef string,offset,pattern,options,replace) {
  ;
  VarSetCapacity(tempres,1024)
  tempres:=string
  ; call the DLL function
  i:=DllCall("pcreahk.dll\pcre_replace","str",tempres,"int",offset,"str",pattern,"int",options,"str",replace)
  If (i>0) ; at least one replacement was done, copy result back
    string:=tempres
  VarSetCapacity(tempres,0)
  Return i
}

FileRead, Data_DUMP, %a_scriptdir%\test_DUMP
hModule:=DllCall("LoadLibrary","str","pcreahk.dll","UInt")
p:="\d...\:" ;basically I'm looking to clean all instances of hex info out of a file first.
s:=Data_DUMP ; I want the process to output to the same VAR
msgbox %s%
o:=0
r:=Data_DUMP ;
;r:="${very simple$}"
e:=RE_Replace(s,o,p,0,r)
if (e>0)
  MsgBox result of RE_Replace: %s%
Else
  MsgBox Error: %e%
I get an error in the var results... so not only can I not have it find the first bit of hex, but am far from having a search and replace all... I'm definitely foggy on how to write a loop to process such a large amount of data... so I'm totally confused!!!
Thanks for the previous reply and in advance for your much appreciated help!