[EDIT] This is now official! v1.0.45 released: Regular Expressions (RegEx)
I won't teach here what they are, there are numerous tutorials to be found on the Web, including mine: Regular Expressions: a simple, easy tutorial. (Shameless plug ;-))
Mastering REs isn't easy, so I thought I should collect a number of regexes that I wrote while answering questions here, along with some explanations to help people grasp this language.
So this topic will grow each time I find a real-world problem solved with a RegExp. Perhaps Chris will promote this topic to sticky status...
You can put your solutions here, but I advise against posting requests for REs here, only answers.
I created another topic where you can post such requests: Put here requests of problems with regular expressions. Answers will be given there and, if generic enough, copied here.
Perhaps the topic should move to the Wiki some day; that would make it easier to merge solutions, group them by topic, etc.
Also check out the Regular Expression Library for a comprehensive list of such expressions.
Some advice first:
Regular expressions can look unreadable and complex. In fact, they are... ;-)
Not! Well, most expressions you will use are quite simple, actually.
I advise reading a good tutorial from start to end, even if you don't understand or memorize everything. This way, you will get at least a general understanding of the capabilities of REs, of the range of what they can do.
Then play with REs, using AutoHotkey's RegExMatch and RegExReplace commands, or one of the tools that interactively show the results of matching, like the excellent The Regex Coach (which doesn't use the PCRE engine, so results might differ slightly), PCRE Workbench (not bad, but it handles only single lines) or even REGex TESTER, which is online but uses Ajax to show results almost in real time (select preg to use the PCRE engine). Some other similar tools are mentioned in the Regex Match Tracer topic.
Someday such a tool might be written in AutoHotkey...
Start with simple examples, real problems you need to solve, etc.
Soon, you will be familiar with the syntax, and find you need more advanced capabilities. You can then re-read the tutorial and/or the full documentation, to find that most of it is much easier to understand...
The above is also true for most programming languages! :-)
Also remember that REs are just a tool in the toolbox of the programmer. Indeed a very powerful tool, but one among others.
It is like a golden hammer, so nice that you want to do everything with it, to fasten screws, to cut glass, etc. ;-)
There are things REs cannot do (iterations, computing, ...), others they can do only with great complexity (matching a valid date, including leap years) where it is easier to add some code around a simpler RE, and lots of things they do easily!
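To illustrate the "simpler RE plus some code" idea for dates, here is a minimal sketch. IsValidDate is a hypothetical helper name of mine, and the YYYYMMDD input format is an assumption: the RE only checks the shape and splits the parts, while plain code deals with month lengths and leap years.

```autohotkey
; Hypothetical helper: returns 1 if str is a valid YYYYMMDD date, 0 otherwise.
IsValidDate(str)
{
    ; The RE only verifies "8 digits" and captures year, month and day.
    if (!RegExMatch(str, "^(\d{4})(\d{2})(\d{2})$", part))
        return false
    year  := part1 + 0
    month := part2 + 0
    day   := part3 + 0
    if (month < 1 or month > 12 or day < 1)
        return false
    ; Plain code handles what would be painful in a RE: leap years and month lengths.
    isLeap := (Mod(year, 4) = 0 and Mod(year, 100) != 0) or Mod(year, 400) = 0
    if (month = 2)
    {
        if (isLeap)
            maxDay := 29
        else
            maxDay := 28
    }
    else if (month = 4 or month = 6 or month = 9 or month = 11)
        maxDay := 30
    else
        maxDay := 31
    return day <= maxDay
}

MsgBox % IsValidDate("20040229") " " IsValidDate("20060229") " " IsValidDate("20061021")   ; 1 0 1
```

A pure RE doing the same check would have to enumerate the leap year cases by hand, which is the kind of complexity mentioned above.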
RegExp library/collection
Let's start with some requests made by Goyyah (or solutions inspired by his needs).
File name and path
R: Ensure that a path doesn't end with a backslash.
A: path := RegExReplace(path, "\\$")
It replaces a trailing backslash, if any, with nothing.
R: Ensure that a path ends with a backslash.
A: path := RegExReplace(path, "([^\\])$", "$1\")
If the path ends with anything except a backslash, replace this char with itself and a backslash. In the replacement string, AHK writes backreferences as $1 and a backslash needs no escaping.
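A small test script for the two path expressions, as a sketch: the sample paths are made up, and the replacement uses AutoHotkey's $1 backreference syntax.

```autohotkey
; Sample paths, made up for illustration.
path1 := "C:\Temp\"
path2 := "C:\Temp"

; Remove a trailing backslash, if there is one.
noSlash1 := RegExReplace(path1, "\\$")                  ; C:\Temp
noSlash2 := RegExReplace(path2, "\\$")                  ; C:\Temp (unchanged)

; Add a trailing backslash only when it is missing.
withSlash1 := RegExReplace(path1, "([^\\])$", "$1\")    ; C:\Temp\ (unchanged)
withSlash2 := RegExReplace(path2, "([^\\])$", "$1\")    ; C:\Temp\

MsgBox % noSlash1 "`n" noSlash2 "`n" withSlash1 "`n" withSlash2
```

Note that an empty path is left empty by the second expression, since there is no last char to capture.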
R: Replace characters illegal in a file name by something else (can be empty).
A: fileName := RegExReplace(fileName, "[\\/:*?""<>|]", substitute)
Note that Microsoft replaces these characters with an X. Why not?
Note also that I had to double the double quote to follow AHK's syntax for expression strings.
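For instance, with an underscore as substitute (the sample file name is made up, and the class lists the nine characters Windows refuses in file names: \ / : * ? " < > |):

```autohotkey
; Made-up file name containing a colon, two double quotes and a question mark.
fileName := "report: ""final""?.txt"
substitute := "_"

; The double quote inside the class is doubled ("") because it sits in an AHK expression string.
fileName := RegExReplace(fileName, "[\\/:*?""<>|]", substitute)

MsgBox % fileName    ; report_ _final__.txt
```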
File parsing
Two ways to transform the content of a file with regular expressions:
- If the file isn't too big, say less than 10% of the size of the physical memory (RAM) of your computer, you can FileRead it at once, apply one or several RegExReplace commands, then write the result to disk with FileAppend, to a temporary or definitive file. If temporary, you can write back to the original file with a FileMove command with the overwrite option.
- If the file is really big, you can use Loop Read along with its OutputFile option to read the file line by line, apply the transformations and FileAppend the resulting lines to the destination file.
Using the MULTILINE option m) in the first case can help, and you might need to change the end of line as seen by the engine.
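The two approaches could look like the sketch below. The file names and the "strip trailing blanks" transformation are only assumptions for the example. I read the file with the *t option so that the content uses `n line ends, which is why the `n option appears next to m).

```autohotkey
; Hypothetical file names, for illustration only.
source := "input.txt"
temp   := "input.tmp"

; Small file: read it whole (*t turns `r`n into `n), transform, write to a temporary file.
FileRead, content, *t %source%
if ErrorLevel
{
    MsgBox, Could not read %source%.
    ExitApp
}
; m) makes $ match at the end of each line; `n tells the engine that lines end with a lone linefeed.
content := RegExReplace(content, "m`n)[ \t]+$")
FileDelete, %temp%
FileAppend, %content%, %temp%     ; text mode writes each `n back as `r`n
FileMove, %temp%, %source%, 1     ; 1 = overwrite the original

; Big file: one line at a time. FileAppend without a file name writes to the Loop's OutputFile.
Loop, Read, big.txt, big_out.txt
{
    line := RegExReplace(A_LoopReadLine, "[ \t]+$")
    FileAppend, %line%`n
}
```

In the line by line version, no m) option is needed, since each line is matched on its own.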
R: Change the format of a file. Somebody complained that the list of Windows messages made by Chris wasn't practical: for alignment reasons, it showed the hex code, a tab, then the message name, while he wanted a WM_MESS = 0xBEEF format.
A: newFormat := RegExReplace(line, "i)0x([\da-f]+)\t(\w+)", "$2 = 0x$1")
That's only one possible answer, assuming a line by line change. I could have made it more generic: "^(.+)\t(.+)$", "$2 = $1".
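A quick check of both variants on a sample line (I picked WM_CLOSE, whose code is 0x0010, just as an example):

```autohotkey
; `t is AHK's escape sequence for a tab character.
line := "0x0010`tWM_CLOSE"

specific := RegExReplace(line, "i)0x([\da-f]+)\t(\w+)", "$2 = 0x$1")
generic  := RegExReplace(line, "^(.+)\t(.+)$", "$2 = $1")

MsgBox % specific "`n" generic    ; both lines read: WM_CLOSE = 0x0010
```

The specific version leaves lines that don't look like a hex code followed by a name untouched, while the generic one swaps the two sides of any line containing a tab.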
R: Put the R: and A: prefixes of this message in bold and blue. (Did that with my text editor, but that's the same, more or less)
A: message := RegExReplace(message, "m)^([AR]): ", "[color=darkblue][b]$1:[/b][/color] ")
R: Keep only lines starting with a given string.
A: result := RegExReplace(fileContent, "m)^(?!" . linePrefix . ")[^\r\n]*(\r?\n|\Z)")
m) isn't part of the RE but is AHK's way to activate the multiline option, so ^ and $ work line by line instead of matching only the beginning and end of the string.
\r?\n matches a Unix (\n only) or Windows (\r\n) end-of-line (EOL). So (\r?\n|\Z) means: match any end-of-line symbol OR the end of the string, in case the file doesn't end with a newline.
Chris chose to make \r\n the default end-of-line symbol, so if the file uses Unix EOLs, you have to add the `n option: "m`n)^(..."
(?!foo) is quite an advanced RE concept. It is a "negative look ahead assertion"... Regexes can easily match a char or a string, and can easily match a char that isn't one of those given, but it is difficult to match a string that is different from the one given.
The look ahead and look behind (or lookahead and lookbehind, as written in the PCRE doc) assertions are here for that, among other things.
(?!foo)[^\r\n]* matches any line not starting with 'foo', the class meaning "any non-EOL char".
So we remove all lines not matching the given string. Beware of special chars in linePrefix!
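One possible way to guard against special chars in linePrefix is PCRE's \Q...\E quoting, which makes the engine take the enclosed text literally. A minimal sketch, with made-up sample data, assuming the prefix itself doesn't contain \E:

```autohotkey
; Made-up sample content; lines end with `r`n, the default newline for AHK's RegEx.
fileContent := "foo one`r`nbar two`r`nfoo three`r`nbaz four"
linePrefix := "foo"

; \Q...\E quotes the prefix so that chars like . or ( in it are not treated as RE syntax.
pattern := "m)^(?!\Q" . linePrefix . "\E)[^\r\n]*(\r?\n|\Z)"
result := RegExReplace(fileContent, pattern)

MsgBox % result    ; only the two lines starting with foo remain
```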
Other examples
R: Split a string into parts of fixed width and rearrange them.
Example, the AHK format for dates: YYYYMMDD
A: date := RegExReplace(date, "(\d{4})(\d{2})(\d{2})", "$3/$2/$1")
With named captures:
A: date := RegExReplace(date, "(?P<Year>\d{4})(?P<Month>\d{2})(?P<Day>\d{2})", "${Day}/${Month}/${Year}")
If date = 20061021 before, it will be 21/10/2006 in both cases.
Update: Chris introduced the use of named captures in RegExMatch. So to get 3 variables holding the year, month and day, you just have to write:
str = 2006/11/06
pos := RegExMatch(str, "(?P<Year>\d{4})/(?P<Month>\d{2})/(?P<Day>\d{2})", r)
and you get the values in rYear, rMonth and rDay! Very convenient.
Some things hard to impossible to do in RegExes
If you prove me wrong, or find better ways, items here can move above ;-)
R: Verify that given words are all in a string.
A: Easy if they must be in a given order, harder if they can be in any order.
For example, to test if on, off and toggle are all in a string, one can use an expression such as:
"(on.*off.*toggle|on.*toggle.*off|off.*on.*toggle|off.*toggle.*on|toggle.*on.*off|toggle.*off.*on)"
With more items, the expression becomes very big... A simple loop with InStr() will probably be better.
Update: I ran tests to compare methods of checking whether just two strings are both in a line. The above method is the worst in terms of performance; it is better to split it into two regexes, or even to use InStr().
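The InStr() loop could be sketched as below. ContainsAll is a hypothetical helper name, and passing the words as a comma-separated list is my assumption. For comparison, the last lines show a single RE built with look ahead assertions that also accepts the words in any order; I haven't timed it against the other methods.

```autohotkey
; Hypothetical helper: returns 1 if every word of the comma-separated list is found in haystack.
ContainsAll(haystack, words)
{
    Loop, Parse, words, `,
    {
        ; InStr is case-insensitive by default and looks for substrings, like the RE above.
        if (!InStr(haystack, A_LoopField))
            return false
    }
    return true
}

MsgBox % ContainsAll("toggle on, then off", "on,off,toggle")    ; 1
MsgBox % ContainsAll("toggle on", "on,off,toggle")              ; 0

; Alternative with look aheads: each (?=.*word) checks one word from the start of the string.
; The s) option lets . match newlines too, so the words can be on different lines.
MsgBox % RegExMatch("toggle on, then off", "s)^(?=.*on)(?=.*off)(?=.*toggle)")    ; 1
MsgBox % RegExMatch("toggle on", "s)^(?=.*on)(?=.*off)(?=.*toggle)")              ; 0
```

With the look aheads, adding a word means adding one group instead of listing every permutation.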