RegExMatch() [v1.0.45+]

Determines whether a string contains a pattern (regular expression).

FoundPos := RegExMatch(Haystack, NeedleRegEx [, OutputVar, StartingPos := 1])

Parameters

Haystack

The string whose content is searched.

NeedleRegEx

The pattern to search for, which is a Perl-compatible regular expression (PCRE). The pattern's options (if any) must be included at the beginning of the string followed by a close-parenthesis. For example, the pattern i)abc.*123 would turn on the case-insensitive option and search for "abc", followed by zero or more occurrences of any character, followed by "123". If there are no options, the ")" is optional; for example, )abc is equivalent to abc.
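As a brief illustration of the options prefix described above, the following sketch compares a case-insensitive search with a case-sensitive one, and shows that an empty options prefix changes nothing. The haystack strings and variable names are made up for this example.

```autohotkey
Haystack := "xxABC--123yy"
PosInsensitive := RegExMatch(Haystack, "i)abc.*123")  ; 3: the i option ignores case.
PosSensitive   := RegExMatch(Haystack, "abc.*123")    ; 0: lowercase "abc" is not present.
PosEmptyOpts   := RegExMatch("xxabc", ")abc")         ; 3: empty options prefix.
PosNoOpts      := RegExMatch("xxabc", "abc")          ; 3: same pattern without the ")".
MsgBox % PosInsensitive " " PosSensitive " " PosEmptyOpts " " PosNoOpts
```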

OutputVar

If omitted, no output variable will be used. Otherwise, specify an output variable in which to store specific data, depending on which of the following modes is used.

Mode 1 (default): Specify an output variable in which to store the part of Haystack that matched the entire pattern. If the pattern is not found (that is, if the function returns 0), this variable and all array elements below are made blank.

If any capturing subpatterns are present inside NeedleRegEx, their matches are stored in a pseudo-array whose base name is OutputVar. For example, if the variable's name is Match, the substring that matches the first subpattern would be stored in Match1, the second would be stored in Match2, and so on. The exception to this is named subpatterns: they are stored by name instead of number. For example, the substring that matches the named subpattern (?P<Year>\d{4}) would be stored in MatchYear. If a particular subpattern does not match anything (or if the function returns zero), the corresponding variable is made blank.
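A minimal sketch of the default mode with numbered subpatterns follows. The haystack and the base name Match are chosen only for illustration; the second call shows the output variable and its first element being blanked when nothing matches.

```autohotkey
Haystack := "Width=640 Height=480"
FoundPos := RegExMatch(Haystack, "(\w+)=(\d+)", Match)
; FoundPos = 1, Match = "Width=640", Match1 = "Width", Match2 = "640"
Summary := FoundPos ": " Match " -> " Match1 ", " Match2

; A failed search returns 0 and blanks Match and Match1.
FailPos := RegExMatch(Haystack, "Depth=(\d+)", Match)
MsgBox % Summary "`nAfter failed search: [" Match "][" Match1 "]"
```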

Within a function, to create a pseudo-array that is global instead of local, declare the base name of the pseudo-array (e.g. Match) as a global variable prior to using it. The converse is true for assume-global functions. However, it is often also necessary to declare each element, because a non-dynamic reference such as Match1 inside the function creates that element as a local variable even when the base name has been declared global (a common source of confusion).

Mode 2 (position-and-length): If a capital P is present in the RegEx's options -- such as P)abc.*123 -- the length of the entire-pattern match is stored in OutputVar (or 0 if no match). If any capturing subpatterns are present, their positions and lengths are stored in two pseudo-arrays: OutputVarPos and OutputVarLen. For example, if the variable's base name is Match, the one-based position of the first subpattern's match would be stored in MatchPos1, and its length in MatchLen1 (zero is stored in both if the subpattern was not matched or the function returns 0). The exception to this is named subpatterns: they are stored by name instead of number (e.g. MatchPosYear and MatchLenYear).
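The position-and-length mode can be sketched as follows, assuming the base name Match. Because only positions and lengths are stored, SubStr() is used here to recover the captured text.

```autohotkey
Haystack := "abcXYZ123"
FoundPos := RegExMatch(Haystack, "P)abc(.*)123", Match)
; FoundPos = 1, Match = 9 (length of the entire match), MatchPos1 = 4, MatchLen1 = 3
Captured := SubStr(Haystack, MatchPos1, MatchLen1)  ; "XYZ"
MsgBox % Match " " MatchPos1 " " MatchLen1 " " Captured
```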

Mode 3 (match object) [v1.1.05+]: If a capital O is present in the RegEx's options -- such as O)abc.*123 -- a match object is stored in OutputVar. This object can be used to retrieve the position, length and value of the overall match and of each captured subpattern, if present.

StartingPos

If omitted, it defaults to 1 (the beginning of Haystack). Otherwise, specify 2 to start at the second character, 3 to start at the third, and so on. If StartingPos is beyond the length of Haystack, the search starts at the empty string that lies at the end of Haystack (which typically results in no match).

If StartingPos is less than 1, it is considered to be an offset from the end of Haystack. For example, 0 starts at the last character and -1 starts at the next-to-last character. If StartingPos tries to go beyond the left end of Haystack, all of Haystack is searched.

Regardless of the value of StartingPos, the return value is always relative to the first character of Haystack. For example, the position of "abc" in "123abc789" is always 4.
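The following sketch illustrates how StartingPos affects where the search begins while the return value stays relative to the start of Haystack. The haystack is an arbitrary 12-character string chosen for this example.

```autohotkey
Haystack := "123abc789abc"
FromStart := RegExMatch(Haystack, "abc")          ; 4: the first occurrence.
FromFive  := RegExMatch(Haystack, "abc",, 5)      ; 10: the search begins at the "b" of the first "abc".
FromEnd   := RegExMatch(Haystack, "abc",, -2)     ; 10: -2 is the third character from the end.
TooFar    := RegExMatch(Haystack, "abc",, -100)   ; 4: beyond the left end, so all of Haystack is searched.
MsgBox % FromStart " " FromFive " " FromEnd " " TooFar
```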

Return Value

This function returns the position of the leftmost occurrence of NeedleRegEx in the string Haystack. Position 1 is the first character. Zero is returned if the pattern is not found. If an error occurs (such as a syntax error inside NeedleRegEx), an empty string is returned and ErrorLevel is set to one of the values below instead of 0.

Error Handling

[v1.1.04+]: This function is able to throw an exception on failure (not to be confused with "no match found"). For more information, see Runtime Errors.

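ErrorLevel is set to one of the following:

0, which means that no error occurred.

A string in the following form: Compile error N at offset M: description. In that string, N is the PCRE error number, M is the position of the offending character inside the regular expression, and description is the text describing the error.

A negative number, which means an error occurred during the execution of the regular expression. Although such errors are rare, the ones most likely to occur are "too many possible empty-string matches" (-22), "recursion too deep" (-21), and "reached match limit" (-8). If these happen, try to redesign the pattern to be more restrictive, such as replacing each * with a ?, +, or a limit like {0,3} wherever feasible.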
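The two ways of detecting a failure described in this section can be sketched as follows. The pattern "(abc" is deliberately invalid (unbalanced parenthesis). The exact text carried by the exception object is not specified here, so the example only displays whatever e.Message contains rather than assuming its content.

```autohotkey
; Without try: an error yields an empty return value and a non-zero ErrorLevel.
FoundPos := RegExMatch("abc", "(abc")
SavedErrorLevel := ErrorLevel
IsError := (FoundPos = "")  ; True for an error; a plain "no match" would return 0 instead.

; With try [v1.1.04+]: the same failure is reported as an exception.
try
{
    RegExMatch("abc", "(abc")
    Report := "No exception was thrown."
}
catch e
{
    Report := "RegEx failed: " e.Message
}
MsgBox % "ErrorLevel: " SavedErrorLevel "`n" Report
```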

Options

See RegEx Quick Reference for options such as i)abc, which turns off case-sensitivity.

Match Object [v1.1.05+]

If a capital O is present in the RegEx's options, a match object is stored in OutputVar. This object has the following methods and properties:

Match.Pos(N): Returns the position of the overall match or a captured subpattern.

Match.Len(N): Returns the length of the overall match or a captured subpattern.

Match.Value(N): Returns the overall match or a captured subpattern.

Match.Name(N): Returns the name of the given subpattern, if it has one.

Match.Count(): Returns the overall number of subpatterns.

Match.Mark(): Returns the NAME of the last encountered (*MARK:NAME), when applicable.

Match[N]: If N is 0 or a valid subpattern number or name, this is equivalent to Match.Value(N). Otherwise, N can be the name of one of the above methods. For example, Match["Pos"] and Match.Pos are equivalent to Match.Pos() unless a subpattern named "Pos" exists, in which case they are equivalent to Match.Value("Pos").

Match.N: Same as above, except that N is an unquoted name or number.

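For all of the above methods and properties, N can be any of the following:

0 for the overall match.

The number of a subpattern, even one that also has a name.

The name of a subpattern.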

Brackets [] may be used in place of parentheses () if N is specified.
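As a rough sketch of the match object's methods, the example below retrieves positions, lengths and values by number and by name. The haystack and the subpattern name Id are hypothetical.

```autohotkey
Haystack := "Order 1234 shipped"
if (RegExMatch(Haystack, "O)(?<Id>\d+) (\w+)", Match))
{
    Whole  := Match.Value(0)    ; "1234 shipped" (0 is the overall match).
    IdPos  := Match.Pos("Id")   ; 7 (lookup by name).
    IdLen  := Match.Len(1)      ; 4 (a named subpattern is also reachable by number).
    Status := Match[2]          ; "shipped" (equivalent to Match.Value(2)).
    MsgBox % Whole "`n" IdPos "," IdLen "`n" Status "`n" Match.Count()
}
```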

Performance

To search for a simple substring inside a larger string, use InStr() because it is faster than RegExMatch().
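For a plain substring, both functions report the same position, as the sketch below shows with an arbitrary sample string. One difference worth keeping in mind when switching between them: InStr() is case-insensitive by default, whereas RegExMatch() is case-sensitive unless the i option is used.

```autohotkey
Haystack := "The quick brown fox"
PosInStr := InStr(Haystack, "brown")         ; 11
PosRegEx := RegExMatch(Haystack, "brown")    ; 11
PosUpper := InStr(Haystack, "BROWN")         ; 11: InStr() ignores case by default.
PosUpperRegEx := RegExMatch(Haystack, "BROWN")  ; 0: case-sensitive without the i option.
MsgBox % PosInStr " " PosRegEx " " PosUpper " " PosUpperRegEx
```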

To improve performance, the 100 most recently used regular expressions are kept cached in memory (in compiled form).

The study option (S) can sometimes improve the performance of a regular expression that is used many times (such as in a loop).

Remarks

A subpattern may be given a name such as the word Year in the pattern (?P<Year>\d{4}). Such names may consist of up to 32 alphanumeric characters and underscores.

The following limitation applies to the pseudo-array modes but not to the "O" (match object) mode: although named subpatterns are also available by their numbers during the RegEx operation itself (e.g. \1 is a backreference to the string that actually matched the first capturing subpattern), they are stored in the output pseudo-array only by name (not by number). For example, if "Year" is the first subpattern, OutputVarYear would be set to the matching substring, but OutputVar1 would not be changed at all (it would retain its previous value, if any). However, if an unnamed subpattern occurs after "Year", it would be stored in OutputVar2, not OutputVar1.
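The numbering behavior of named subpatterns in the default mode can be sketched as follows. Match1 is pre-assigned a placeholder value purely to show that it is left untouched.

```autohotkey
Match1 := "old value"
RegExMatch("2009-03", "(?P<Year>\d{4})-(\d\d)", Match)
; MatchYear = "2009" (stored by name), Match2 = "03" (second subpattern),
; and Match1 still contains "old value".
MsgBox % MatchYear " / " Match2 " / " Match1
```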

Most characters like abc123 can be used literally inside a regular expression. However, any of the characters in the set \.*?+[{|()^$ must be preceded by a backslash to be seen as literal. For example, \. is a literal period and \\ is a literal backslash. Escaping can be avoided by using \Q...\E. For example: \QLiteral Text\E.
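The two escaping approaches above can be compared in a short sketch. The sample text is invented for this example; both patterns search for the same literal text.

```autohotkey
Haystack := "Total: $4.50 (approx.)"
PosEscaped := RegExMatch(Haystack, "\$4\.50")               ; 8: each special character is escaped.
PosQuoted  := RegExMatch(Haystack, "\Q$4.50 (approx.)\E")   ; 8: everything between \Q and \E is literal.
PosLoose   := RegExMatch("Total: 4x50", "4.50")             ; 8: an unescaped period matches any character.
MsgBox % PosEscaped " " PosQuoted " " PosLoose
```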

Within a regular expression, special characters such as tab and newline can be escaped with either an accent (`) or a backslash (\). For example, `t is the same as \t except when the x option is used.

To learn the basics of regular expressions (or refresh your memory of pattern syntax), see the RegEx Quick Reference.

AutoHotkey's regular expressions are implemented using Perl-compatible Regular Expressions (PCRE) from www.pcre.org.

[AHK_L 31+]: Within an expression, a ~= b can be used as shorthand for RegExMatch(a, b).
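A small sketch of the shorthand operator, using an arbitrary sample string. Because ~= accepts no OutputVar or StartingPos, it is mainly useful when only the position or a true/false result is needed.

```autohotkey
Haystack := "abc123"
PosShort := Haystack ~= "\d+"             ; 4
PosLong  := RegExMatch(Haystack, "\d+")   ; 4
if (Haystack ~= "i)^ABC")
    MsgBox % "Starts with abc (ignoring case); digits begin at position " PosShort
```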

Related

RegExReplace(), RegEx Quick Reference, Regular Expression Callouts, InStr(), IfInString, StringGetPos, SubStr(), SetTitleMatchMode RegEx, Global matching and Grep (forum link)

Common sources of text data: FileRead, UrlDownloadToFile, Clipboard, GUI Edit controls

Examples

For general RegEx examples, see the RegEx Quick Reference.

Reports 4, which is the position where the match was found.

MsgBox % RegExMatch("xxxabc123xyz", "abc.*xyz")

Reports 7 because the $ requires the match to be at the end.

MsgBox % RegExMatch("abc123123", "123$")

Reports 1 because a match was achieved via the case-insensitive option.

MsgBox % RegExMatch("abc123", "i)^ABC")

Reports 1 and stores "XYZ" in SubPat1.

MsgBox % RegExMatch("abcXYZ123", "abc(.*)123", SubPat)

Reports 7 instead of 1 due to the starting position 2 instead of 1.

MsgBox % RegExMatch("abc123abc456", "abc\d+",, 2)

Demonstrates the usage of the Match object.

FoundPos := RegExMatch("Michiganroad 72", "O)(.*) (?<nr>\d+)", SubPat)  ; The starting "O)" turns SubPat into an object.
MsgBox % SubPat.Count() ": " SubPat.Value(1) " " SubPat.Name(2) "=" SubPat["nr"]  ; Displays "2: Michiganroad nr=72"

Retrieves the extension of a file. Note that SplitPath can also be used for this, which is more reliable.

Path := "C:\Foo\Bar\Baz.txt"
RegExMatch(Path, "\w+$", Extension)
MsgBox % Extension  ; Reports "txt".

Similar to Transform Deref, the following function expands variable references and escape sequences contained inside other variables. Furthermore, this example shows how to find all matches in a string rather than stopping at the first match (similar to the g flag in JavaScript's RegEx).

var1 := "abc"
var2 := 123
MsgBox % Deref("%var1%def%var2%")  ; Reports abcdef123.

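Deref(String)
{
    spo := 1  ; Position at which the next search starts.
    out := ""
    ; m1 and m2 capture a variable reference (with and without its percent signs).
    ; m3 captures the single character that follows an escape character.
    while (fpo:=RegExMatch(String, "(%(.*?)%)|``(.)", m, spo))
    {
        out .= SubStr(String, spo, fpo-spo)  ; Copy the text that precedes this match.
        spo := fpo + StrLen(m)  ; Resume the next search just after this match.
        if (m1)
            out .= %m2%  ; Append the contents of the variable whose name is in m2.
        else switch (m3)  ; Otherwise translate the escape sequence.
        {
            case "a": out .= "`a"
            case "b": out .= "`b"
            case "f": out .= "`f"
            case "n": out .= "`n"
            case "r": out .= "`r"
            case "t": out .= "`t"
            case "v": out .= "`v"
            default: out .= m3  ; Any other escaped character is kept literally.
        }
    }
    return out SubStr(String, spo)  ; Append whatever follows the last match.
}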