I was working on a Java library which extends regular expressions and thought to write this function as a small challenge.
Functions: RegEx_Range and RegEx_RangeNZ
Description
- Returns a Regular Expression that matches a number in the numeric range, start to end, passed as arguments.
The produced RegEx uses these RegEx syntaxes: character classes (e.g. [1-3]) and alternation (e.g. 1|2).
Edit: it no longer uses non-capture groups. This allows the returned regular expression to be used in "All" RegEx engines (even basic ones found in text editors and the like).
Download
RegEx.zip
Requirements
None
Functions
RegEx_RangeNZ(start, end)
Convenience function for calling RegEx_Range(start, end, false).
Mnemonic: RegEx_RangeNZ (No leading Zeros).
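To illustrate the difference between the two functions, here is a small sketch that tests a zero-padded input against both patterns. It assumes the library below is included; the variable names padded, withZeros and noZeros are only illustrative.

```autohotkey
;compares the two functions on a zero-padded input
;the range is wrapped in a capture group and anchored so that
;only a match of the whole string counts
padded := "07"
withZeros := RegExMatch(padded, "^(" RegEx_Range(1, 12) ")$")
noZeros := RegExMatch(padded, "^(" RegEx_RangeNZ(1, 12) ")$")
;withZeros should be 1 (leading zero accepted)
;noZeros should be 0 (leading zero rejected)
MsgBox, % "RegEx_Range: " withZeros "`nRegEx_RangeNZ: " noZeros
```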
RegEx_Range(start, end, allowLeadingZeros = true)
Parameters
start and end can be any integer in decimal form (positive, negative, or zero, including "-0"). If either argument is not an integer, the empty string is returned.
Note: start need not be less than or equal to end. If start > end, the two are swapped before the internal function RegEx_Range_private is called.
If start or end is "-0", "-00", etc. then negative zero will match. Otherwise, negative zero won't match (even in the case that start is negative and end is positive).
Return Value
A Regular Expression that matches a number in the numeric range, start to end.
This function's return value can be used as a quasi-boolean value. The statement if RegEx_Range(...) is true if no error occurred, and false otherwise (i.e. either start or end is not a number).
Note: if start = end = 0, then "0" is returned (a "false" value, even though there is no error).
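Because "0" is a valid pattern but a false value, a stricter error check is to compare the return value against the empty string instead of relying on truthiness. A minimal sketch, assuming the library below is included (the variable name pattern is only illustrative):

```autohotkey
;RegEx_Range(0, 0) returns "0", which "if pattern" would treat as false
pattern := RegEx_Range(0, 0)
;only the empty string signals an error
if (pattern = "")
    MsgBox, % "Either start or end was not a number."
else
    MsgBox, % "Pattern: " pattern
```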
Remarks
The returned pattern will match a number whose digit count is between strLen(start) and strLen(end).
Note: strLen(start) need not be less than or equal to strLen(end).
Code:
;e.g.
RegEx_Range(0000, 1) ;will match a number between 0 and 1 with 1 to 4 digits
RegEx_Range(0000, 01) ;will match a number between 0 and 1 with 2 to 4 digits
;Likewise,
RegEx_Range(0, 0001) ;will match a number between 0 and 1 with 1 to 4 digits
RegEx_Range(00, 0001) ;will match a number between 0 and 1 with 2 to 4 digits
Edit: in the case that start is negative and end is positive, the number of digits for a negative number will be from 1 to strLen(start), and the number of digits for a positive number will be from 1 to strLen(end). To alter this behavior, you can call the function twice (once for the negative numbers, and once for the positive), specifying leading zeros, as desired.
Code:
;negative numbers that have 0-padding must be in quotes (or the leading zeros are lost)
; - this is AHK behavior, not a limitation of the function
;positive numbers retain the 0-padding even if not in quotes
;matches -5 to 7, including, for example, "-5" and "7"
RegEx_Range := RegEx_Range("-05", "07")
number = -5
if RegExMatch(number, "^(" RegEx_Range ")", match)
    MsgBox, % "Match: " Match1
else
    MsgBox, % "No match"
Code:
;matches -5 to 7, including, for example, "7" (but not "-5")
RegEx_Range := RegEx_Range("-05", "-01") "|" RegEx_Range("0", "07")
number := "-5"
if RegExMatch(number, "^(" RegEx_Range ")", match)
    MsgBox, % "Match: " Match1
else
    MsgBox, % "No match"
Edit: The returned pattern is no longer wrapped in a non-capture group "(?:RegEx)", because some RegEx engines (such as those found in text editors and the like) do not support non-capture groups. Instead, the pattern is returned in its "raw" form (the RegEx that was inside the non-capture group in the original version). For example, RegEx_Range(1, 12) returns "1[0-2]|0?[1-9]".
The returned regular expression may not work as expected if inserted directly into a larger pattern.
For example, to verify that a number in the range 1-12 is not part of a larger number, you might use this pattern.
Code:
;verifies a digit doesn't lie to the left or right of the match
RegEx := "(?<!\d)" RegEx_Range(1, 12) "(?!\d)"
if pos := RegExMatch("123 312 12", RegEx, match)
{
    ;pos = 1 - not what was expected
    ;the range should be placed in a capture or non-capture group
    ;to isolate it from the rest of the pattern
    MsgBox, % pos
}
else
    MsgBox, % "No match"
However, the above pattern doesn't work as expected because of the alternation used in the returned range: the lookbehind binds only to the first alternative and the lookahead only to the last. Surrounding the range with a capture group (named or unnamed) or a non-capture group gives the expected result.
Code:
RegEx := "(?<!\d)(" RegEx_Range(1, 12) ")(?!\d)"
if pos := RegExMatch("123 312 12", RegEx, match)
{
    ;pos = 9 (the third "12")
    MsgBox, % pos
}
else
    MsgBox, % "No match"
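If you embed ranges in larger patterns often, the grouping can be done once in a small wrapper. The following is only a sketch: RegEx_RangeGroup is a hypothetical helper that is not part of the library, and it assumes the target RegEx engine supports non-capture groups (use a plain capture group otherwise).

```autohotkey
;usage: same check as above, without writing the group by hand
RegEx := "(?<!\d)" RegEx_RangeGroup(1, 12) "(?!\d)"
;should report the same position as the grouped example above
MsgBox, % RegExMatch("123 312 12", RegEx)

;hypothetical helper (not part of the library):
;wraps the returned range in a non-capture group so that its
;alternation stays isolated from the rest of a larger pattern
RegEx_RangeGroup(start, end, allowLeadingZeros = true)
{
    range := RegEx_Range(start, end, allowLeadingZeros)
    ;pass the empty string (error) through unchanged
    if (range = "")
        return
    return "(?:" range ")"
}
```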
The below pattern to check if a number is between 1 and 12 or 17 and 25 works fine (because it uses alternation, which won't cause conflicts).
Code:
;matches a number between 1 and 12 or 17 and 25
RegEx := RegEx_Range(17, 25) "|" RegEx_Range(1, 12)
number = 17
if RegExMatch(number, RegEx, match)
    MsgBox, % match
else
    MsgBox, % "No match"
Note: you must put the range containing the larger numbers first. If you don't, an earlier alternative may match only the leading digits of the number.
Code:
;matches a number between 1 and 12 or 17 and 25
RegEx := RegEx_Range(1, 12) "|" RegEx_Range(17, 25)
number = 17
if RegExMatch(number, RegEx, match)
{
    ;matched the "1" in "17"
    ;(alternations stop on the first match)
    MsgBox, % match
}
else
    MsgBox, % "No match"
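When the whole string is supposed to be the number, another option is to group the alternation and anchor it at both ends, so a partial match such as the "1" in "17" is rejected and the engine falls through to the next alternative. This is a sketch of that idea, not something the library does for you:

```autohotkey
;the capture group isolates the alternation, and ^...$ forces
;the match to cover the whole string, so the order of the two
;ranges should no longer matter
RegEx := "^(" RegEx_Range(1, 12) "|" RegEx_Range(17, 25) ")$"
number := 17
if RegExMatch(number, RegEx, match)
{
    ;expected to show the full number, not just its first digit
    MsgBox, % match
}
else
    MsgBox, % "No match"
```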
Code
Code:
/*
Miscellaneous functions - not directly part of library, but required
They can be moved to a separate file, or kept here.
I have them in my library, as I use them in different projects
*/
;repeats the specified string, <count> times
;if count <= 0, the empty string is returned
repeat(str, count)
{
    Loop, %count%
    {
        result .= str
    }
    return result
}
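;returns the number of iterations needed to loop from start to stop (inclusive)
;with the given step, and initializes LoopVariable to start
;if step is 0 or blank, it is set to 1 or -1, depending on the direction
;the caller must add step to LoopVariable at the end of each iteration
;http://www.autohotkey.com/forum/viewtopic.php?t=42553
for(ByRef LoopVariable, start, stop, ByRef step)
{
    if (!step)
        step := (start <= stop ? 1 : -1)
    LoopVariable := start
    return floor((stop - start) / step) + 1
}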
/*
A library of functions used when dealing with Regular Expressions
*/
;convenience function for calling RegEx_Range(start, end, false)
;RegEx_RangeNZ (No leading Zeros)
RegEx_RangeNZ(start, end)
{
    return RegEx_Range(start, end, false)
}
RegEx_Range(start, end, allowLeadingZeros = true)
{
    if start is not integer
        return
    else if end is not integer
        return
    ;store the current format
    oldIntFormat := A_FormatInteger
    ;ensures that the numbers are decimal numbers
    SetFormat, integer, dec
    if (start > end)
    {
        tmp := start
        start := end
        end := tmp
    }
    negStart := subStr(start, 1, 1) == "-"
    negEnd := subStr(end, 1, 1) == "-"
    if (negStart) {
        if (negEnd) {
            ;both start and end are negative
            ;e.g. -170 to -16 -> 16 to 170 (with a leading negative sign)
            StringTrimLeft, negS, end, 1
            StringTrimLeft, negE, start, 1
        } else {
            ;start is negative and end is non-negative
            negS := start = 0 ? "0" : "1"
            StringTrimLeft, negE, start, 1
            posS := "0"
            posE := end
        }
    } else {
        ;both are non-negative
        if (negEnd) {
            ;special case
            ;start = 0 and end = -0
            negS := "0"
            negE := "0"
            posS := "0"
            posE := "0"
        } else {
            posS := start
            posE := end
        }
    }
    if strLen(negS) != 0 {
        ;hex numbers are not yet supported
        ;(the integer format is restored before each early return)
        if negS is not digit
        {
            SetFormat, integer, %oldIntFormat%
            return
        }
        if negE is not digit
        {
            SetFormat, integer, %oldIntFormat%
            return
        }
        result .= RegEx_range_private("-", negS, negE, allowLeadingZeros)
    }
    if strLen(posS) != 0 {
        ;hex numbers are not yet supported
        if posS is not digit
        {
            SetFormat, integer, %oldIntFormat%
            return
        }
        if posE is not digit
        {
            SetFormat, integer, %oldIntFormat%
            return
        }
        if strLen(result) != 0
            result .= "|"
        result .= RegEx_range_private("", posS, posE, allowLeadingZeros)
    }
    ;restore the previous format
    SetFormat, integer, %oldIntFormat%
    return result
}
RegEx_range_private(lead, start, end, allowLeadingZeros)
{
    if (allowLeadingZeros)
        return RegEx_rangeZ_private(lead, start, end)
    ;remove leading zeros
    start += 0
    end += 0
    if strLen(start) == strLen(end)
        return RegEx_rangeZ_private(lead, start, end)
    Loop, % for(i, strLen(start), strLen(end), i_step := 1)
    {
        if (i == strLen(start))
        {
            tmpStart := start
            tmpEnd := repeat("9", i)
        }
        else if (i == strLen(end))
        {
            tmpStart := "1" . repeat("0", i - 1)
            tmpEnd := end
        }
        else
        {
            tmpStart := "1" . repeat("0", i - 1)
            tmpEnd := repeat("9", i)
        }
        if strLen(result) != 0
            result := "|" result
        result := RegEx_rangeZ_private(lead, tmpStart, tmpEnd) . result
        i += i_step
    }
    return result
}
RegEx_rangeZ_private(lead, start, end)
{
    if strLen(start) == 1 && strLen(end) == 1 {
        digit1 := asc(subStr(start, 1, 1)) - asc("0")
        digit2 := asc(subStr(end, 1, 1)) - asc("0")
        if (digit1 == digit2)
            return lead . start
        else if (digit1 == digit2 - 1)
            return lead "[" start . end "]"
        else
            return lead "[" start "-" end "]"
    } else {
        ;optZero - whether a leading zero is optional
        ;digit1 - the first digit in <start>
        ;digit2 - the first digit in <end>
        if strLen(start) < strLen(end) {
            ;optional zero (e.g. 1-17)
            digit1 := 0
            digit2 := asc(subStr(end, 1, 1)) - asc("0")
            newStart := start
            StringTrimLeft, newEnd, end, 1
            optZero := true
        } else if strLen(end) < strLen(start) {
            ;optional zero (e.g. 01-7)
            digit1 := asc(subStr(start, 1, 1)) - asc("0")
            digit2 := 0
            StringTrimLeft, newStart, start, 1
            newEnd := end
            optZero := true
        } else {
            digit1 := asc(subStr(start, 1, 1)) - asc("0")
            digit2 := asc(subStr(end, 1, 1)) - asc("0")
            StringTrimLeft, newStart, start, 1
            StringTrimLeft, newEnd, end, 1
            optZero := false
        }
        if (digit1 == digit2) {
            newLead := lead . digit1 . (optZero ? "?" : "")
            return RegEx_rangeZ_private(newLead, newStart, newEnd)
        }
        if (newStart != repeat("0", strLen(newStart))) {
            newLead := lead . digit1
            if (optZero && digit1 == 0)
                newLead .= "?"
            result := RegEx_rangeZ_private(newLead, newStart
                , repeat("9", strLen(newEnd)))
            digit1++
        }
        if (newEnd != repeat("9", strLen(newEnd))) {
            needSecondGroup := true
            digit2--
        } else
            needSecondGroup := false
        if (digit1 <= digit2) {
            if strLen(result) != 0
                result := "|" result
            newLead := lead
            if (digit1 == digit2)
                newLead .= digit1
            else if (digit1 == digit2 - 1)
                newLead .= "[" digit1 . digit2 "]"
            else
                newLead .= "[" digit1 "-" digit2 "]"
            if (optZero && digit1 == 0) {
                newLead .= "?"
                useLength := strLen(newStart)
            } else
                useLength := strLen(newEnd)
            result := RegEx_rangeZ_private(newLead, repeat("0", useLength)
                , repeat("9", strLen(newEnd)))
                . result
        }
        if (needSecondGroup) {
            if strLen(result) != 0
                result := "|" result
            newLead := lead . (digit2 + 1)
            if (optZero && digit2 + 1 == 0)
                newLead .= "?"
            result := RegEx_rangeZ_private(newLead
                , repeat("0", strLen(newEnd)), newEnd)
                . result
        }
        return result
    }
}
Example 1 (matching an IPv4 address)
Code:
;stored for simplicity and readability
byte := RegEx_Range(0, 255)
;matches an IPv4 address
;You can capture the number by placing the returned pattern in a capture group
IPv4 := "^(?<byte1>" byte ")\.(?<byte2>" byte ")\.(?<byte3>" byte ")\.(?<byte4>" byte ")$"
IP := "192.0.01.25"
if RegExMatch(IP, IPv4, match)
    MsgBox, % matchByte1 "." matchByte2 "." matchByte3 "." matchByte4
else
    MsgBox, % IP " is not a valid IPv4 address."
Example 2 (matching a time stamp)
Code:
;matches a time with form:
;Hour:Minute:Second AM/PM
;(Seconds and AM/PM marker are optional)
time := "(?<hour>" RegEx_Range(0, 23) "):(?<min>" RegEx_Range(0, 59) ")"
    . "(?::(?<sec>" RegEx_Range(0, 59) "))?(?:\s*+(?<marker>(?i)[AP]M))?"
FormatTime, currentTime,, h:mm:ss tt
msg := "The current time is: " currentTime
MsgBox, % msg
if RegExMatch(msg, time, match)
{
    MsgBox, % "Hour: " matchHour "`n"
        . "Minute: " matchMin "`n"
        . "Second: " matchSec "`n"
        . "AM/PM: " matchMarker
}
else
    MsgBox, % "No match."
Example 3 (General test for accuracy)
Code:
/*
A test script for the RegEx_Range function
*/
;the start and end values for the range
start := -5
end := 199
;if using RegEx_RangeNZ, digitCounts should be zero - since leading zeros won't match
;number of digits in the result (padded with leading zeros, if necessary)
;should be between strLen(start) and strLen(end)
digitCount := 3
;digit count for negative numbers
digitCountN := 1
;the Regular Expression that matches the range
if !RegEx_Range := RegEx_Range(start, end)
{
    MsgBox, % "Either start or end was not a number. Exiting..."
    ExitApp
}
; MsgBox, % RegEx_Range
;zeros (used for padding)
zeros := repeat("0", digitCount)
itWorks := true
;loops through each possible value
Loop, % for(i, start, end, i_step := 0)
{
    if (i < 0 && strLen(abs(i)) < digitCountN)
    {
        ;if number of digits in <i> is less than digitCountN,
        ;pad with leading zeros
        number := "-" subStr(zeros . abs(i), 1 - digitCountN)
    }
    else if (i > 0 && strLen(i) < digitCount)
    {
        ;if number of digits in <i> is less than digitCount,
        ;pad with leading zeros
        number := subStr(zeros . i, 1 - digitCount)
    }
    else
        number := i
    if RegExMatch(number, "(" RegEx_Range ")", match) != 1 || match1 != number
    {
        ;if not a match (something went wrong)
        itWorks := false
        MsgBox, 4, % "Continue?", % "Match was " match
            . ".`nShould have matched " number ".`nContinue?"
        IfMsgBox, No
            break
    }
    i += i_step
}
if (itWorks)
    MsgBox, % "It works!"
else
    MsgBox, % "It didn't work :("
max(value1, value2)
{
    return (value1 >= value2 ? value1 : value2)
}
How to use
Extract the zip's contents to a library folder for automatic inclusion - StdLib compliant.
A copy of the above examples can be found in the "Func Examples" folder.
Download RegEx functions