AutoHotkey Community Let's help each other out
animeaime
Joined: 04 Nov 2008 Posts: 1045
Posted: Sat Jun 20, 2009 7:26 am Post subject: [Func] RegEx_Range-returns a RegEx to match a range of nums
I was working on a Java library that extends regular expressions and decided to write this function as a small challenge.
Functions: RegEx_Range and RegEx_RangeNZ
Description
- Returns a Regular Expression that matches a number in the numeric range, start to end, passed as arguments.
The produced RegEx uses only character classes (e.g. [1-3]), alternation (e.g. 1|2), and, when leading zeros are allowed, the ? quantifier (e.g. 0?[1-9]). Edit: it no longer uses non-capture groups, so the returned regular expression can be used in "all" RegEx engines (even basic ones found in text editors and the like).
Download
RegEx.zip
Requirements
None
Functions
RegEx_RangeNZ(start, end)
Convenience function for calling RegEx_Range(start, end, false).
Mnemonic: RegEx_RangeNZ (No leading Zeros).
RegEx_Range(start, end, allowLeadingZeros = true)
Parameters
Start and end can be any integer in decimal form (positive, negative, or zero, including "-0"). Otherwise, the empty string is returned.
Note: start need not be less than or equal to end. If start > end, start and end will be swapped when calling the internal function RegEx_Range_private.
If start or end is "-0", "-00", etc., then negative zero will match. Otherwise, negative zero won't match (even when start is negative and end is positive).
ReturnValue
A Regular Expression that matches a number in the numeric range, start to end.
This function's return can be used as a quasi-boolean value. The statement if RegEx_Range(...) would be true if no error occurred, and false otherwise (i.e. either start or end is not a number).
Note: if start = end = 0, then "0" will be returned (a "false" value, even though there is no error).
Remarks
When leading zeros are allowed (the default), the returned pattern matches a number whose digit count is between strLen(start) and strLen(end).
Note: strLen(start) need not be less than or equal to strLen(end).
| Code: | ;e.g.
RegEx_Range(0000, 1) ;will match a number between 0 and 1 with 1 to 4 digits
RegEx_Range(0000, 01) ;will match a number between 0 and 1 with 2 to 4 digits
;Likewise,
RegEx_Range(0, 0001) ;will match a number between 0 and 1 with 1 to 4 digits
RegEx_Range(00, 0001) ;will match a number between 0 and 1 with 2 to 4 digits |
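To check the digit-count rule without AutoHotkey, the Python sketch below brute-forces every 1- to 5-digit string against two patterns. The patterns are assumptions: they come from tracing RegEx_rangeZ_private by hand for RegEx_Range(0000, 1) and RegEx_Range(0000, 01), not from running the AHK code. In the trace, each digit position that only the longer argument has becomes an optional "0?", while positions both arguments share stay required.

```python
import re
from itertools import product

# Assumed (hand-traced) outputs, not captured from AHK:
RANGE_0000_1 = "0?0?0?[01]"   # RegEx_Range(0000, 1)
RANGE_0000_01 = "0?0?0[01]"   # RegEx_Range(0000, 01)

def matched_strings(pattern, max_len=5):
    """Every digit string of length 1..max_len that the whole pattern matches."""
    regex = re.compile(pattern)
    hits = set()
    for length in range(1, max_len + 1):
        for digits in product("0123456789", repeat=length):
            s = "".join(digits)
            if regex.fullmatch(s):
                hits.add(s)
    return hits

def expected(min_len, max_len):
    """The value 0 or 1 written with min_len..max_len digits."""
    return {str(v).zfill(n) for v in (0, 1) for n in range(min_len, max_len + 1)}
```

If the trace is right, matched_strings(RANGE_0000_1) equals expected(1, 4) and matched_strings(RANGE_0000_01) equals expected(2, 4), as the comments in the AHK box say.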
Edit: if start is negative and end is positive, a negative number can have from 1 up to as many digits as start (not counting the minus sign), and a positive number from 1 up to as many digits as end. To change this, call the function twice (once for the negative numbers and once for the positive ones), padding each argument with leading zeros as desired, and join the results with "|".
| Code: | ;negative numbers that have 0-padding must be in quotes (or the leading zeros are lost)
; - this is due to AHK itself, not the function
;positive numbers retain the 0-padding even if not in quotes
;matches -5 to 7, including, for example, "-5" and "7"
RegEx_Range := RegEx_Range("-05", "07")
number = -5
if RegExMatch(number, "^(" RegEx_Range ")", match)
	MsgBox, % "Match: " Match1
else
	MsgBox, % "No match" |
| Code: | ;matches -5 to 7, but negative numbers need exactly two digits
;so "-05" and "7" match, while "-5" doesn't
RegEx_Range := RegEx_Range("-05", "-01") "|" RegEx_Range("0", "07")
number := "-5"
if RegExMatch(number, "^(" RegEx_Range ")", match)
	MsgBox, % "Match: " Match1
else
	MsgBox, % "No match" |
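The difference between the two calls above can be checked in Python. Assuming a hand trace of the code is right, RegEx_Range("-05", "07") returns "-0?[1-5]|0?[0-7]", while RegEx_Range("-05", "-01") "|" RegEx_Range("0", "07") gives "-0[1-5]|0?[0-7]", whose negative part requires exactly two digits. The hypothetical helper first_match mimics RegExMatch(number, "^(" RegEx_Range ")", match) using re.match, which only matches at the start of the string.

```python
import re

# Assumed (hand-traced) outputs of the two AHK examples above
MIXED = "-0?[1-5]|0?[0-7]"  # RegEx_Range("-05", "07")
SPLIT = "-0[1-5]|0?[0-7]"   # RegEx_Range("-05", "-01") "|" RegEx_Range("0", "07")

def first_match(pattern, text):
    """Return match1 of RegExMatch(text, "^(" pattern ")", match), or None."""
    m = re.match("(" + pattern + ")", text)  # re.match is anchored at the start
    return m.group(1) if m else None

for number in ("-5", "-05", "7"):
    print(number, first_match(MIXED, number), first_match(SPLIT, number))
```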
Edit: the returned pattern is no longer wrapped in a non-capture group "(?:RegEx)", because some RegEx engines (such as those found in text editors and the like) do not support non-capture groups. Instead, the pattern is returned in its "raw" form (what used to be inside the non-capture group). For example, RegEx_Range(1, 12) returns "1[0-2]|0?[1-9]".
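Here is a small Python check, using the pattern quoted above, that the raw RegEx_Range(1, 12) accepts exactly the values 1 to 12 when the whole string must match (including "01" to "09", since leading zeros are allowed). It also shows an anchoring version of the pitfall described next: without a group, "^" binds only to the first alternative and "$" only to the last, so the pattern still finds "12" inside "123".

```python
import re

RANGE_1_12 = "1[0-2]|0?[1-9]"  # RegEx_Range(1, 12), as quoted above

# Every string of one or two digits: "0".."99" plus "00".."09"
candidates = {str(n) for n in range(100)} | {f"{n:02d}" for n in range(10)}
accepted = {s for s in candidates if re.fullmatch(RANGE_1_12, s)}

# Ungrouped anchors mean "^1[0-2]" OR "0?[1-9]$", so a longer number still matches
ungrouped = re.search("^" + RANGE_1_12 + "$", "123")
grouped = re.search("^(?:" + RANGE_1_12 + ")$", "123")
print(ungrouped.group(), grouped)  # 12 None
```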
The returned regular expression may not work as expected if inserted directly into a larger pattern.
For example, to check that a number from 1 to 12 is not part of a larger number, you might try this pattern:
| Code: | ;intended to check that no digit lies directly left or right of the match
RegEx := "(?<!\d)" RegEx_Range(1, 12) "(?!\d)"
if pos := RegExMatch("123 312 12", RegEx, match)
{
	;pos = 1 - not what was expected
	;the range should be placed in a capture or non-capture group
	;to isolate it from the rest of the pattern
	MsgBox, % pos
}
else
	MsgBox, % "No match" |
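However, the above pattern doesn't work as expected because of the alternation in the returned range: the lookbehind applies only to the first alternative (1[0-2]) and the lookahead only to the last (0?[1-9]), so the "12" at the start of "123" matches. Surrounding the range with a capture group (named or unnamed) or a non-capture group gives the expected result.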
| Code: | RegEx := "(?<!\d)(" RegEx_Range(1, 12) ")(?!\d)"
if pos := RegExMatch("123 312 12", RegEx, match)
{
	;pos = 9 (the third "12")
	MsgBox, % pos
}
else
	MsgBox, % "No match" |
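The same behavior can be reproduced with Python's re module, which also supports lookbehind and lookahead. This sketch reuses the pattern string quoted earlier for RegEx_Range(1, 12) and converts Python's 0-based positions to the 1-based positions that RegExMatch reports.

```python
import re

RANGE_1_12 = "1[0-2]|0?[1-9]"  # RegEx_Range(1, 12)
text = "123 312 12"

loose = re.search(r"(?<!\d)" + RANGE_1_12 + r"(?!\d)", text)
tight = re.search(r"(?<!\d)(" + RANGE_1_12 + r")(?!\d)", text)

# Python positions are 0-based; RegExMatch returns 1-based positions
print(loose.start() + 1, loose.group())   # 1 12
print(tight.start() + 1, tight.group(1))  # 9 12
```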
The pattern below, which checks whether a number is between 1 and 12 or between 17 and 25, works fine: joining two ranges with alternation doesn't cause the problem above.
| Code: | ;matches a number between 1 and 12 or 17 and 25
RegEx := RegEx_Range(17, 25) "|" RegEx_Range(1, 12)
number = 17
if RegExMatch(number, RegEx, match)
	MsgBox, % match
else
	MsgBox, % "No match" |
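Note: put the range with the larger numbers first. Alternatives are tried from left to right and the pattern isn't anchored, so if the range of smaller numbers comes first, it can match just the leading digits of a larger number.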
| Code: | ;intended to match a number between 1 and 12 or 17 and 25
RegEx := RegEx_Range(1, 12) "|" RegEx_Range(17, 25)
number = 17
if RegExMatch(number, RegEx, match)
{
	;matched the "1" in "17"
	;(alternatives are tried left to right, and the first one that matches wins)
	MsgBox, % match
}
else
	MsgBox, % "No match" |
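In Python, with the pattern for RegEx_Range(17, 25) taken from a hand trace of the code (an assumption), the ordering effect looks like this. The last line shows an alternative to reordering: if the whole number must match (re.fullmatch here, or "^(" ... ")$" in AHK), the engine backtracks into the later alternatives and the order no longer matters.

```python
import re

R_17_25 = "2[0-5]|1[7-9]"  # assumed RegEx_Range(17, 25), from a hand trace
R_1_12 = "1[0-2]|0?[1-9]"  # RegEx_Range(1, 12)

larger_first = re.search(R_17_25 + "|" + R_1_12, "17").group()
smaller_first = re.search(R_1_12 + "|" + R_17_25, "17").group()
# Requiring the whole string to match forces backtracking into later alternatives
anchored = re.fullmatch(R_1_12 + "|" + R_17_25, "17").group()
print(larger_first, smaller_first, anchored)  # 17 1 17
```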
Code
| Code: | /*
Miscellaneous functions - not directly part of library, but required
They can be moved to a separate file, or kept here.
I have them in my library, as I use them in different projects
*/
;repeats the specified string, <count> times
;if count <= 0, the empty string is returned
repeat(str, count)
{
	Loop, %count%
	{
		result .= str
	}
	return result
}
;loops from start to stop (inclusive) with the given step
;returns the number of iterations; the loop body must add step to LoopVariable itself
;http://www.autohotkey.com/forum/viewtopic.php?t=42553
for(ByRef LoopVariable, start, stop, ByRef step)
{
	if (!step)
		step := (start <= stop ? 1 : -1)
	LoopVariable := start
	return floor((stop - start) / step) + 1
}
/*
A library of functions used when dealing with Regular Expressions
*/
;convenience function for calling RegEx_Range(start, end, false)
;RegEx_RangeNZ (No leading Zeros)
RegEx_RangeNZ(start, end)
{
	return RegEx_Range(start, end, false)
}
RegEx_Range(start, end, allowLeadingZeros = true)
{
	if start is not integer
		return
	else if end is not integer
		return
	if (start > end)
	{
		;swap so that start <= end
		tmp := start
		start := end
		end := tmp
	}
	negStart := subStr(start, 1, 1) == "-"
	negEnd := subStr(end, 1, 1) == "-"
	if (negStart) {
		if (negEnd) {
			;both start and end are negative
			;e.g. -170 to -16 -> 16 to 170 (with a leading negative sign)
			StringTrimLeft, negS, end, 1
			StringTrimLeft, negE, start, 1
		} else {
			;start is negative and end is non-negative
			negS := start = 0 ? "0" : "1"
			StringTrimLeft, negE, start, 1
			posS := "0"
			posE := end
		}
	} else {
		;start is non-negative
		if (negEnd) {
			;special case
			;start = 0 and end = -0
			negS := "0"
			negE := "0"
			posS := "0"
			posE := "0"
		} else {
			posS := start
			posE := end
		}
	}
	;hex numbers are not yet supported
	;validate before changing the integer format, so an early return
	;can't leave the caller's format altered
	if strLen(negS) != 0 {
		if negS is not digit
			return
		if negE is not digit
			return
	}
	if strLen(posS) != 0 {
		if posS is not digit
			return
		if posE is not digit
			return
	}
	;store the current format
	oldIntFormat := A_FormatInteger
	;ensures that the numbers are decimal numbers
	SetFormat, integer, dec
	if strLen(negS) != 0
		result .= RegEx_range_private("-", negS, negE, allowLeadingZeros)
	if strLen(posS) != 0 {
		if strLen(result) != 0
			result .= "|"
		result .= RegEx_range_private("", posS, posE, allowLeadingZeros)
	}
	;restore the current format
	SetFormat, integer, %oldIntFormat%
	return result
}
RegEx_range_private(lead, start, end, allowLeadingZeros)
{
	if (allowLeadingZeros)
		return RegEx_rangeZ_private(lead, start, end)
	;remove leading zeros
	start += 0
	end += 0
	if strLen(start) == strLen(end)
		return RegEx_rangeZ_private(lead, start, end)
	;split the range by digit count, e.g. 5-1234 -> 5-9, 10-99, 100-999, 1000-1234
	;(the pieces for longer numbers end up first in the result)
	Loop, % for(i, strLen(start), strLen(end), i_step := 1)
	{
		if (i == strLen(start))
		{
			tmpStart := start
			tmpEnd := repeat("9", i)
		}
		else if (i == strLen(end))
		{
			tmpStart := "1" . repeat("0", i - 1)
			tmpEnd := end
		}
		else
		{
			tmpStart := "1" . repeat("0", i - 1)
			tmpEnd := repeat("9", i)
		}
		if strLen(result) != 0
			result := "|" result
		result := RegEx_rangeZ_private(lead, tmpStart, tmpEnd) . result
		i += i_step
	}
	return result
}
RegEx_rangeZ_private(lead, start, end)
{
	if strLen(start) == 1 && strLen(end) == 1 {
		digit1 := asc(subStr(start, 1, 1)) - asc("0")
		digit2 := asc(subStr(end, 1, 1)) - asc("0")
		if (digit1 == digit2)
			return lead . start
		else if (digit1 == digit2 - 1)
			return lead "[" start . end "]"
		else
			return lead "[" start "-" end "]"
	} else {
		;optZero - whether a leading zero is optional
		;digit1 - the first digit in <start>
		;digit2 - the first digit in <end>
		if strLen(start) < strLen(end) {
			;optional zero (e.g. 1-17)
			digit1 := 0
			digit2 := asc(subStr(end, 1, 1)) - asc("0")
			newStart := start
			StringTrimLeft, newEnd, end, 1
			optZero := true
		} else if strLen(end) < strLen(start) {
			;optional zero (e.g. 01-7)
			digit1 := asc(subStr(start, 1, 1)) - asc("0")
			digit2 := 0
			StringTrimLeft, newStart, start, 1
			newEnd := end
			optZero := true
		} else {
			digit1 := asc(subStr(start, 1, 1)) - asc("0")
			digit2 := asc(subStr(end, 1, 1)) - asc("0")
			StringTrimLeft, newStart, start, 1
			StringTrimLeft, newEnd, end, 1
			optZero := false
		}
		if (digit1 == digit2) {
			newLead := lead . digit1 . (optZero ? "?" : "")
			return RegEx_rangeZ_private(newLead, newStart, newEnd)
		}
		if (newStart != repeat("0", strLen(newStart))) {
			newLead := lead . digit1
			if (optZero && digit1 == 0)
				newLead .= "?"
			result := RegEx_rangeZ_private(newLead, newStart
				, repeat("9", strLen(newEnd)))
			digit1++
		}
		if (newEnd != repeat("9", strLen(newEnd))) {
			needSecondGroup := true
			digit2--
		} else
			needSecondGroup := false
		if (digit1 <= digit2) {
			if strLen(result) != 0
				result := "|" result
			newLead := lead
			if (digit1 == digit2)
				newLead .= digit1
			else if (digit1 == digit2 - 1)
				newLead .= "[" digit1 . digit2 "]"
			else
				newLead .= "[" digit1 "-" digit2 "]"
			if (optZero && digit1 == 0) {
				newLead .= "?"
				useLength := strLen(newStart)
			} else
				useLength := strLen(newEnd)
			result := RegEx_rangeZ_private(newLead, repeat("0", useLength)
				, repeat("9", strLen(newEnd)))
				. result
		}
		if (needSecondGroup) {
			if strLen(result) != 0
				result := "|" result
			newLead := lead . (digit2 + 1)
			if (optZero && digit2 + 1 == 0)
				newLead .= "?"
			result := RegEx_rangeZ_private(newLead
				, repeat("0", strLen(newEnd)), newEnd)
				. result
		}
		return result
	}
} |
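For readers who want the idea without the AHK details, here is a simplified Python re-implementation of the RegEx_RangeNZ path: split the range by digit count (as RegEx_range_private does), then, for each equal-length piece, fix the first digit and recurse on the rest (the equal-length case of RegEx_rangeZ_private). It is not a port: it handles only non-negative numbers without leading zeros, and the names digit_class, same_width_branches and range_regex_nz are made up for this sketch. Alternatives are ordered longer numbers first, like the AHK output, but the brute-force test only checks whole-string matching.

```python
import re

def digit_class(a: int, b: int) -> str:
    """One digit from a to b, written the way the AHK code writes it."""
    if a == b:
        return str(a)
    if a == b - 1:
        return f"[{a}{b}]"
    return f"[{a}-{b}]"

def same_width_branches(lo: str, hi: str) -> list:
    """Alternatives matching every digit string of len(lo) from lo to hi.

    lo and hi are equal-length digit strings with lo <= hi.
    """
    if len(lo) == 1:
        return [digit_class(int(lo), int(hi))]
    if lo[0] == hi[0]:
        # shared first digit: keep it and recurse on the remaining digits
        return [lo[0] + b for b in same_width_branches(lo[1:], hi[1:])]
    rest = len(lo) - 1
    first, last = int(lo[0]), int(hi[0])
    low, middle, high = [], [], []
    if lo[1:] != "0" * rest:
        # lo's first digit, then lo[1:] up to 99..9
        low = [lo[0] + b for b in same_width_branches(lo[1:], "9" * rest)]
        first += 1
    if hi[1:] != "9" * rest:
        # hi's first digit, then 00..0 up to hi[1:]
        high = [hi[0] + b for b in same_width_branches("0" * rest, hi[1:])]
        last -= 1
    if first <= last:
        # first digits strictly between the edges, followed by any digits
        middle = [digit_class(first, last) + "[0-9]" * rest]
    return high + middle + low  # larger numbers first

def range_regex_nz(start: int, end: int) -> str:
    """Alternation matching start..end (0 <= start <= end), no leading zeros."""
    if not 0 <= start <= end:
        raise ValueError("expected 0 <= start <= end")
    branches = []
    # one piece per digit count, widest first
    for width in range(len(str(end)), len(str(start)) - 1, -1):
        lo = start if width == len(str(start)) else 10 ** (width - 1)
        hi = end if width == len(str(end)) else 10 ** width - 1
        branches += same_width_branches(str(lo), str(hi))
    return "|".join(branches)
```

For example, range_regex_nz(0, 255) gives "25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]", and range_regex_nz(1, 12) gives "1[0-2]|[1-9]".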
Example 1 (matching an IPv4 address)
| Code: | ;stored for simplicity and readability
byte := RegEx_Range(0, 255)
;matches an IPv4 address
;You can capture the number by placing the returned pattern in a capture group
IPv4 := "^(?<byte1>" byte ")\.(?<byte2>" byte ")\.(?<byte3>" byte ")\.(?<byte4>" byte ")$"
IP := "192.0.01.25"
if RegExMatch(IP, IPv4, match)
	MsgBox, % matchByte1 "." matchByte2 "." matchByte3 "." matchByte4
else
	MsgBox, % IP " is not a valid IPv4 address." |
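A rough Python equivalent of Example 1. The byte pattern is an assumption: it is what a hand trace of RegEx_Range(0, 255) produces, not output captured from AHK. Python's re spells named groups (?P<name>...) rather than (?<name>...), and parse_ipv4 is a hypothetical helper.

```python
import re

# Assumed output of RegEx_Range(0, 255), from a hand trace of the AHK code
BYTE = "25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9]"

IPV4 = re.compile(
    r"^(?P<byte1>{b})\.(?P<byte2>{b})\.(?P<byte3>{b})\.(?P<byte4>{b})$".format(b=BYTE)
)

def parse_ipv4(ip):
    """Return the four byte strings, or None if ip is not a valid IPv4 address."""
    m = IPV4.match(ip)
    return tuple(m.group(f"byte{i}") for i in range(1, 5)) if m else None

print(parse_ipv4("192.0.01.25"))  # ('192', '0', '01', '25')
```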
Example 2 (matching a time stamp)
| Code: | ;matches a time with form:
;Hour:Minute:Second AM/PM
;(Seconds and AM/PM marker are optional)
time := "(?<hour>" RegEx_Range(0, 23) "):(?<min>" RegEx_Range(0, 59) ")"
	. "(?::(?<sec>" RegEx_Range(0, 59) "))?(?:\s*+(?<marker>(?i)[AP]M))?"
FormatTime, currentTime,, h:mm:ss tt
msg := "The current time is: " currentTime
MsgBox, % msg
if RegExMatch(msg, time, match)
{
	MsgBox, % "Hour: " matchHour "`n"
		. "Minute: " matchMin "`n"
		. "Second: " matchSec "`n"
		. "AM/PM: " matchMarker
}
else
	MsgBox, % "No match." |
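A Python version of Example 2, with a fixed string instead of FormatTime. The hour and minute patterns are assumed from hand traces of RegEx_Range(0, 23) and RegEx_Range(0, 59). Two syntax changes are needed for Python's re: named groups are written (?P<name>...), and the case-insensitive flag is scoped to the marker with (?i:...). The possessive \s*+ becomes a plain \s*, which behaves the same here because the marker can't start with whitespace.

```python
import re

HOUR = "2[0-3]|[01]?[0-9]"  # assumed RegEx_Range(0, 23)
MINUTE = "[0-5]?[0-9]"      # assumed RegEx_Range(0, 59)

TIME = re.compile(
    "(?P<hour>" + HOUR + "):(?P<min>" + MINUTE + ")"
    + "(?::(?P<sec>" + MINUTE + "))?"
    + r"(?:\s*(?P<marker>(?i:[AP]M)))?"
)

msg = "The current time is: 3:07:09 PM"
m = TIME.search(msg)
print(m.group("hour"), m.group("min"), m.group("sec"), m.group("marker"))  # 3 07 09 PM
```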
Example 3 (General test for accuracy)
| Code: | /*
A test script for the RegEx_Range function
*/
;the start and end values for the range
start := -5
end := 199
;if using RegEx_RangeNZ, set digitCount and digitCountN to 0, since leading zeros won't match
;number of digits in the result (padded with leading zeros, if necessary)
;should be between strLen(start) and strLen(end)
digitCount := 3
;digit count for negative numbers
digitCountN := 1
;the Regular Expression that matches the range
if !RegEx_Range := RegEx_Range(start, end)
{
	MsgBox, % "Either start or end was not a number. Exiting..."
	ExitApp
}
; MsgBox, % RegEx_Range
;zeros (used for padding)
zeros := repeat("0", digitCount)
itWorks := true
;loops through each possible value
Loop, % for(i, start, end, i_step := 0)
{
	if (i < 0 && strLen(abs(i)) < digitCountN)
	{
		;if number of digits in <i> is less than digitCountN,
		;pad with leading zeros
		number := "-" subStr(zeros . abs(i), 1 - digitCountN)
	}
	else if (i > 0 && strLen(i) < digitCount)
	{
		;if number of digits in <i> is less than digitCount,
		;pad with leading zeros
		number := subStr(zeros . i, 1 - digitCount)
	}
	else
		number := i
	if RegExMatch(number, "(" RegEx_Range ")", match) != 1 || match1 != number
	{
		;if not a match (something went wrong)
		itWorks := false
		MsgBox, 4, % "Continue?", % "Match was " match
			. ".`nShould have matched " number ".`nContinue?"
		IfMsgBox, No
			break
	}
	i += i_step
}
if (itWorks)
	MsgBox, % "It works!"
else
	MsgBox, % "It didn't work :("
max(value1, value2)
{
return (value1 >= value2 ? value1 : value2)
} |
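Example 3 can also be mirrored in Python to test a single output. Assuming a hand trace is right, RegEx_Range(-5, 199) returns "-[1-5]|[01]?[0-9]?[0-9]". The sketch formats each number the way the script does with digitCount = 3 and digitCountN = 1, and repeats the script's check (the match starts at position 1 and match1 equals the number). The helper names are hypothetical.

```python
import re

# Assumed (hand-traced) output of RegEx_Range(-5, 199)
RANGE = "-[1-5]|[01]?[0-9]?[0-9]"

def format_like_script(i, digit_count=3):
    """Pad positive numbers to digit_count digits; with digitCountN = 1 the
    script never pads negative numbers, and 0 stays "0"."""
    return str(i).zfill(digit_count) if i > 0 else str(i)

def script_check(number):
    """RegExMatch(number, "(" RANGE ")", match) = 1 and match1 = number."""
    m = re.search("(" + RANGE + ")", number)
    return m is not None and m.start() == 0 and m.group(1) == number

failures = [n for n in map(format_like_script, range(-5, 200)) if not script_check(n)]
print(failures)  # []
```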
How to use
Extract the zip's contents to a library folder for automatic inclusion - StdLib compliant.
A copy of the above examples can be found in the "Func Examples" folder.
Download RegEx functions
_________________
As always, if you have any further questions, don't hesitate to ask.
Add OOP to your scripts via the Class Library. Check out my scripts.
Last edited by animeaime on Sun Jun 21, 2009 5:37 am; edited 4 times in total
animeaime
Joined: 04 Nov 2008 Posts: 1045
Posted: Sun Jun 21, 2009 1:19 am
I changed the function around a lot, and added support for some additional functionality - read the first post for details and usage. The function is now located in RegEx.ahk, and the function's name is RegEx_Range.
If desired, I can change the function name back to RegExRange and include the added RegExRangeZ in a separate AHK file - looking for feedback on this.
The biggest changes are that negative numbers are now supported, and the returned pattern no longer uses non-capture groups, so it's usable in "all" RegEx engines (even the more basic ones found in text editors and the like). However, the returned RegEx is in a "raw" form: it may need to be placed in a capture group (named or unnamed) or a non-capture group to work correctly inside a larger pattern (see the first post for some examples).
Also, this version fixes some bugs in the previous version.
Enjoy.
Last edited by animeaime on Sun Jun 21, 2009 2:40 am; edited 2 times in total
animeaime
Joined: 04 Nov 2008 Posts: 1045
Posted: Sun Jun 21, 2009 2:31 am
OK, sorry for the continuous spam of posts, but I fixed another "bug". The previous version worked, but its returned pattern was more complicated than necessary (due to a typo when converting the code from Java). This new version fixes the problem.
Download the latest version.