[Func] RegEx_Range-returns a RegEx to match a range of nums

 
animeaime
Posted: Sat Jun 20, 2009 7:26 am

I was working on a Java library which extends regular expressions and thought to write this function as a small challenge.

Functions: RegEx_Range and RegEx_RangeNZ

Description
  • Returns a regular expression that matches any number in the numeric range from start to end, both passed as arguments.
The produced RegEx uses only basic syntax: character classes (e.g. [1-3]), alternation (e.g. 1|2), and the optional quantifier (e.g. 0? for an optional leading zero). Edit: it no longer uses non-capture groups, so the returned regular expression can be used in "all" RegEx engines (even the basic ones found in text editors and the like).



Download
RegEx.zip

Requirements
None



Functions
RegEx_RangeNZ(start, end)
Convenience function for calling RegEx_Range(start, end, false).
Mnemonic: RegEx_RangeNZ (No leading Zeros).


RegEx_Range(start, end, allowLeadingZeros = true)

Parameters
Start and end can be any integer (positive, negative, or zero (including "-0"), in decimal form). Otherwise, the empty string is returned.
Note: start need not be less than or equal to end. If start > end, start and end will be swapped when calling the internal function RegEx_Range_private.

If start or end is "-0", "-00", etc. then negative zero will match. Otherwise, negative zero won't match (even in the case that start is negative and end is positive).

ReturnValue
A Regular Expression that matches a number in the numeric range, start to end.

This function's return can be used as a quasi-boolean value. The statement if RegEx_Range(...) would be true if no error occurred, and false otherwise (i.e. either start or end is not a number).
Note: if start = end = 0, then "0" will be returned (a "false" value, even though there is no error).
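
Because "0" is a valid return value, a plain truthiness test misreports that one case. A minimal sketch of a stricter check that compares against the empty string instead; okPattern and errPattern are illustrative stand-ins for return values, not part of the library:

```autohotkey
; Stand-ins for two possible return values (names are hypothetical)
okPattern := "0"   ; what the post says RegEx_Range(0, 0) returns
errPattern := ""   ; what is returned for non-integer input

; Truthiness treats "0" as false, so the valid pattern looks like an error
MsgBox, % "truthiness test on 0: " (okPattern ? "ok" : "error")

; Comparing against the empty string keeps the two cases apart
MsgBox, % "empty-string test on 0: " (okPattern != "" ? "ok" : "error")
MsgBox, % "empty-string test on empty return: " (errPattern != "" ? "ok" : "error")
```

The first box reports "error" even though nothing went wrong; the second and third report "ok" and "error" respectively.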

Remarks
The returned pattern will match a number with digit count between strLen(start) and strLen(end).
Note: strLen(start) need not be less than or equal to strLen(end).

Code:
;e.g.
RegEx_Range(0000, 1) ;will match a number between 0 and 1 with 1 to 4 digits
RegEx_Range(0000, 01) ;will match a number between 0 and 1 with 2 to 4 digits

;Likewise,
RegEx_Range(0, 0001) ;will match a number between 0 and 1 with 1 to 4 digits
RegEx_Range(00, 0001) ;will match a number between 0 and 1 with 2 to 4 digits


Edit: in the case that start is negative and end is positive, the number of digits for a negative number will be from 1 to strLen(start), and the number of digits for a positive number will be from 1 to strLen(end). To alter this behavior, you can call the function twice (once for the negative numbers, and once for the positive), specifying leading zeros, as desired.

Code:
;negative numbers that have 0-padding must be in quotes (or the leading zeros are lost)
;    - this is AHK, not the function
;positive numbers retain the 0-padding even if not in quotes

;matches -5 to 7, including, for example, "-5" and "7"
RegEx_Range := RegEx_Range("-05", "07")

number = -5

if RegExMatch(number, "^(" RegEx_Range ")", match)
    MsgBox, % "Match: " Match1
else
    MsgBox, % "No match"
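
The quoting rule in the comments above can be checked directly. A small sketch (the variable names are illustrative); per the post, the unquoted negative literal loses its padding because AutoHotkey evaluates it as a number, while the quoted string and the unquoted positive literal keep theirs:

```autohotkey
a := -05     ; unquoted negative: evaluated numerically
b := "-05"   ; quoted: kept as the string -05
c := 07      ; unquoted positive: padding is retained, per the post

; show the three values as they would be passed to RegEx_Range
MsgBox, % "a = " a "`nb = " b "`nc = " c
```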


Code:
;matches -5 to 7, including, for example, "7" (but not "-5")
RegEx_Range := RegEx_Range("-05", "-01") "|" RegEx_Range("0", "07")

number := "-5"

if RegExMatch(number, "^(" RegEx_Range ")", match)
    MsgBox, % "Match: " Match1
else
    MsgBox, % "No match"



Edit: The returned pattern is no longer inside non-capture group "(?:RegEx)". This is because some RegEx engines (such as ones found in text editors and the like), do not support non-capture groups. Instead, the pattern is the "raw" form (the RegEx inside the non-capture group in the original version). For example, RegEx_Range(1, 12) returns "1[0-2]|0?[1-9]".


The returned regular expression may not work as expected if inserted directly into a larger pattern.

For example, to verify that the number 1-12 is not part of a larger number, you might use this pattern.

Code:
;verifies a digit doesn't lie to the left or right of the match
RegEx := "(?<!\d)" RegEx_Range(1, 12) "(?!\d)"

if pos := RegExMatch("123 312 12", RegEx, match)
{
    ;pos = 1 - not what was expected
    ;the range should be placed in a capture or non-capture group
    ;to isolate it from the rest of the pattern
    MsgBox, % pos
}
else
    MsgBox, % "No match"


The above pattern doesn't work as expected because of the alternation used in the returned range: the lookbehind applies only to the first alternative and the lookahead only to the last. Surrounding the range with a capture group (named or unnamed) or a non-capture group gives the expected result.

Code:
RegEx := "(?<!\d)(" RegEx_Range(1, 12) ")(?!\d)"

if pos := RegExMatch("123 312 12", RegEx, match)
{
    ;pos = 9 (the third "12")
    MsgBox, % pos
}
else
    MsgBox, % "No match"
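
A non-capture group isolates the range equally well when the engine supports it. A sketch using the literal pattern the post gives for RegEx_Range(1, 12), so that it runs without the library:

```autohotkey
range := "1[0-2]|0?[1-9]"   ; value the post reports for RegEx_Range(1, 12)

; the group keeps both lookarounds attached to every alternative
RegEx := "(?<!\d)(?:" range ")(?!\d)"

if pos := RegExMatch("123 312 12", RegEx, match)
    MsgBox, % "pos = " pos ", match = " match   ; pos = 9, match = 12
else
    MsgBox, % "No match"
```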



The below pattern to check if a number is between 1 and 12 or 17 and 25 works fine (because it uses alternation, which won't cause conflicts).

Code:
;matches a number between 1 and 12 or 17 and 25
RegEx := RegEx_Range(17, 25) "|" RegEx_Range(1, 12)

number = 17

if RegExMatch(number, RegEx, match)
    MsgBox, % match
else
    MsgBox, % "No match"


Note: the range with the larger numbers must come first. Otherwise an earlier alternative can match just a prefix of the number, as the next example shows.

Code:
;matches a number between 1 and 12 or 17 and 25
RegEx := RegEx_Range(1, 12) "|" RegEx_Range(17, 25)

number = 17

if RegExMatch(number, RegEx, match)
{
    ;matched the "1" in "17"
    ;(alternations stop on the first match)
    MsgBox, % match
}
else
    MsgBox, % "No match"
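
When the whole string should be a single number, anchoring the grouped alternation makes the order of the ranges irrelevant, because a prefix match can no longer succeed. A sketch under two assumptions: the first pattern is the one the post gives for RegEx_Range(1, 12), and the second is a hand-written stand-in for 17 to 25, not necessarily what RegEx_Range(17, 25) returns:

```autohotkey
range1 := "1[0-2]|0?[1-9]"   ; from the post: RegEx_Range(1, 12)
range2 := "1[7-9]|2[0-5]"    ; hand-written stand-in for 17-25 (assumption)

; smaller range first on purpose; the anchors force a full-string match
RegEx := "^(?:" range1 "|" range2 ")$"

numbers := "17,12,13,25"
Loop, Parse, numbers, `,
    report .= A_LoopField " -> " (RegExMatch(A_LoopField, RegEx) ? "match" : "no match") "`n"

MsgBox, % report
```

Here 17, 12 and 25 match in full, while 13 is rejected because it lies in neither range.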


Code
Code:
/*
Miscellaneous functions - not directly part of library, but required

They can be moved to a separate file, or kept here.
I have them in my library, as I use them in different projects
*/

;repeats the specified string, <count> times
;if count <= 0, the empty string is returned
repeat(str, count)
{
    Loop, %count%
    {
        result .= str
    }

    return result
}

;loops from start to stop (inclusive) with the given step
;http://www.autohotkey.com/forum/viewtopic.php?t=42553
for(ByRef LoopVariable, start, stop, ByRef step)
{
    if (!step)
        step := (start <= stop ? 1 : -1)

    LoopVariable := start

    return floor((stop - start) / step) + 1
}

/*
A library of functions used when dealing with Regular Expressions
*/

;convenience function for calling RegEx_Range(start, end, false)
;RegEx_RangeNZ (No leading Zeros)
RegEx_RangeNZ(start, end)
{
    return RegEx_Range(start, end, false)
}

RegEx_Range(start, end, allowLeadingZeros = true)
{
    if start is not integer
        return
    else if end is not integer
        return

    ;store the current format
    oldIntFormat := A_FormatInteger

    ;ensures that the numbers are decimal numbers
    SetFormat, integer, D

    if (start > end)
    {
        tmp := start
        start := end
        end := tmp
    }

    negStart := subStr(start, 1, 1) == "-"
    negEnd := subStr(end, 1, 1) == "-"

    if (negStart) {
        if (negEnd) {
            ;both start and end are negative
            ;e.g. -170 to -16 -> 16 to 170 (with a leading negative sign)

            StringTrimLeft, negS, end, 1
            StringTrimLeft, negE, start, 1
        } else {
            ;start is negative and end is non-negative

            negS := start = 0 ? "0" : "1"
            StringTrimLeft, negE, start, 1

            posS := "0"
            posE := end
        }
    } else {
        ;both are non-negative

        if (negEnd) {
            ;special case
            ;start = 0 and end = -0
            negS := "0"
            negE := "0"

            posS := "0"
            posE := "0"
        } else {
            posS := start
            posE := end
        }
    }

    if strLen(negS) != 0 {
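        ;hex numbers are not yet supported
        ;(restore the caller's integer format before bailing out)
        if negS is not digit
        {
            SetFormat, integer, %oldIntFormat%
            return
        }
        if negE is not digit
        {
            SetFormat, integer, %oldIntFormat%
            return
        }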

        result .= RegEx_range_private("-", negS, negE, allowLeadingZeros)
    }

    if strLen(posS) != 0 {
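        ;hex numbers are not yet supported
        ;(restore the caller's integer format before bailing out)
        if posS is not digit
        {
            SetFormat, integer, %oldIntFormat%
            return
        }
        if posE is not digit
        {
            SetFormat, integer, %oldIntFormat%
            return
        }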
               
        if strLen(result) != 0
            result .= "|"

        result .= RegEx_range_private("", posS, posE, allowLeadingZeros)
    }

    ;restore the original format
    SetFormat, integer, %oldIntFormat%

    return result
}

RegEx_range_private(lead, start, end, allowLeadingZeros)
{
    if (allowLeadingZeros)
        return RegEx_rangeZ_private(lead, start, end)

    ;remove leading zeros
    start += 0
    end += 0

    if strLen(start) == strLen(end)
        return RegEx_rangeZ_private(lead, start, end)

    Loop, % for(i, strLen(start), strLen(end), i_step := 1)
    {
        if (i == strLen(start))
        {
            tmpStart := start
            tmpEnd := repeat("9", i)
        }
        else if (i == strLen(end))
        {
            tmpStart := "1" . repeat("0", i - 1)
            tmpEnd := end
        }
        else
        {
            tmpStart := "1" . repeat("0", i - 1)
            tmpEnd := repeat("9", i)
        }

        if strLen(result) != 0
            result := "|" result

        result := RegEx_rangeZ_private(lead, tmpStart, tmpEnd) . result

        i += i_step
    }

    return result
}

RegEx_rangeZ_private(lead, start, end)
{
    if strLen(start) == 1 && strLen(end) == 1 {
        digit1 := asc(subStr(start, 1, 1)) - asc("0")
        digit2 := asc(subStr(end, 1, 1)) - asc("0")

        if (digit1 == digit2)
            return lead . start
        else if (digit1 == digit2 - 1)
            return lead "[" start . end "]"
        else
            return lead "[" start "-" end "]"
    } else {
        ;optZero - whether a leading zero is optional
        ;digit1 - the first digit in <start>
        ;digit2 - the first digit in <end>

        if strLen(start) < strLen(end) {
            ;optional zero (e.g. 1-17)

            digit1 := 0
            digit2 := asc(subStr(end, 1, 1)) - asc("0")

            newStart := start
            StringTrimLeft, newEnd, end, 1

            optZero := true
        } else if strLen(end) < strLen(start) {
            ;optional zero (e.g. 01-7)

            digit1 := asc(subStr(start, 1, 1)) - asc("0")
            digit2 := 0

            StringTrimLeft, newStart, start, 1
            newEnd := end

            optZero := true
        } else {
            digit1 := asc(subStr(start, 1, 1)) - asc("0")
            digit2 := asc(subStr(end, 1, 1)) - asc("0")

            StringTrimLeft, newStart, start, 1
            StringTrimLeft, newEnd, end, 1

            optZero := false
        }

        if (digit1 == digit2) {
            newLead := lead . digit1 . (optZero ? "?" : "")

            return RegEx_rangeZ_private(newLead, newStart, newEnd)
        }

        if (newStart != repeat("0", strLen(newStart))) {
            newLead := lead . digit1

            if (optZero && digit1 == 0)
                newLead .= "?"

            result := RegEx_rangeZ_private(newLead, newStart
                , repeat("9", strLen(newEnd)))

            digit1++
        }

        if (newEnd != repeat("9", strLen(newEnd))) {
            needSecondGroup := true
            digit2--
        } else
            needSecondGroup := false

        if (digit1 <= digit2) {
            if strLen(result) != 0
                result := "|" result

            newLead := lead

            if (digit1 == digit2)
                newLead .= digit1
            else if (digit1 == digit2 - 1)
                newLead .= "[" digit1 . digit2 "]"
            else
                newLead .= "[" digit1 "-" digit2 "]"

            if (optZero && digit1 == 0) {
                newLead .= "?"
                useLength := strLen(newStart)
            } else
                useLength := strLen(newEnd)

            result := RegEx_rangeZ_private(newLead, repeat("0", useLength)
                , repeat("9", strLen(newEnd)))
                . result
        }

        if (needSecondGroup) {
            if strLen(result) != 0
                result := "|" result

            newLead := lead . (digit2 + 1)

            if (optZero && digit2 + 1 == 0)
                newLead .= "?"

            result := RegEx_rangeZ_private(newLead
                    , repeat("0", strLen(newEnd)), newEnd)
                    . result
        }

        return result
    }
}
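
When leading zeros are disallowed, RegEx_range_private breaks the range into sub-ranges whose bounds have the same digit count and builds one alternative per sub-range. The splitting step alone can be sketched as follows; splitByLength is a hypothetical helper written for illustration and is not part of the library:

```autohotkey
; Hypothetical helper: lists the same-length sub-ranges of start..end.
; Assumes non-negative integers without leading zeros and start <= end.
splitByLength(start, end)
{
    Loop, % StrLen(end) - StrLen(start) + 1
    {
        len := StrLen(start) + A_Index - 1

        ; shortest sub-range begins at start, the others at 10, 100, ...
        lo := (len = StrLen(start)) ? start : 10 ** (len - 1)

        ; longest sub-range stops at end, the others at 9, 99, ...
        hi := (len = StrLen(end)) ? end : 10 ** len - 1

        result .= (result = "" ? "" : ", ") lo "-" hi
    }

    return result
}

MsgBox, % splitByLength(7, 123)   ; 7-9, 10-99, 100-123
```

The library code prepends each new sub-range to the result, so the alternatives for longer numbers end up first in the returned pattern.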


Example 1 (matching an IPv4 address)
Code:
;stored for simplicity and readability
byte := RegEx_Range(0, 255)

;matches an IPv4 address
;You can capture the number by placing the returned pattern in a capture group
IPv4 := "^(?<byte1>" byte ")\.(?<byte2>" byte ")\.(?<byte3>" byte ")\.(?<byte4>" byte ")$"
IP := "192.0.01.25"

if RegExMatch(IP, IPv4, match)
    MsgBox, % matchByte1 "." matchByte2 "." matchByte3 "." matchByte4
else
    MsgBox, % IP " is not a valid IPv4 address."
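
If leading zeros should be rejected in an address, RegEx_RangeNZ(0, 255) is the natural replacement for the byte pattern. The sketch below uses a hand-written byte pattern in its place (an assumption, not the function's actual output) so that it runs without the library, and checks a few sample strings:

```autohotkey
; hand-written pattern for 0-255 without leading zeros (assumption)
byte := "25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d"

; each byte is wrapped in a group so the alternation stays isolated
IPv4 := "^(?:" byte ")(?:\.(?:" byte ")){3}$"

candidates := "192.0.1.25,192.0.01.25,256.1.1.1,10.0.0"
Loop, Parse, candidates, `,
    report .= A_LoopField " -> " (RegExMatch(A_LoopField, IPv4) ? "valid" : "invalid") "`n"

MsgBox, % report
```

Only 192.0.1.25 is reported as valid: the second candidate has a zero-padded byte, the third has a byte above 255, and the fourth has only three bytes.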


Example 2 (matching a time stamp)
Code:
;matches a time with form:
;Hour:Minute:Second AM/PM
;(Seconds and AM/PM marker are optional)
time := "(?<hour>" RegEx_Range(0, 23) "):(?<min>" RegEx_Range(0, 59) ")"
    . "(?::(?<sec>" RegEx_Range(0, 59) "))?(?:\s*+(?<marker>(?i)[AP]M))?"

FormatTime, currentTime,, h:mm:ss tt
msg := "The current time is: " currentTime

MsgBox, % msg

if RegExMatch(msg, time, match)
{
    MsgBox, % "Hour: " matchHour "`n"
        . "Minute: " matchMin "`n"
        . "Second: " matchSec "`n"
        . "AM/PM: " matchMarker
}
else
    MsgBox, % "No match."


Example 3 (General test for accuracy)
Code:
/*
A test script for the RegEx_Range function
*/

;the start and end values for the range
start := -5
end := 199

;if using RegEx_RangeNZ, both digit counts (digitCount and digitCountN) should be zero, since leading zeros won't match

;number of digits in the result (padded with leading zeros, if necessary)
;should be between strLen(start) and strLen(end)
digitCount := 3

;digit count for negative numbers
digitCountN := 1

;the Regular Expression that matches the range
if !RegEx_Range := RegEx_Range(start, end)
{
    MsgBox, % "Either start or end was not a number. Exiting..."
    ExitApp
}

; MsgBox, % RegEx_Range

;zeros (used for padding)
zeros := repeat("0", digitCount)

itWorks := true
;loops through each possible value
Loop, % for(i, start, end, i_step := 0)
{
    if (i < 0 && strLen(abs(i)) < digitCountN)
    {
        ;if number of digits in <i> is less than digitCountN,
        ;pad with leading zeros

        number := "-" subStr(zeros . abs(i), 1 - digitCountN)
    }
    else if (i > 0 && strLen(i) < digitCount)
    {
        ;if number of digits in <i> is less than digitCount,
        ;pad with leading zeros
       
        number := subStr(zeros . i, 1 - digitCount)
    }
    else
        number := i

    if RegExMatch(number, "(" RegEx_Range ")", match) != 1 || match1 != number
    {
        ;if not a match (something went wrong)
       
        itWorks := false
        MsgBox, 4, % "Continue?", % "Match was " match
            . ".`nShould have matched " number ".`nContinue?"
       
        IfMsgBox, No
            break
    }
   
    i += i_step
}

if (itWorks)
    MsgBox, % "It works!"
else
    MsgBox, % "It didn't work :("
   
max(value1, value2)
{
    return (value1 >= value2 ? value1 : value2)
}


How to use
Extract the zip's contents to a library folder for automatic inclusion - StdLib compliant.

A copy of the above examples can be found in the "Func Examples" folder.
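
With RegEx.ahk (the file name given later in the thread) in a library folder, a script can call the functions without any #Include. A minimal sketch, assuming the file has been extracted to one of AutoHotkey's library folders, for example the Lib folder inside the AutoHotkey program directory; the sample text and the range 1 to 31 are illustrative:

```autohotkey
; no #Include needed: the RegEx_ prefix makes AutoHotkey look for RegEx.ahk
; in the library folders the first time the function is called
pattern := RegEx_RangeNZ(1, 31)

if (pattern = "")
    MsgBox, % "Invalid range."
else if (RegExMatch("Day 17", "(?<!\d)(" pattern ")(?!\d)", day))
    MsgBox, % "Found day: " day1
else
    MsgBox, % "No day number found."
```

The range is wrapped in a capture group, as recommended above, so that the lookarounds apply to every alternative.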



Download RegEx functions
_________________
As always, if you have any further questions, don't hesitate to ask.

Add OOP to your scripts via the Class Library. Check out my scripts.


Last edited by animeaime on Sun Jun 21, 2009 5:37 am; edited 4 times in total
animeaime
Posted: Sun Jun 21, 2009 1:19 am

I changed the function around a lot, and added support for some additional functionality - read the first post for details and usage. The function is now located in RegEx.ahk, and the function's name is RegEx_Range.

If desired, I can change the function name back to RegExRange and include the added RegExRangeZ in a separate AHK file - looking for feedback on this.


The biggest changes are that negative numbers are now supported, and the return pattern no longer uses non-capture groups - so it's usable in "all" RegEx engines (even the more basic ones found in text editors and the like). However, the returned RegEx is in a "raw" form - it may need to be placed in a capture group (named or unnamed) or a non-capture group for correct functionality when integrated into a larger pattern (see the first post for some examples).

Also, this version fixed some bugs in the last version.

Enjoy.


Last edited by animeaime on Sun Jun 21, 2009 2:40 am; edited 2 times in total
animeaime
Posted: Sun Jun 21, 2009 2:31 am

OK, sorry for the continuous spamming of posts, but I fixed another "bug". The previous version works, but the return was more complicated than necessary (due to a typo when converting the code from Java). This new version fixes the problem.

Download the latest version.