JoeSchmoe as guest Guest
|
Posted: Wed Dec 31, 2008 11:39 pm Post subject: AAAAGGGGGHHHHH (a nasty PCRE from ISense) |
|
|
Hi, folks,
I've found a nasty regular expression and I was wondering if I could get some help decoding it. All of the ?Ps have got me freaking out. Here's the expression:
| Code: | LangRE := "J)("
. "(?P<Ignore>^(?P<Start>.*)\((?U).*(?-U)\)(?P<End>.*))" . "|"
. "(?P<Found>(.* )?(?P<Cmnd>[^\(]*\()(?P<Params>.*))" . "|"
. "(?P<Found>(?P<Cmnd>[a-zA-Z0-9_\s]+),?(?P<Params>.*))"
. "$)" |
It's from ISense and I've been working on it for about half an hour. It parses segments of code that start at the beginning of a line of code and end in the middle of it.
What I've already figured out:
* I know that <Ignore>, <Found>, etc., name subexpressions whose matches are read back later through the output array of RegExMatch().
* I know that the RE is divided into three sections, separated by | and that it only has to match one section. I suspect that each section represents a syntax: perhaps one is for functions, one is for expressions, etc., but I don't know which is which.
What I don't know:
* What the ?s are for. There seem to be two types: those preceding a P and those not. I'm not sure what either does.
* How the ?&%$! thing works!
JoeSchmoe
The original code follows. The function receives a snippet of code as an argument. The snippet of code generally starts at the beginning of a line, but can end in the middle of it.
| Code: | Isense_HandleSelection( pSel ) {
global ISense_lastMatch, ISense_monitor, ISense_lastWord, ISense_selection
static init, Lang_Delim, LangRE
IfEqual, pSel,, return
If !init { ;--this can be pulled from ini...
Lang_Delim := ","
LangRE := "J)("
. "(?P<Ignore>^(?P<Start>.*)\((?U).*(?-U)\)(?P<End>.*))" . "|"
. "(?P<Found>(.* )?(?P<Cmnd>[^\(]*\()(?P<Params>.*))" . "|"
. "(?P<Found>(?P<Cmnd>[a-zA-Z0-9_\s]+),?(?P<Params>.*))"
. "$)"
init := true
}
;remove spaces and tabs from the start of the selection
ISense_TrimLeft( pSel )
;get the command and parameters
Loop, {
RegExMatch( pSel, LangRE, m )
mIgnore ? ( pSel := mStart . mEnd )
If mFound
break
Sleep, 1
}
If SubStr( mCmnd, 0, 1 ) = "("
mCmnd := SubStr( mCmnd, 1, StrLen( mCmnd )-1 ) , cmndType := "[]"
ISense_FindMatches( mCmnd . cmndType )
!ISense_lastMatch ? ( mParams := "" )
;Rise info or tooltip mode and set internal variables they use before that.
ISense_monitor := true
if mParams =
ISense_lastWord := mCmnd , ISense_lastMatch ? Info_Show( ISense_lastMatch )
else {
ISense_selection := mCmnd . cmndType
ISense_lastWord := mParams
ISense_ESetParamMode()
Tooltip_Show(-1, ISense_GetCurrentParam( mParams ))
}
} |
Bonus question: what does the " mIgnore ? ( pSel := mStart . mEnd )" line do? It doesn't look like valid syntax. |
|
freakkk
Joined: 29 Jul 2005 Posts: 179
|
Posted: Thu Jan 01, 2009 1:05 am Post subject: |
|
|
As a matter of fact, I remember doing that.. and I'll have you know I was freaking out! But- nasty!??
After tonight and the inevitable recovery tomorrow, I will post some detailed explanations for you, as well as respond to your post on the other thread. _________________ .o0[ corey ]0o. |
|
freakkk
|
Posted: Sat Jan 03, 2009 4:38 am Post subject: |
|
|
So here goes nothing....
| JoeSchmoe wrote: | | It parses segments of code that start at the beginning of a line of code and end in the middle of it. | You got it. This function re-evaluates the line your cursor is on when you press your assigned hotkey, so it can display contextual parameter assistance. It captures everything from the beginning of the line to your current cursor position and passes that text to the function as the pSel parameter.
| JoeSchmoe wrote: | | I know that the RE is divided into three sections, separated by | and that it only has to match one section. I suspect that each section represents a syntax: perhaps one is for functions, one is for expressions, etc., but I don't know which is which. |
| Quote: | What I don't know:
* what the ?s are for. There seem to be two types... those preceding a P and those not. I'm not sure what either do.
* How the ?&%$! thing works! |
Let's start with a simple example: a legacy AHK command (non-function)
| Code: | pSel = MsgBox, 48, title, text
MsgBox, % handleSelection( pSel )
handleSelection( pSel ) {
IfEqual, pSel,, return
Lang_Delim := ","
LangRE := "J)("
. "(?P<Found>(?P<Cmnd>[a-zA-Z0-9_\s]+),?(?P<Params>.*))"
. "$)"
RegExMatch( pSel, LangRE, m )
return "mFound = " mFound "`nmCmnd = " mCmnd "`nmParams = " mParams
} |
Breaking down the regex,
| Quote: | | (([a-zA-Z0-9_\s]+),?(.*)) | There is one capturing subpattern for the entire match, and two subpatterns inside it that capture the leading command and the parameters.
The ? after the ',' makes the comma optional: the pattern matches whether or not a comma follows the command.
| AHK Help wrote: | | A question mark matches zero or one of the preceding character, class, or subpattern. Think of this as "the preceding item is optional". For example, colou?r matches both color and colour because the "u" is optional. |
What makes the regex look more complicated is that I am using named subpatterns. Once you get used to them, they actually make the pattern easier to read (at least for me anyway.. )
| Quote: | | (?P<Found>(?P<Cmnd>[a-zA-Z0-9_\s]+),?(?P<Params>.*)) | So looking at the example above, you are feeding MsgBox, 48, title, text| in through the pSel param (the '|' represents your current cursor position). When the regex runs, the named subpatterns are captured into mFound, mCmnd and mParams, which are then used to pull up MsgBox's parameter help (the 3rd param is the active one..)
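To make the two uses of ? concrete, here is a minimal sketch for AutoHotkey v1. The sample strings, the simplified \w+ command class and the variable names hasComma / noComma are made up for illustration and are not part of ISense. A bare ? after an item is a quantifier ("this item is optional"), while the ? in (?P<Name>...) is only part of the group syntax and has nothing to do with optionality.

```autohotkey
; Sketch for AutoHotkey v1; sample strings and names are illustrative only.

; 1) "?" after an item is a quantifier: the comma may or may not be there.
hasComma := RegExMatch( "MsgBox, text", "^MsgBox,? " )  ; comma present
noComma  := RegExMatch( "MsgBox text",  "^MsgBox,? " )  ; comma absent, still a match

; 2) "(?P<Name>...)" is an ordinary capturing group with a name attached.
;    With output variable m, the captures land in mCmnd and mParams.
RegExMatch( "MsgBox, 48, title, text", "(?P<Cmnd>\w+),?(?P<Params>.*)", m )

out := "hasComma = " hasComma "`nnoComma = " noComma
out .= "`nmCmnd = [" mCmnd "]`nmParams = [" mParams "]"
MsgBox, % out
```

Both RegExMatch calls in part 1 return a non-zero position, because the comma is optional. In part 2 the command name ends up in mCmnd and everything after the optional comma in mParams.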
Now for the next example: Functions
| Code: | pSel = SubStr( string, pos, 15
MsgBox, % handleSelection( pSel )
handleSelection( pSel ) {
IfEqual, pSel,, return
Lang_Delim := ","
LangRE := "J)("
. "(?P<Found>(.* )?(?P<Cmnd>[^\(]*\()(?P<Params>.*))" . "|"
. "(?P<Found>(?P<Cmnd>[a-zA-Z0-9_\s]+),?(?P<Params>.*))"
. "$)"
RegExMatch( pSel, LangRE, m )
return "mFound = " mFound "`nmCmnd = " mCmnd "`nmParams = " mParams
} |
Breaking down the regex again,
| Quote: | | ((.* )?([^\(]*\()(.*)) |
| Quote: | | (?P<Found>(.* )?(?P<Cmnd>[^\(]*\()(?P<Params>.*)) |
- Entire match
- Command
- Parameters
In this case the optional subpattern (.* )? captures whatever precedes the command, basically just so it can be discarded. Try setting pSel to RegExMatch( SubStr( string| to see what I mean..
Alternatives are tried left to right and the function breaks from the loop as soon as it finds a match, so this pattern has to come before the previous one in our regex: its criteria are more specific.
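A quick way to see why the order matters is to run the same two alternatives in both orders. This is only a sketch for AutoHotkey v1; the variable names funcRE, cmdRE and the output prefixes a and b are mine, and the swapped order exists purely for comparison.

```autohotkey
; Sketch for AutoHotkey v1: the same two alternatives, tried in both orders.
pSel   := "SubStr( string, pos, 15"
funcRE := "(?P<Found>(.* )?(?P<Cmnd>[^\(]*\()(?P<Params>.*))"
cmdRE  := "(?P<Found>(?P<Cmnd>[a-zA-Z0-9_\s]+),?(?P<Params>.*))"

; Function pattern first (the order used in LangRE); results go to aCmnd, aParams
RegExMatch( pSel, "J)(" . funcRE . "|" . cmdRE . "$)", a )
; Command pattern first (swapped, for comparison only); results go to bCmnd, bParams
RegExMatch( pSel, "J)(" . cmdRE . "|" . funcRE . "$)", b )

out := "function first:`n  Cmnd = [" aCmnd "]`n  Params = [" aParams "]"
out .= "`n`ncommand first:`n  Cmnd = [" bCmnd "]`n  Params = [" bParams "]"
MsgBox, % out
```

With the function pattern first, Cmnd comes back as SubStr( including the parenthesis, which is what the rest of the function checks for. With the command pattern first, the looser character class matches SubStr on its own and the "(" ends up at the start of Params, so the call is treated like a legacy command.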
Now, some questions you probably have:
- What happened to the loop, & what's the point of it?
- What's mIgnore ? ( pSel := mStart . mEnd )
Consider this example:
| Quote: | | RegExMatch( SubStr(string, 1, 15), re, var| |
When you re-evaluate the line, you want to see the 3rd parameter for RegExMatch. The SubStr call in there is irrelevant and would do nothing but confuse your parameter count. For this reason, the loop collapses these complete function calls, discarding the params that would throw everything off.
| Code: | pSel = RegExMatch( SubStr(string, 1, 15), re, var|
MsgBox, % handleSelection( pSel )
handleSelection( pSel ) {
IfEqual, pSel,, return
Lang_Delim := ","
LangRE := "J)("
. "(?P<Ignore>^(?P<Start>.*)\((?U).*(?-U)\)(?P<End>.*))" . "|"
. "(?P<Found>(.* )?(?P<Cmnd>[^\(]*\()(?P<Params>.*))" . "|"
. "(?P<Found>(?P<Cmnd>[a-zA-Z0-9_\s]+),?(?P<Params>.*))"
. "$)"
Loop, { ;get the command and parameters
RegExMatch( pSel, LangRE, m )
MsgBox, , Loop Count: %A_Index%,
( ltrim
pSel = %pSel%
------------
mIgnore = %mIgnore%
mStart = %mStart%
mEnd = %mEnd%
------------
mFound = %mFound%
mCmnd = %mCmnd%
mParams = %mParams%
)
mIgnore ? ( pSel := mStart . mEnd )
; If match found, or hit 50 expression per line limit.. break from loop
If mFound || ( A_Index > 50 )
break
}
return "mFound = " mFound "`nmCmnd = " mCmnd "`nmParams = " mParams
} |
Again- the regex..
| Quote: | | (^(.*)\((?U).*(?-U)\)(.*)) |
| Quote: | | (?P<Ignore>^(?P<Start>.*)\((?U).*(?-U)\)(?P<End>.*)) |
- Entire match
- Beginning of line: Everything up to the '('
- Ending of line: Everything after the ')'
The \((?U).*(?-U)\) can be interpreted as: everything between the parentheses. (?U) just makes this portion of the pattern ungreedy, and (?-U) switches back to greedy matching afterwards.
| RegExMatch options wrote: | | Ungreedy. Makes the quantifiers *+?{} consume only those characters absolutely necessary to form a match, leaving the remaining ones available for the next part of the pattern. When the "U" option is not in effect, an individual quantifier can be made non-greedy by following it with a question mark. Conversely, when "U" is in effect, the question mark makes an individual quantifier greedy. |
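Here is a small sketch (AutoHotkey v1, with a made-up haystack and variable names) comparing the default greedy match, the (?U) ... (?-U) form used in LangRE, and the per-quantifier form with a trailing question mark. It also covers the remaining kind of ? from the original question: in (?U) the question mark just introduces an inline option setting.

```autohotkey
; Sketch for AutoHotkey v1; the haystack is a made-up example.
haystack := "a(b)c(d)e"

RegExMatch( haystack, "\(.*\)", greedy )             ; default: .* takes as much as it can
RegExMatch( haystack, "\((?U).*(?-U)\)", ungreedy )  ; (?U) ... (?-U), as in LangRE
RegExMatch( haystack, "\(.*?\)", lazy )              ; ? after * makes only that quantifier lazy

MsgBox, % "greedy = " greedy "`nungreedy = " ungreedy "`nlazy = " lazy
```

The greedy version runs from the first "(" to the last ")", while the other two stop at the first ")". That is why the Ignore pattern can pick out a single complete call instead of swallowing everything up to the last closing parenthesis on the line.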
| JoeSchmoe wrote: | | what does the " mIgnore ? ( pSel := mStart . mEnd )" line do? It doesn't look like valid syntax. | This is a ternary expression with the else branch left out. It is a shorthand way of writing:
| Code: | If mIgnore ;:= true
pSel := mStart . mEnd |
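So within the loop, since the first pass finds an 'mIgnore' match, it changes the pSel var from RegExMatch( SubStr(string, 1, 15), re, var| to RegExMatch( SubStr, re, var| On the second pass there is no complete call left to ignore, so the function pattern matches with RegExMatch( as the command, and the loop breaks.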
I've tweaked the code some more in the repo:
| Code: | pSel = var1 -= 10, var := RegExReplace( SubStr( string, 1, StrLen( string )-15 ), re|
MsgBox, % handleSelection( pSel )
handleSelection( pSel ) {
IfEqual, pSel,, return
Lang_Delim := ","
LangRE := "J)("
. "(?P<Ignore>^(?P<Start>(\s|,)?.*)\((?U).*(?-U)\)(?P<End>.*))" . "|"
. "(?P<Ignore>^.*((\. |\+|-|\*|/|&|\^|\|<|>|!|\?|AND|OR|:)(=)?|(\s)?% )(\s)?(?P<Start>[^,]+$))" . "|"
. "(?P<Found>(.* )?(?P<Cmnd>[^\(]+\()(?P<Params>.*))" . "|"
. "(?P<Found>(?P<Cmnd>[a-zA-Z0-9_#]+)(,|\s)?(?P<Params>.*))"
. "$)"
Loop, { ;get the command and parameters
RegExMatch( pSel, LangRE, m )
mIgnore ? ( pSel := mStart . mEnd )
If mFound || ( A_Index > 50 ) ; 50 expression per line limit..
break
}
return "mFound = " mFound "`nmCmnd = " mCmnd "`nmParams = " mParams
} | Let me know about any specific AHK cases where you see an issue (I'm sure there are still some..), either on this thread or via PM.
So maybe it is nasty (<--lol), but for me it's a lot easier to read than a whole bunch of if / else / and / or / but / whatevers.
Using this technique will make this function fairly simple to adapt for use in any language, which is one of the overall goals for this project. It all comes down to loading a variable containing your language's RE. |
|