AutoHotkey Community

Posted: January 1st, 2009, 12:39 am
Hi, folks,

I've found a nasty regular expression and I was wondering if I could get some help decoding it. All of the ?Ps have got me freaking out. Here's the expression:
Code:
LangRE     := "J)("
                .   "(?P<Ignore>^(?P<Start>.*)\((?U).*(?-U)\)(?P<End>.*))" . "|"
                .   "(?P<Found>(.* )?(?P<Cmnd>[^\(]*\()(?P<Params>.*))"    . "|"
                .   "(?P<Found>(?P<Cmnd>[a-zA-Z0-9_\s]+),?(?P<Params>.*))"
                . "$)"

It's from ISense and I've been working on it for about half an hour. It parses segments of code that start at the beginning of a line of code and end in the middle of it.

What I've already figured out:
    * I know that <Ignore>, <Found>, etc., name subpatterns whose matches are accessed later through the output array of RegExMatch().
    * I know that the RE is divided into three sections, separated by | and that it only has to match one section. I suspect that each section represents a syntax: perhaps one is for functions, one is for expressions, etc., but I don't know which is which.

What I don't know:
    * what the ?s are for. There seem to be two types... those preceding a P and those not. I'm not sure what either does.
    * How the ?&%$! thing works!
JoeSchmoe

The original code follows. The function receives a snippet of code as an argument. The snippet of code generally starts at the beginning of a line, but can end in the middle of it.

Code:
Isense_HandleSelection( pSel )  {
  global ISense_lastMatch, ISense_monitor, ISense_lastWord, ISense_selection
  static init, Lang_Delim, LangRE

  IfEqual, pSel,, return
  If !init  {     ;--this can be pulled from ini...
    Lang_Delim := ","
    LangRE     := "J)("
                .   "(?P<Ignore>^(?P<Start>.*)\((?U).*(?-U)\)(?P<End>.*))" . "|"
                .   "(?P<Found>(.* )?(?P<Cmnd>[^\(]*\()(?P<Params>.*))"    . "|"
                .   "(?P<Found>(?P<Cmnd>[a-zA-Z0-9_\s]+),?(?P<Params>.*))"
                . "$)"
    init := true
  }

   ;remove spaces and tabs from the start of the selection
   ISense_TrimLeft( pSel )

  ;get the command and parameters
  Loop,  {
    RegExMatch( pSel, LangRE, m )
    mIgnore ? ( pSel := mStart . mEnd )
    If mFound
      break
    Sleep, 1
  }

  If SubStr( mCmnd, 0, 1 ) = "("
    mCmnd := SubStr( mCmnd, 1, StrLen( mCmnd )-1 ) , cmndType := "[]"

   ISense_FindMatches( mCmnd . cmndType )
   !ISense_lastMatch ? ( mParams := "" )

   ;Rise info or tooltip mode and set internal variables they use before that.
   ISense_monitor := true
   if mParams =
      ISense_lastWord := mCmnd     ,   ISense_lastMatch ? Info_Show( ISense_lastMatch )
   else {
      ISense_selection := mCmnd . cmndType
      ISense_lastWord := mParams
      ISense_ESetParamMode()
      Tooltip_Show(-1, ISense_GetCurrentParam( mParams ))
   }
}


Bonus question: :wink: what does the " mIgnore ? ( pSel := mStart . mEnd )" line do? It doesn't look like valid syntax.


Posted: January 1st, 2009, 2:05 am
Joined: July 29th, 2005, 5:32 pm
Posts: 179
As a matter of fact, I remember doing that... and I'll have you know I was freaking out! But nasty!?? :P

After tonight and the inevitable recovery tomorrow, I will post some detailed explanations for you, as well as respond to your post on the other thread.

_________________
.o0[ corey ]0o.


Posted: January 3rd, 2009, 5:38 am
So here goes nothing....

JoeSchmoe wrote:
It parses segments of code that start at the beginning of a line of code and end in the middle of it.
You got it. This function re-evaluates the line your cursor is on when you press your assigned hotkey, so it can display contextual parameter help. The text from the beginning of the line to your current cursor position is captured and passed to the function as the pSel parameter.

JoeSchmoe wrote:
I know that the RE is divided into three sections, separated by | and that it only has to match one section. I suspect that each section represents a syntax: perhaps one is for functions, one is for expressions, etc., but I don't know which is which.
Quote:
What I don't know:
* what the ?s are for. There seem to be two types... those preceding a P and those not. I'm not sure what either do.
*How the ?&%$! thing works!

Let's start with a simple example: a legacy AHK command (non-function)
Code:
pSel = MsgBox, 48, title, text

MsgBox, % handleSelection( pSel )

handleSelection( pSel )  {
  IfEqual, pSel,, return
  Lang_Delim := ","
  LangRE     := "J)("
              .   "(?P<Found>(?P<Cmnd>[a-zA-Z0-9_\s]+),?(?P<Params>.*))"
              . "$)"

  RegExMatch( pSel, LangRE, m )

  return "mFound = " mFound "`nmCmnd = " mCmnd "`nmParams = " mParams
}


Breaking down the regex,
Quote:
(([a-zA-Z0-9_\s]+),?(.*))
There is one capturing subpattern for the entire match, and two subpatterns within it that capture the leading command and the parameters.
The ? means that the ',' is optional; it isn't required for a match.

AHK Help wrote:
A question mark matches zero or one of the preceding character, class, or subpattern. Think of this as "the preceding item is optional". For example, colou?r matches both color and colour because the "u" is optional.

What makes the regex seem more complicated looking is that I am using named subpatterns. After you get used to using them, they actually make the pattern easier to translate (at least for me anyway.. :D )
Quote:
(?P<Found>(?P<Cmnd>[a-zA-Z0-9_\s]+),?(?P<Params>.*))
So in the example above, you are feeding MsgBox, 48, title, text| in through the pSel parameter (the '|' represents your current cursor position). When the regex runs, the named subpatterns are captured: mCmnd holds MsgBox and mParams holds the rest of the line, which is used to pull up MsgBox's parameter help (the 3rd parameter is the active one).
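To make the named-versus-numbered difference concrete, here is a minimal AutoHotkey v1 sketch that runs the same pattern both ways. The output prefixes n and m, the variable commaCount, and the comma-counting step are illustrative assumptions; this is not how ISense_GetCurrentParam() is actually implemented, and StrReplace() needs v1.1.21 or later.

```autohotkey
; Sketch only: n, m and commaCount are illustrative names.
line := "MsgBox, 48, title, text"

; Numbered subpatterns land in n1 and n2.
RegExMatch( line, "([a-zA-Z0-9_\s]+),?(.*)", n )

; Named subpatterns land in mCmnd and mParams.
RegExMatch( line, "(?P<Cmnd>[a-zA-Z0-9_\s]+),?(?P<Params>.*)", m )

; Hypothetical way to find the active parameter: count the commas in mParams.
StrReplace( mParams, ",", ",", commaCount )

out := "n1 = " n1 "`nn2 = " n2
out .= "`nmCmnd = " mCmnd "`nmParams = " mParams
out .= "`nactive parameter = " ( commaCount + 1 )
MsgBox, %out%
```

Both calls should capture the same two pieces of text; the named version just lets the rest of the script say mCmnd instead of remembering which number belongs to which group.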

Now for the next example: Functions
Code:
pSel = SubStr( string, pos, 15

MsgBox, % handleSelection( pSel )

handleSelection( pSel )  {
  IfEqual, pSel,, return
  Lang_Delim := ","
  LangRE     := "J)("
              .   "(?P<Found>(.* )?(?P<Cmnd>[^\(]*\()(?P<Params>.*))"    . "|"
              .   "(?P<Found>(?P<Cmnd>[a-zA-Z0-9_\s]+),?(?P<Params>.*))"
              . "$)"

  RegExMatch( pSel, LangRE, m )
 
  return "mFound = " mFound "`nmCmnd = " mCmnd "`nmParams = " mParams
}

Breaking down the regex again,
Quote:
((.* )?([^\(]*\()(.*))
Quote:
(?P<Found>(.* )?(?P<Cmnd>[^\(]*\()(?P<Params>.*))
  • Entire match
  • Command
  • Parameters
In this case the optional, unnamed subpattern (.* ) captures whatever precedes the function call, basically just to discard it. Try setting pSel to RegExMatch( SubStr( string| to see what I mean.

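The regex takes the first alternative that matches, and the function breaks from the loop as soon as something is found. So this pattern has to come before the previous one in the regex: its criteria are more specific (it requires a '('), whereas the command pattern would also match the start of a function call.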
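The effect of the ordering can be checked with a small AutoHotkey v1 sketch that runs the two alternatives in both orders against the same input. The variable names funcRE and cmdRE and the output prefixes a and b are made up for this illustration; the sub-patterns themselves are copied from the code above.

```autohotkey
; Sketch only: funcRE, cmdRE, a and b are illustrative names.
pSel   := "SubStr( string, pos, 15"
funcRE := "(?P<Found>(.* )?(?P<Cmnd>[^\(]*\()(?P<Params>.*))"
cmdRE  := "(?P<Found>(?P<Cmnd>[a-zA-Z0-9_\s]+),?(?P<Params>.*))"

; Function pattern first (the order used in LangRE).
RegExMatch( pSel, "J)(" . funcRE . "|" . cmdRE . "$)", a )

; Command pattern first (the wrong order).
RegExMatch( pSel, "J)(" . cmdRE . "|" . funcRE . "$)", b )

out := "function first: Cmnd = [" aCmnd "] Params = [" aParams "]"
out .= "`ncommand first: Cmnd = [" bCmnd "] Params = [" bParams "]"
MsgBox, %out%
```

With the function pattern first, Cmnd should come back as SubStr( with the parameters after it. With the command pattern first, the command alternative matches right away, so Cmnd is just SubStr and the opening parenthesis ends up inside Params.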

Now, some questions you probably have:
  • What happened to the loop, and what's the point of it?
  • What's mIgnore ? ( pSel := mStart . mEnd )
Consider this example:
Quote:
RegExMatch( SubStr(string, 1, 15), re, var|

When you re-evaluate the line, you want to see the 3rd parameter for RegExMatch. The SubStr call in there is irrelevant and would do nothing but confuse the parameter count. For this reason, the loop collapses these complete function calls, discarding the parameters that would throw everything off.
Code:
pSel = RegExMatch( SubStr(string, 1, 15), re, var|

MsgBox, % handleSelection( pSel )

handleSelection( pSel )  {
  IfEqual, pSel,, return
  Lang_Delim := ","
  LangRE     := "J)("
              .   "(?P<Ignore>^(?P<Start>.*)\((?U).*(?-U)\)(?P<End>.*))" . "|"
              .   "(?P<Found>(.* )?(?P<Cmnd>[^\(]*\()(?P<Params>.*))"    . "|"
              .   "(?P<Found>(?P<Cmnd>[a-zA-Z0-9_\s]+),?(?P<Params>.*))"
              . "$)"

  Loop,  {   ;get the command and parameters
    RegExMatch( pSel, LangRE, m )
   
    MsgBox, , Loop Count: %A_Index%,
                    ( ltrim
                      pSel = %pSel%
                      ------------
                      mIgnore = %mIgnore%
                      mStart = %mStart%
                      mEnd = %mEnd%
                      ------------
                      mFound = %mFound%
                      mCmnd = %mCmnd%
                      mParams = %mParams%
                    )
   
    mIgnore ? ( pSel := mStart . mEnd )
   
    ; If match found, or hit 50 expression per line limit.. break from loop
    If mFound || ( A_Index > 50 )
      break
  }
 
  return "mFound = " mFound "`nmCmnd = " mCmnd "`nmParams = " mParams
}

Again- the regex..
Quote:
(^(.*)\((?U).*(?-U)\)(.*))
Quote:
(?P<Ignore>^(?P<Start>.*)\((?U).*(?-U)\)(?P<End>.*))
  • Entire match
  • Beginning of line: Everything up to the '('
  • Ending of line: Everything after the ')'
The \((?U).*(?-U)\) part can be read as: everything between the parentheses. (?U) just makes this portion of the pattern ungreedy, and (?-U) switches it back.

AHK Help wrote:
Ungreedy. Makes the quantifiers *+?{} consume only those characters absolutely necessary to form a match, leaving the remaining ones available for the next part of the pattern. When the "U" option is not in effect, an individual quantifier can be made non-greedy by following it with a question mark. Conversely, when "U" is in effect, the question mark makes an individual quantifier greedy.
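A quick way to see what ungreedy means is to run the same parenthesis pattern three ways. This is a minimal AutoHotkey v1 sketch; the test string and the output variable names are made up for the illustration.

```autohotkey
; Sketch only: str, greedy, lazyInline and lazyMark are illustrative names.
str := "f(a) + g(b)"

; Greedy: .* runs to the LAST closing parenthesis.
RegExMatch( str, "\(.*\)", greedy )                ; (a) + g(b)

; Inline (?U) ... (?-U), as used in LangRE: stops at the FIRST closing parenthesis.
RegExMatch( str, "\((?U).*(?-U)\)", lazyInline )   ; (a)

; Same effect with a question mark after the quantifier.
RegExMatch( str, "\(.*?\)", lazyMark )             ; (a)

out := "greedy = " greedy "`ninline U = " lazyInline "`nquestion mark = " lazyMark
MsgBox, %out%
```

This is why the Ignore pattern only swallows one complete pair of parentheses per pass instead of everything up to the last ')' on the line.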


JoeSchmoe wrote:
what does the " mIgnore ? ( pSel := mStart . mEnd )" line do? It doesn't look like valid syntax.
This is a ternary expression with the 'else' branch left out. It is a shorthand way of writing:
Code:
If mIgnore  ;:= true
  pSel := mStart . mEnd
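For comparison, the same step can be spelled out as a full ternary with both branches. This is a minimal AutoHotkey v1 sketch; the three m-values are hard-coded here to stand in for one pass of the loop and are not produced by a real RegExMatch() call.

```autohotkey
; Sketch only: values are hard-coded to imitate one loop pass.
mIgnore := "RegExMatch( SubStr(string, 1, 15), re, var"
mStart  := "RegExMatch( SubStr"
mEnd    := ", re, var"
pSel    := mIgnore

; Full ternary: when mIgnore is empty, pSel keeps its old value.
pSel := mIgnore ? ( mStart . mEnd ) : pSel

MsgBox, %pSel%   ; RegExMatch( SubStr, re, var
```

The short form in the script drops the colon and the second branch, since nothing needs to happen when mIgnore is empty.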

So within the loop, the first pass finds an 'mIgnore' match and changes pSel from RegExMatch( SubStr(string, 1, 15), re, var| to RegExMatch( SubStr, re, var|. The second pass then finds no complete call left to collapse and matches 'Found' instead.
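The collapsing also works when calls are nested, one pair of parentheses per pass, innermost first. Here is a minimal AutoHotkey v1 sketch that uses only the Ignore alternative and records each intermediate value of pSel. The input string and the names ignoreRE and trace are assumptions made for the illustration.

```autohotkey
; Sketch only: ignoreRE and trace are illustrative names.
pSel     := "A( B( C(x), y ), z"
ignoreRE := "^(?P<Start>.*)\((?U).*(?-U)\)(?P<End>.*)$"
trace    := pSel

Loop, 50   ; same safety cap as the 50-expression limit in the loop above
{
  ; Stop once no complete ( ... ) pair is left before the cursor.
  if !RegExMatch( pSel, ignoreRE, m )
    break
  pSel := mStart . mEnd      ; drop the parentheses and what is inside them
  trace .= "`n" . pSel
}

MsgBox, %trace%
```

The trace should show A( B( C(x), y ), z shrinking to A( B( C, y ), z and then to A( B, z, at which point only the unfinished call to A is left and the parameter count is correct again.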

I've tweaked the code some more in the repo:
Code:
pSel = var1 -= 10, var := RegExReplace( SubStr( string, 1, StrLen( string )-15 ), re|

MsgBox, % handleSelection( pSel )

handleSelection( pSel )  {
  IfEqual, pSel,, return
  Lang_Delim := ","
  LangRE     := "J)("
              .   "(?P<Ignore>^(?P<Start>(\s|,)?.*)\((?U).*(?-U)\)(?P<End>.*))" . "|"
              .   "(?P<Ignore>^.*((\. |\+|-|\*|/|&|\^|\|<|>|!|\?|AND|OR|:)(=)?|(\s)?% )(\s)?(?P<Start>[^,]+$))" . "|"
              .   "(?P<Found>(.* )?(?P<Cmnd>[^\(]+\()(?P<Params>.*))"            . "|"
              .   "(?P<Found>(?P<Cmnd>[a-zA-Z0-9_#]+)(,|\s)?(?P<Params>.*))"
              . "$)"

  Loop,  {  ;get the command and parameters
    RegExMatch( pSel, LangRE, m )
    mIgnore ? ( pSel := mStart . mEnd )
    If mFound || ( A_Index > 50 )  ; 50 expression per line limit..
      break
  }
 
  return "mFound = " mFound "`nmCmnd = " mCmnd "`nmParams = " mParams
}
Let me know about any specific AHK cases where you see an issue (I'm sure there are still some), either on this thread or via PM.

So maybe it is nasty (<--lol), but for me it's a lot easier to read than a whole bunch of if / else / and / or / but / whatevers. :P
Using this technique will make this function fairly simple to adapt for use in any language, which is one of the overall goals for this project. It all comes down to loading a variable containing your language's RE. 8)

_________________
.o0[ corey ]0o.

