AutoHotkey Community

It is currently May 24th, 2012, 5:40 pm

All times are UTC [ DST ]




Posted: February 8th, 2007, 8:39 pm

Joined: July 30th, 2004, 6:47 pm
Posts: 42
I have been trying to learn regular expressions today, and thought I was beginning to understand them when I ran into this problem. Any help would be appreciated.

My task is to grab the value on the right-hand side of each ID line:
Quote:
ID 1 : ABC
...
ID 21 = DEF
...
ID 307 - GHI
....

The results I want (in separate iterations) are:

Quote:
ABC
DEF
GHI

What I am doing right now is reading each ID line in a loop, then running the following code on each line:

Code:
  RegExp = (?::|-|=)\s(.*)\R
  RegExMatch(Line, RegExp, Result)

But even though I specified ?: for the first pair of parentheses, I am still getting

Quote:
: ABC
= DEF
- GHI

Not only are the :, = and - included, but a space is included as well, and I believe the newline characters too. I guess I am not clear on which subpatterns get returned and which do not.

My 2nd question: is there a way for RegExMatch to return the RIGHT-most position of the match instead of the left-most? That way I could at least obtain my values via crude calculations and string functions.


Posted: February 8th, 2007, 9:13 pm

Joined: August 11th, 2004, 1:47 am
Posts: 5346
Location: UK
Try %Result1%. You might not even need a loop if ids := RegExReplace(text, "ID.*?[:=\-]\s*|`n(?!ID).*?(?=$|`n)") works.
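To see what the RegExReplace() one-liner does to the whole list at once, here is a rough Python sketch using the standard `re` module. For this pattern, its syntax for lazy quantifiers and lookaheads behaves like PCRE, the engine behind AutoHotkey's RegEx functions. The sample `text` is an assumption modeled on the quoted ID list. The AutoHotkey `` `n `` escape becomes `\n` in the Python pattern.

```python
import re

# Assumed sample input modeled on the ID list in the question;
# the "..." lines stand for unrelated lines between the ID lines.
text = "ID 1 : ABC\n...\nID 21 = DEF\n...\nID 307 - GHI\n...."

# Alternative 1 deletes "ID", then anything (lazily) up to the first : = or -, then spaces.
# Alternative 2 deletes each newline NOT followed by "ID", plus the rest of that line.
PATTERN = r"ID.*?[:=\-]\s*|\n(?!ID).*?(?=$|\n)"

ids = re.sub(PATTERN, "", text)
print(ids.split("\n"))  # ['ABC', 'DEF', 'GHI']
```

Because only the newlines in front of "ID" survive, the values end up one per line.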



Posted: February 8th, 2007, 10:47 pm

Joined: July 30th, 2004, 6:47 pm
Posts: 42
Thanks very much, Titan; the result I wanted was indeed stored in Result1. Could you explain a bit why this is? I understand RegExMatch can store multiple values in an array, but I had only one value in the line, so I don't understand why my RegExp would create more than one result.

The ids expression you proposed appears to store the entire ID list in ids, although with all : and = and - removed. I will have to study it a little to understand why this is.

Thanks again for your help.


Posted: February 8th, 2007, 11:56 pm

Joined: August 11th, 2004, 1:47 am
Posts: 5346
Location: UK
quicktest wrote:
My 2nd question: is there a way for RegExMatch to return the RIGHT-most position of the match instead of the left-most? That way I could at least obtain my values via crude calculations and string functions.
You need a lazy match, e.g. .*?
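A small Python sketch of the greedy vs. lazy difference, using the standard `re` module as a stand-in for PCRE. The sample line is hypothetical. It also shows the end of a match: Python's `m.end()` is the index just past the match. In AutoHotkey the same 1-based position should be FoundPos + StrLen(Match), using RegExMatch()'s return value. That formula is an inference from the documented return value, not something stated in this thread.

```python
import re

# Hypothetical line whose value itself contains a "-"
line = "ID 1 : ABC - x"

greedy = re.search(r"ID.*[:=-]", line)  # .* grabs everything, then backtracks to the LAST sign
lazy = re.search(r"ID.*?[:=-]", line)   # .*? grows one char at a time, stopping at the FIRST sign

print(greedy.group(), greedy.end())  # ID 1 : ABC - 12
print(lazy.group(), lazy.end())      # ID 1 : 6

# end() points just past the match, so the value is whatever follows it
value = line[lazy.end():].strip()
print(value)  # ABC - x
```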

The result is stored in Result1 because:
RegExMatch() - UnquotedOutputVar wrote:
If any capturing subpatterns are present inside NeedleRegEx, their matches are stored in an array whose base name is OutputVar. For example, if the variable's name is Match, the substring that matches the first subpattern would be stored in Match1, the second would be stored in Match2, and so on.
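The same rule can be sketched in Python's `re` module, which numbers groups like PCRE. Here `group(0)` plays the role of the AutoHotkey OutputVar (`Match`) and `group(1)`, `group(2)` play `Match1`, `Match2`. The sample line comes from the question, and the two-group pattern is a hypothetical variation.

```python
import re

line = "ID 1 : ABC"

# (?:...) groups without creating a capture slot, but its text still
# belongs to the overall match (group 0, like "Match" in AutoHotkey).
m = re.search(r"(?::|-|=)\s(.*)", line)
print(repr(m.group(0)))  # ': ABC'  overall match, sign and space included
print(repr(m.group(1)))  # 'ABC'    first capturing subpattern, like Match1

# Capture numbers follow the order of each capturing group's opening "(",
# left to right; non-capturing groups are skipped.
m2 = re.search(r"ID (\d+) (?::|-|=) (.*)", line)
print(m2.groups())  # ('1', 'ABC')  would be Match1 and Match2
```

So the number of output variables depends on how many capturing parentheses the pattern has, not on how many lines are processed.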


quicktest wrote:
The ids expression you proposed appears to store the entire ID list in ids, although with all : and = and - removed. I will have to study it a little to understand why this is.
Didn't you want that? If you remove the [:=\-]\s* part they should be left in.



Posted: February 9th, 2007, 10:57 am

Joined: December 27th, 2005, 1:46 pm
Posts: 6837
Location: France (near Paris)
I don't understand your \R.
If that's a carriage return char, you must write it \r.
A simple way to get what you want, if your ID values have no spaces inside, is just:
Code:
RegExp = \s(\S+)$
RegExMatch(Line, RegExp, Result)
Badly written hasty test:
Code:
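Line1 = ID 1 : ABC
Line2 = ID 21 = DEF
Line3 = ID 307 - GHI
; \s(\S+)$ : a whitespace char, then capture the last run of non-space chars up to the end of the line
RegExp = \s(\S+)$
; Each output var is the base of its own array: Result1 holds the overall match,
; Result11 holds subpattern 1 of that match (likewise Result21, Result31)
RegExMatch(Line1, RegExp, Result1)
RegExMatch(Line2, RegExp, Result2)
RegExMatch(Line3, RegExp, Result3)
MsgBox %Result11% %Result21% %Result31%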
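The same pattern in a Python sketch (standard `re` module as a stand-in for PCRE) also makes the "no spaces inside" condition visible. The last test line is hypothetical.

```python
import re

# a whitespace char, then the last run of non-space chars before the end of the line
PATTERN = re.compile(r"\s(\S+)$")

lines = ["ID 1 : ABC", "ID 21 = DEF", "ID 307 - GHI"]
values = [PATTERN.search(line).group(1) for line in lines]
print(values)  # ['ABC', 'DEF', 'GHI']

# Caveat: when a value contains spaces, only its last word is captured
print(PATTERN.search("ID 4 : two words").group(1))  # words
```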



Posted: February 9th, 2007, 12:12 pm

Joined: December 23rd, 2006, 6:02 pm
Posts: 424
Location: Russia
Regular Expressions (RegEx) - Quick Reference:
Quote:
In v1.0.46.06+, \R means "any single newline of any type" (namely `r, `n, or `r`n).
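Python's `re` module has no `\R` escape; it rejects `\R` as a bad escape. So this sketch spells out an assumed equivalent alternation. Listing `\r\n` first makes a CRLF pair count as ONE newline. The sample text is hypothetical. The last lines show why the original pattern's overall match included the newline.

```python
import re

# Assumed stand-in for PCRE's \R: CRLF, CR, or LF; CRLF is tried first so the
# pair is consumed as a single newline rather than two.
NEWLINE = r"(?:\r\n|\r|\n)"

text = "ID 1 : ABC\r\nID 21 = DEF\rID 307 - GHI\n"
lines = [part for part in re.split(NEWLINE, text) if part]
print(lines)  # ['ID 1 : ABC', 'ID 21 = DEF', 'ID 307 - GHI']

# The question's pattern with \R: the newline is consumed, so it is part of
# the overall match (group 0) but not of the capturing group.
m = re.search(r"(?::|-|=)\s(.*)" + NEWLINE, "ID 1 : ABC\n")
print(repr(m.group(0)), repr(m.group(1)))  # ': ABC\n' 'ABC'
```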


Posted: February 9th, 2007, 2:48 pm

Joined: December 27th, 2005, 1:46 pm
Posts: 6837
Location: France (near Paris)
OK, I forgot about it; I only quickly skimmed the left side of the reference... and didn't find it in http://mushclient.com/pcre/pcrepattern.html, which is probably obsolete now. The info is indeed in http://www.pcre.org/pcre.txt
Thanks for the reminder.



Posted: February 9th, 2007, 3:21 pm

Joined: July 30th, 2004, 6:47 pm
Posts: 42
Titan: Sorry, I must be going blind. Could you point me to where the lazy match .*? is documented? Or was it .* that you meant? Either way, I don't see how to make it return the right-most position...

I took the subpattern explanation to mean that if I were processing ID1 and ID2 at the same time, I would get the value for ID1 in Result1 and the value for ID2 in Result2. Does the explanation instead mean that each () pair causes a subpattern to be generated? How would I know whether my results are in Result2 or Result3 instead?

Sorry, I was unclear; I wanted to obtain values like

Quote:
ABC


However, your regexp seems to give me the value

Quote:
ID 1 ABC


by removing the : sign. I didn't check the rest of the array, though; I will give that another try later today.



PhiLho: Your (much) simpler solution works well, thank you. It seems I have a long way to go before fully understanding regular expressions. I am wondering what happens if the values contain spaces, or are sometimes blank. I am trying to make this regexp general-purpose, so it can be used to obtain values other than IDs. The criterion I have in mind is to grab everything after the :, - or = sign, all the way to the end of the line. This is why I came up with (?::|-|=)\s(.*)\R. Would there be a better way to achieve this?



Thanks much to all who helped this newbie.


Posted: February 9th, 2007, 4:12 pm

Joined: December 27th, 2005, 1:46 pm
Posts: 6837
Location: France (near Paris)
If you replace the \R with $, which is more traditional and portable, your expression is OK (as long as there is only one :, - or = in the line).
Another way:
Code:
RegExp = ^ID \d+ . (.*)$
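A Python sketch (standard `re` module as a stand-in for PCRE) comparing the two suggestions on values with spaces and blank values. `value_after_sign` and the "Name = ..." line are hypothetical, made up for illustration. Note that the first pattern needs at least one whitespace char after the sign.

```python
import re

# The question's pattern with \R replaced by $
SIGN_PATTERN = re.compile(r"(?::|-|=)\s(.*)$")
# PhiLho's alternative: "ID", a number, any single char, then the value
ID_PATTERN = re.compile(r"^ID \d+ . (.*)$")

def value_after_sign(line):
    """Hypothetical helper: text after the first :, = or - that is followed by whitespace, else None."""
    m = SIGN_PATTERN.search(line)
    return m.group(1) if m else None

print(value_after_sign("ID 1 : ABC"))        # ABC
print(value_after_sign("Name = two words"))  # two words (not an ID line)
print(repr(value_after_sign("ID 9 - ")))     # '' (blank value still matches)
print(value_after_sign("ID 10 -"))           # None (\s needs a char after the sign)

print(ID_PATTERN.match("ID 307 - GHI").group(1))  # GHI
print(ID_PATTERN.match("Name = two words"))       # None: only works for ID lines
```

The sign-based pattern is the more general one. The ID-anchored one is stricter about which lines it accepts.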



Posted: February 11th, 2007, 2:19 pm

Joined: December 20th, 2004, 12:19 pm
Posts: 794
Location: LooseChange911.com Ask Questions, Demand Answers █ The WTC bldgs █ shouldn't have fallen █ that fast
quicktest wrote:
Could you explain a bit on why this is?

First, I want to mention that using result1/match1 is the correct way to solve this; the following code only demonstrates why ?: (question-colon, a non-capturing subpattern) still appears to capture:
  1. The overall match (stored in "match") contains every part of the string that matched, including non-capturing subpatterns: ?: only means the group doesn't get its own capture number (capture slot); its text is still included in the overall match.
  2. Zero-length assertions (such as the lookbehind below) aren't included in the overall match.
Code:
data=
(LTrim
   ID 1 : ABC
   ID 21 = DEF
   ID 307 - GHI
)

;//normal regex, result in match1, not overall match
;//regex=(?::|-|=)\s(.*)

;//Zero-length look behind assertion, result in match1 *AND* overall match
regex=(?<=(?::|-|=)\s)(.*)

Loop, Parse, data, `n
{
   line:=A_LoopField
   RegExMatch(line, regex, match)
   msgbox,
   (LTrim
      line(%line%)

      match(%match%)
      match1(%match1%)
      match2(%match2%)
      match3(%match3%)
   )
}
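For readers without AutoHotkey at hand, here is a rough Python equivalent of that demo, using the standard `re` module as a stand-in for PCRE. `group(0)` corresponds to `match` and `group(1)` to `match1`. Both patterns are copied from the script above.

```python
import re

data = "ID 1 : ABC\nID 21 = DEF\nID 307 - GHI"

# Normal regex: the value is in group 1; group 0 also contains the sign and space.
NORMAL = re.compile(r"(?::|-|=)\s(.*)")
# Zero-length lookbehind: the sign and space must precede the match
# but are not part of it, so group 0 and group 1 are identical.
LOOKBEHIND = re.compile(r"(?<=(?::|-|=)\s)(.*)")

rows = []
for line in data.split("\n"):
    n = NORMAL.search(line)
    lb = LOOKBEHIND.search(line)
    rows.append((n.group(0), n.group(1), lb.group(0), lb.group(1)))

for row in rows:
    print(row)
# (': ABC', 'ABC', 'ABC', 'ABC')
# ('= DEF', 'DEF', 'DEF', 'DEF')
# ('- GHI', 'GHI', 'GHI', 'GHI')
```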


