RegEx results contain unwanted characters

 
quicktest



Joined: 30 Jul 2004
Posts: 42

Posted: Thu Feb 08, 2007 8:39 pm    Post subject: RegEx results contain unwanted characters

I have been trying to learn regular expressions today, and thought I was beginning to understand them when I ran into this problem. Any help would be appreciated.

My task is to grab the value on the right-hand side of each ID line:
Quote:
ID 1 : ABC
...
ID 21 = DEF
...
ID 307 - GHI
....

The results I want (in separate iterations) are

Quote:
ABC
DEF
GHI

What I am doing right now is reading each ID line in a loop, then running the following code on each line:

Code:
  RegExp = (?::|-|=)\s(.*)\R
  RegExMatch(Line, RegExp, Result)

But even though I specified ?: for the first set of (), I am still getting

Quote:
: ABC
= DEF
- GHI

Not only are the : = - included, a space is included as well, and I believe the newline characters too. I guess I am not too clear on what qualifies as a subpattern to be returned, and what does not?

My second question: is there a way for RegExMatch to return the RIGHT-most position of the match instead of the left-most? At least that way I can obtain my values via crude calculations and string functions.
Titan



Joined: 11 Aug 2004
Posts: 5068
Location: imaginationland

Posted: Thu Feb 08, 2007 9:13 pm

Try %Result1%. You might not even need a loop if ids := RegExReplace(text, "ID.*?[:=\-]\s*|`n(?!ID).*?(?=$|`n)") works.
_________________

RegExReplace("irc.freenode.net/ahk", "^(?=(.(?=[\0-r\[]*((?<=\.).))))(?:[c-\x73]{2,8}(\S))+((2)|\b[^\2-]){2}\D++$", "$u3$1$3$4$2")
quicktest



Joined: 30 Jul 2004
Posts: 42

Posted: Thu Feb 08, 2007 10:47 pm

Thanks much, Titan; it seems the result I wanted was stored in Result1. Could you explain a bit why this is? I understand RegExMatch can store multiple values in an expandable array, but I only had one value in the line, and I do not understand why my RegExp would require more than one result to be created.

The ids expression you proposed appears to store the entire ID list in ids, although with all : and = and - removed. I will have to study it a little to understand why this is.

Thanks again for your help.
Titan



Joined: 11 Aug 2004
Posts: 5068
Location: imaginationland

Posted: Thu Feb 08, 2007 11:56 pm

quicktest wrote:
2nd question is if there is a way for RegExMatch to return the RIGHT-most position of the match instead of the left-most? At least that way I can obtain my values via crude calculations and string functions.
You need a lazy match, e.g. .*?
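
A minimal sketch of the greedy/lazy difference, assuming AutoHotkey v1 syntax; the sample string and the names Text, Greedy, Lazy, FoundPos and EndPos are made up for illustration. It also shows one possible way to derive the right-most position: RegExMatch returns the left-most position of the match, so adding the length of the overall match gives its end.

```autohotkey
; AutoHotkey v1 sketch; all variable names here are illustrative, not from the thread.
Text := "ID 1 : ABC : XYZ"
RegExMatch(Text, "ID.*:", Greedy)             ; greedy .* runs to the last colon it can reach
FoundPos := RegExMatch(Text, "ID.*?:", Lazy)  ; lazy .*? stops at the first colon
EndPos := FoundPos + StrLen(Lazy) - 1         ; position of the last character of the lazy match
MsgBox, % "Greedy = [" Greedy "]`nLazy = [" Lazy "]`nFoundPos = " FoundPos "`nEndPos = " EndPos
```

With this input, Greedy should hold "ID 1 : ABC :" and Lazy "ID 1 :", with FoundPos 1 and EndPos 6.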

The result is stored in Result1 because:
RegExMatch() - UnquotedOutputVar wrote:
If any capturing subpatterns are present inside NeedleRegEx, their matches are stored in an array whose base name is OutputVar. For example, if the variable's name is Match, the substring that matches the first subpattern would be stored in Match1, the second would be stored in Match2, and so on.
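
As a rough illustration of the quoted rule, assuming AutoHotkey v1 and the Line/Result names from the first post (the trailing \R is left out here because the sample string has no newline):

```autohotkey
; AutoHotkey v1 sketch; the sample Line value is an assumption for illustration.
Line := "ID 1 : ABC"
RegExMatch(Line, "(?::|-|=)\s(.*)", Result)
; Result  = overall match, including text matched by the non-capturing (?:...) group
; Result1 = text matched by the first capturing group only
MsgBox, % "Result = [" Result "]`nResult1 = [" Result1 "]"
```

Here Result should contain ": ABC" while Result1 contains just "ABC", because (?:...) only withholds a capture number; it does not remove its text from the overall match.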


quicktest wrote:
The ids expression you proposed appears to store the entire ID list in ids, although with all : and = and - removed. I will have to study it a little to understand why this is.
Didn't you want that? If you remove the [:=\-]\s* part they should be left in.
PhiLho



Joined: 27 Dec 2005
Posts: 6721
Location: France (near Paris)

Posted: Fri Feb 09, 2007 10:57 am

I don't understand your \R.
If that's the carriage return char, you must write it \r.
A simple way to get what you want, if your values have no spaces inside, is just:
Code:
RegExp = \s(\S+)$
RegExMatch(Line, RegExp, Result)
Badly written hasty test:
Code:
Line1 = ID 1 : ABC
Line2 = ID 21 = DEF
Line3 = ID 307 - GHI
RegExp = \s(\S+)$
; Subpattern 1 is stored in <OutputVar>1, so output var Result1 yields Result11, and so on.
RegExMatch(Line1, RegExp, Result1)
RegExMatch(Line2, RegExp, Result2)
RegExMatch(Line3, RegExp, Result3)
MsgBox %Result11% %Result21% %Result31%

_________________
vPhiLho := RegExReplace("Philippe Lhoste", "^(\w{3})\w*\s+\b(\w{3})\w*$", "$1$2")
YMP



Joined: 23 Dec 2006
Posts: 265
Location: Russia

Posted: Fri Feb 09, 2007 12:12 pm

Regular Expressions (RegEx) - Quick Reference:
Quote:

In v1.0.46.06+, \R means "any single newline of any type" (namely `r, `n, or `r`n).
PhiLho



Joined: 27 Dec 2005
Posts: 6721
Location: France (near Paris)

Posted: Fri Feb 09, 2007 2:48 pm

OK, I forgot it, and I just quickly skimmed the left side of the reference... And I didn't find it in http://mushclient.com/pcre/pcrepattern.html, which is probably obsolete now... The info is indeed in http://www.pcre.org/pcre.txt
Thanks for the reminder.
quicktest



Joined: 30 Jul 2004
Posts: 42

Posted: Fri Feb 09, 2007 3:21 pm

Titan: Sorry, I must be going blind. Could you point me to where the lazy match .*? documentation is? Or was it .* that you meant? I don't seem to see how to make it return the right-most position though...

I took the subpattern explanation as meaning if I were processing ID1 and ID2 at the same time, then I would get the value for ID1 in Result1 and ID2 in Result2. Does the explanation instead mean that each () pair will cause a subpattern to be generated? How would I know if my results are not in Result2 or Result3?

Sorry, I made myself unclear; I wanted to obtain values like

Quote:
ABC


Although your regexp seems to give me the value

Quote:
ID 1 ABC


by removing the : sign. I didn't check the rest of the array though; I will give that another try later today.



PhiLho: Your (much) simpler solution is working well, thank you. It would seem I have a long way to go before fully understanding Regular Expressions. What if the values contain spaces, or are sometimes blank? I am trying to make this regexp general purpose, so it can be used to obtain values other than IDs. The criterion I am thinking of is to grab any value after the : - or = sign, all the way to the end of the line. This is why I came up with (?::|-|=)\s(.*)\R. Would there be a better way to achieve this?



Thanks much to all who helped this newbie.
PhiLho



Joined: 27 Dec 2005
Posts: 6721
Location: France (near Paris)

Posted: Fri Feb 09, 2007 4:12 pm

If you replace the \R with $, which is more traditional and portable, your expression is OK (as long as there is only one : - = in the line).
Another way:
Code:
RegExp = ^ID \d+ . (.*)$
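
A quick sketch comparing the two suggestions above on values that contain a space, assuming AutoHotkey v1. The sample lines, the [:=-] character class (used here as a shorthand for (?::|-|=)), and the names Lines, A, B and Out are assumptions for illustration:

```autohotkey
; AutoHotkey v1 sketch; sample data and variable names are illustrative.
Lines := "ID 1 : ABC`nID 21 = DEF`nID 307 - GHI v2"
Loop, Parse, Lines, `n
{
    RegExMatch(A_LoopField, "[:=-]\s(.*)$", A)      ; separator, one whitespace char, then the rest of the line
    RegExMatch(A_LoopField, "^ID \d+ . (.*)$", B)   ; fixed "ID <digits> <any char> " prefix, then the rest
    Out .= "[" A1 "] | [" B1 "]`n"                  ; A1 and B1 hold the first captured subpattern
}
MsgBox, %Out%
```

Both patterns should capture "GHI v2" for the last line. The first one starts at the left-most separator, so a value that itself contains ": ", "= " or "- " is still captured whole, but an ID part containing one of those sequences would shift the match.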

JSLover



Joined: 20 Dec 2004
Posts: 542
Location: LooseChange911.com... the WTC attacks were done by the US Gov't... the official story is a lie...

Posted: Sun Feb 11, 2007 2:19 pm

quicktest wrote:
Could you explain a bit on why this is?

...1st I want to mention that using result1/match1 is the correct way to solve this, but the following code is only to demonstrate "why ?: (question-colon)/a non-capturing subpattern is capturing"...
  1. The "overall match"..."match" contains any parts of the string that matched at all...including "non-capturing subpatterns" because the ?: only means it doesn't get its own "capture number" or "capture slot", but it's still included in the "overall match"
  2. Zero-length assertions aren't in the "overall match"
Code:
data=
(LTrim
   ID 1 : ABC
   ID 21 = DEF
   ID 307 - GHI
)

;//normal regex, result in match1, not overall match
;//regex=(?::|-|=)\s(.*)

;//Zero-length look behind assertion, result in match1 *AND* overall match
regex=(?<=(?::|-|=)\s)(.*)

Loop, Parse, data, `n
{
   line:=A_LoopField
   RegExMatch(line, regex, match)
   msgbox,
   (LTrim
      line(%line%)

      match(%match%)
      match1(%match1%)
      match2(%match2%)
      match3(%match3%)
   )
}
