AutoHotkey Community

Regular Expression Training Thread
Forum: General Chat
heresy
Joined: 11 Mar 2008
Posts: 291

Posted: Sat Aug 30, 2008 11:16 am    Post subject: Regular Expression Training Thread

Objective: Challenge and improve your regex skills.

The shortest pattern for each challenge will be posted as a reference in the second post of this thread.
Since the thread is focused more on training than on competition, the courses will proceed step by step from easier to harder.
Note that only the red-colored part of the following example (the Pattern string) is counted.

Code:
RegExMatch(Haystack, "Pattern", Output)



Regex Training One: Extract the names and dates of birth of people born between 1987 and 1992

    Rules:
    1) You must solve it using regex.
    2) Loop and IF statements are allowed.

    Original Source :
    Code:
    Haystack=
    (
    Horace Martin 1980-04-20
    Lyndsey Wilkerson 1986-02-12
    Clarissa Kuster 1991-05-28
    Tamika Minnie 1973-11-14
    Shania Jerome 1979-08-30
    Rylee Millhouse 1984-04-21
    Orrell Zundel 1988-01-24
    Daniel Kim 1973-10-10
    Hatty Franks 1987-07-24
    Pene Woodworth 1990-02-12
    )


    Expected Output :
    Code:
    Clarissa Kuster 1991-05-28
    Orrell Zundel 1988-01-24
    Hatty Franks 1987-07-24
    Pene Woodworth 1990-02-12
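    For comparison, here is a minimal sketch of a single-pattern solution. It is written in Python's `re` module purely so it can run anywhere; it is not AHK and not the reference answer. With the multiline flag, `^` and `$` match at every line. That lets one pattern check the year and return the whole line without any Loop or IF. The same pattern should carry over to AHK's `m)` option, but newline handling there is not verified here.

```python
import re

# The Training One haystack: one "First Last YYYY-MM-DD" record per line.
haystack = """Horace Martin 1980-04-20
Lyndsey Wilkerson 1986-02-12
Clarissa Kuster 1991-05-28
Tamika Minnie 1973-11-14
Shania Jerome 1979-08-30
Rylee Millhouse 1984-04-21
Orrell Zundel 1988-01-24
Daniel Kim 1973-10-10
Hatty Franks 1987-07-24
Pene Woodworth 1990-02-12"""

# re.MULTILINE makes ^ and $ match at each line boundary.
# The year must be 1987-1989 (19 + 8[7-9]) or 1990-1992 (19 + 9[0-2]).
pattern = re.compile(r"^.* 19(?:8[7-9]|9[0-2])-\d\d-\d\d$", re.MULTILINE)

# The pattern has no capturing groups, so findall returns each whole matching line.
matches = pattern.findall(haystack)
print("\n".join(matches))
```

    The non-capturing group `(?:...)` keeps `findall` returning whole lines instead of just the year digits.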


Regex Training Two: Extract the URL of Google's logo image from the HTML source

    Rules:
    1) You must solve it using regex.
    2) Start from the following code.
    3) Simply matching the literal expected output, i.e. RegExMatch(Haystack, "Literal Expected Output"), does not count.

    Original Source :
    Code:
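    ; download Google's home page to a temp file (/ncr is meant to stop Google redirecting to a country domain)
    URLDownloadToFile, http://www.google.com/ncr, %A_Temp%\g_index.htm
    ; read the downloaded HTML into Haystack
    FileRead, Haystack, %A_Temp%\g_index.htm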


    Expected Output :
    Code:
    http://www.google.com/intl/en_ALL/images/logo.gif
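    A minimal sketch of one non-literal approach, again in Python's `re` for portability (an assumption, not anyone's posted solution). The live 2008 page is not reproduced in this thread, so the `html` string below is a hypothetical stand-in. The pattern captures the opening quote of the `src` value and uses the backreference `\1` to require the same quote to close it. Note that this yields only the relative path; the host has to be added separately.

```python
import re

# Hypothetical stand-in for the downloaded page (the real markup will differ).
html = '<div align=center><img alt="Google" height=110 src="/intl/en_ALL/images/logo.gif"></div>'

# <img\b        an img tag
# [^>]*\bsrc=   any attributes up to src=, without leaving the tag
# (["'])        group 1: the opening quote, either " or '
# (.+?)         group 2: the attribute value, matched lazily
# \1            the same quote character that opened the value
pattern = re.compile(r"""<img\b[^>]*\bsrc=(["'])(.+?)\1""", re.IGNORECASE)

m = pattern.search(html)
src = m.group(2) if m else None
print(src)
```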


Let me know if you have any suggestions or ideas for upcoming challenges.
Hope you have fun with it!
_________________
Easy WinAPI - Dive into Windows API World
Benchmark your AutoHotkey skills at PlayAHK.com


Last edited by heresy on Sun Aug 31, 2008 12:13 pm; edited 5 times in total
heresy

Posted: Sat Aug 30, 2008 11:17 am

reserved
Serenity
Joined: 07 Nov 2004
Posts: 1271

Posted: Sat Aug 30, 2008 11:39 am

Code:
Haystack=
(
Horace Martin 1980-04-20
Lyndsey Wilkerson 1986-02-12
Clarissa Kuster 1991-05-28
Tamika Minnie 1973-11-14
Shania Jerome 1979-08-30
Rylee Millhouse 1984-04-21
Orrell Zundel 1988-01-24
Daniel Kim 1973-10-10
Hatty Franks 1987-07-24
Pene Woodworth 1990-02-12
)

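loop, parse, haystack, `n
{
   ; strip the leading "First Last" and the trailing "-MM-DD", leaving the year (plus a leading space)
   year := regexreplace( a_loopfield, "(^\w+ \w+|-\d+-\d+$)" )
   if year between 1987 and 1992
      list := a_loopfield . "`n" . list   ; prepends each match, so the list comes out in reverse order
}
msgbox % list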

_________________
"Anything worth doing is worth doing slowly." - Mae West
heresy

Posted: Sat Aug 30, 2008 11:49 am

Hi Serenity.

Rule 1) You must solve it by using Regex.

IF statements are allowed, but you've done the year check with IF ... between rather than with regex.
The year-of-birth validation also needs to be done with regex.
Serenity

Posted: Sat Aug 30, 2008 11:59 am

Code:
loop, parse, haystack, `n
{
   year := regexreplace( a_loopfield, "(^\w+ \w+|-\d+-\d+$)" )
   if regexmatch( year, "(198[7-9]|199[0-2])" )
      list := a_loopfield . "`n" . list
}
msgbox % list


:)
heresy

Posted: Sat Aug 30, 2008 12:02 pm

Yeah, that's a valid attempt, though you've used regex twice, so your count will be 39 (20 + 19 characters). Good luck!

Code:
(^\w+ \w+|-\d+-\d+$)
(198[7-9]|199[0-2])
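Since only the pattern text counts, the 39 is just the combined length of the two patterns above. Here is a quick sketch of that scoring arithmetic, written in Python purely for convenience:

```python
# Only the pattern text counts toward the score, so the score is simply
# the total length of every pattern a solution uses.
patterns = [
    r"(^\w+ \w+|-\d+-\d+$)",  # strips the name and the -MM-DD part
    r"(198[7-9]|199[0-2])",   # checks the remaining year
]

lengths = [len(p) for p in patterns]
score = sum(lengths)
print(lengths, score)  # [20, 19] 39
```

Raw strings (`r"..."`) keep each backslash as a single character, which matches how the pattern is counted.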

Serenity

Posted: Sat Aug 30, 2008 12:16 pm

Code:
loop, parse, haystack, `n
{
   if regexmatch( a_loopfield, "(^\w+ \w+ )(198[7-9]|199[0-2])(\-\d+-\d+$)" )
      list := a_loopfield . "`n" . list
}
msgbox % list

polyethene
Joined: 11 Aug 2004
Posts: 5248
Location: UK

Posted: Sat Aug 30, 2008 6:44 pm

Code:
h =
(
Horace Martin 1980-04-20
Lyndsey Wilkerson 1986-02-12
Clarissa Kuster 1991-05-28
Tamika Minnie 1973-11-14
Shania Jerome 1979-08-30
Rylee Millhouse 1984-04-21
Orrell Zundel 1988-01-24
Daniel Kim 1973-10-10
Hatty Franks 1987-07-24
Pene Woodworth 1990-02-12
)

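; delete every record whose year is NOT 1987-1992 (negative lookahead), so only the wanted lines remain
h := RegExReplace(h, "\D+\b19(?!8[7-9]|9[0-2])[\d-]+")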
MsgBox, %h%
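The replace-based approach above deletes the records you don't want instead of matching the ones you do. Here is a sketch of the same pattern run through Python's `re.sub` (Python assumed only so it can be run outside AHK). In this data the surviving text starts with a newline, because the newline before the first kept line is never consumed, so the sketch strips the result before printing.

```python
import re

h = """Horace Martin 1980-04-20
Lyndsey Wilkerson 1986-02-12
Clarissa Kuster 1991-05-28
Tamika Minnie 1973-11-14
Shania Jerome 1979-08-30
Rylee Millhouse 1984-04-21
Orrell Zundel 1988-01-24
Daniel Kim 1973-10-10
Hatty Franks 1987-07-24
Pene Woodworth 1990-02-12"""

# \D+                the non-digit run before a year (name, spaces and the
#                    newline left over from the previous record)
# \b19               a year starting with 19 ...
# (?!8[7-9]|9[0-2])  ... whose last two digits are NOT 87-89 or 90-92
# [\d-]+             the rest of the date
# Every matched record is deleted; records born 1987-1992 cannot match, so they survive.
result = re.sub(r"\D+\b19(?!8[7-9]|9[0-2])[\d-]+", "", h)

print(result.strip())
```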


Code:
t = %A_Temp%\g
URLDownloadToFile, *0 http://www.google.co.uk/, %t%
FileRead, h, %t%
FileDelete, %t%

RegExMatch(h, "i)<img\b[^>]*\bsrc=(""|')([^\-1]+?)(?-2)", h)
MsgBox, %h2%


Can I ask why you put unnecessary quote tags around your message like Skan often does? On my screen it makes text hard to read, so I only went by the code examples as a guideline.
_________________
GitHub | Scripts | IronAHK | Contact by email, not private message.
heresy

Posted: Sun Aug 31, 2008 6:16 am

Titan wrote:
Can I ask why you put unnecessary quote tags around your message like Skan often does? On my screen it makes text hard to read, so I only went by the code examples as a guideline.


I was trying to improve readability by splitting the questions into quote boxes; I didn't realize it could look like that. I'll reformat it. Anyway, your second regex doesn't match the expected output :P
Krogdor
Joined: 18 Apr 2008
Posts: 1390
Location: The Interwebs

Posted: Sun Aug 31, 2008 7:30 am

#1:
Code:
Haystack=
(
Horace Martin 1980-04-20
Lyndsey Wilkerson 1986-02-12
Clarissa Kuster 1991-05-28
Tamika Minnie 1973-11-14
Shania Jerome 1979-08-30
Rylee Millhouse 1984-04-21
Orrell Zundel 1988-01-24
Daniel Kim 1973-10-10
Hatty Franks 1987-07-24
Pene Woodworth 1990-02-12
)
Loop, Parse, Haystack, `n
  If (RegExMatch(A_LoopField,"19(8[7-9]|9[0-2])"))
    Output .= A_LoopField "`n"
MsgBox % Output

Total of 17.

#2:
Code:
URLDownloadToFile, http://www.google.com/ncr, %A_Temp%\g_index.htm
FileRead, Haystack, %A_Temp%\g_index.htm
RegExMatch(Haystack,"ue=(.+?)/"".+110 src=""(.+?)""",Output)
MsgBox % Output1 Output2

Total of 30.
polyethene

Posted: Sun Aug 31, 2008 9:30 am

heresy wrote:
your 2nd regex doesn't match to expected output
I get the correct output. google.com/ncr redirects me to .co.uk so I probably don't have the same source as you.
heresy

Posted: Sun Aug 31, 2008 12:09 pm

@Krogdor
Hey, you've provoked the regex genius :lol:

@Titan
I know you're the person who could claim every hall of fame for regex stuff,
but I was talking about the URL prefix: whether it's http://www.google.com or http://www.google.co.uk :P

me myself wrote:
Expected Output :
Code:
http://www.google.com/intl/en_ALL/images/logo.gif


polyethene

Posted: Sun Aug 31, 2008 12:37 pm

heresy wrote:
i was talking about the url header whether http://www.google.com or http://www.google.co.uk
The image URI is relative, so prepending a host to it has nothing to do with regex. You could easily pull "www.google.co.uk" from another part of the HTML source, but there is no guarantee it is the same host that served the resource in question; e.g. my copy could come from a proxy or be downloaded directly by IP.
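This small sketch illustrates the point about relative URIs, using Python's standard `urllib.parse.urljoin` (Python is an assumption, chosen for portability). The same relative `src` resolves to different absolute URLs depending on the base URL the page was actually fetched from. That base is decided by how the page was downloaded, not by anything the regex can see. The relative path is taken from the expected output in the first post.

```python
from urllib.parse import urljoin

# Relative src value of the logo image.
src = "/intl/en_ALL/images/logo.gif"

# Two possible base URLs: the one requested, and the country domain a redirect may land on.
bases = ["http://www.google.com/ncr", "http://www.google.co.uk/"]

# An absolute path ("/intl/...") replaces the base's path but keeps its scheme and host.
resolved = [urljoin(base, src) for base in bases]
for url in resolved:
    print(url)
```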
Serenity

Posted: Sun Aug 31, 2008 1:27 pm

Krogdor wrote:

#2:
Code:
URLDownloadToFile, http://www.google.com/ncr, %A_Temp%\g_index.htm
FileRead, Haystack, %A_Temp%\g_index.htm
RegExMatch(Haystack,"ue=(.+?)/"".+110 src=""(.+?)""",Output)
MsgBox % Output1 Output2

Total of 30.


This returns blank for me. I wonder if it's a locale thing.

For some reason AHK won't let me use (110 src="(.+?)") or (110 src="([\w\D]+?)") in a script. I've run into this before when trying to match a " character in a regex.

Code:
URLDownloadToFile, http://www.google.com/ncr, %A_Temp%\g_index.htm
FileRead, Haystack, %A_Temp%\g_index.htm
RegExMatch( Haystack, "(110 src="([\w\D]+?)")", m ) ; (110 src="(.+?)")
msgbox % "http://www.google.com" . m2

Krogdor

Posted: Sun Aug 31, 2008 7:56 pm

Code:
URLDownloadToFile, http://www.google.com/ncr, %A_Temp%\g_index.htm
FileRead, Haystack, %A_Temp%\g_index.htm
RegExMatch( Haystack, "(110 src=""([\w\D]+?)"")", m ) ; (110 src="(.+?)")
msgbox % "http://www.google.com" . m2


You need to put two quotes in a row to escape them inside a quoted string.
All times are GMT. Page 1 of 2.