heresy
Joined: 11 Mar 2008 Posts: 291
Posted: Sat Aug 30, 2008 11:16 am Post subject: Regular Expression Training Thread
Objective: challenge and improve your regex skills.
The shortest pattern will be posted as a reference in the second post of this thread.
Since the focus is on training rather than competition, the exercises will proceed step by step from easier to harder.
Note that only the pattern argument (colored red in the original post's example below) is counted.
| Code: | RegExMatch(Haystack, "Pattern", Output) |
Regex Training One: Extract the names and dates of birth of everyone born between 1987 and 1992
Rules:
1) You must solve it using regex.
2) Loop and If statements are allowed.
Original Source :
| Code: | Haystack=
(
Horace Martin 1980-04-20
Lyndsey Wilkerson 1986-02-12
Clarissa Kuster 1991-05-28
Tamika Minnie 1973-11-14
Shania Jerome 1979-08-30
Rylee Millhouse 1984-04-21
Orrell Zundel 1988-01-24
Daniel Kim 1973-10-10
Hatty Franks 1987-07-24
Pene Woodworth 1990-02-12
) |
Expected Output :
| Code: | Clarissa Kuster 1991-05-28
Orrell Zundel 1988-01-24
Hatty Franks 1987-07-24
Pene Woodworth 1990-02-12 |
Regex Training Two: Extract the URL of Google's logo image from the HTML source
Rules:
1) You must solve it using regex.
2) Start from the following code.
3) Matching the literal expected output, as in RegExMatch(Haystack, "Literal Expected Output"), counts as a fail.
Original Source :
| Code: | URLDownloadToFile, http://www.google.com/ncr, %A_Temp%\g_index.htm
FileRead, Haystack, %A_Temp%\g_index.htm |
Expected Output :
| Code: | http://www.google.com/intl/en_ALL/images/logo.gif |
Let me know if you have any suggestions or ideas for upcoming challenges.
Hope you guys have fun with it. _________________ Easy WinAPI - Dive into Windows API World
Benchmark your AutoHotkey skills at PlayAHK.com
Last edited by heresy on Sun Aug 31, 2008 12:13 pm; edited 5 times in total |

heresy
Joined: 11 Mar 2008 Posts: 291
Posted: Sat Aug 30, 2008 11:17 am Post subject:
reserved

Serenity
Joined: 07 Nov 2004 Posts: 1271
Posted: Sat Aug 30, 2008 11:39 am Post subject:
| Code: | Haystack=
(
Horace Martin 1980-04-20
Lyndsey Wilkerson 1986-02-12
Clarissa Kuster 1991-05-28
Tamika Minnie 1973-11-14
Shania Jerome 1979-08-30
Rylee Millhouse 1984-04-21
Orrell Zundel 1988-01-24
Daniel Kim 1973-10-10
Hatty Franks 1987-07-24
Pene Woodworth 1990-02-12
)
loop, parse, haystack, `n
{
year := regexreplace( a_loopfield, "(^\w+ \w+|-\d+-\d+$)" )
if year between 1987 and 1992
list := a_loopfield . "`n" . list
}
msgbox % list |
_________________ "Anything worth doing is worth doing slowly." - Mae West

heresy
Joined: 11 Mar 2008 Posts: 291
Posted: Sat Aug 30, 2008 11:49 am Post subject:
Hi Serenity.
Rule 1) You must solve it using regex.
If statements are allowed, but you've solved it with If Between rather than regex.
The year-of-birth validation also needs to be done through regex.

Serenity
Joined: 07 Nov 2004 Posts: 1271
Posted: Sat Aug 30, 2008 11:59 am Post subject:
| Code: | loop, parse, haystack, `n
{
year := regexreplace( a_loopfield, "(^\w+ \w+|-\d+-\d+$)" )
if regexmatch( year, "(198[7-9]|199[0-2])" )
list := a_loopfield . "`n" . list
}
msgbox % list |

heresy
Joined: 11 Mar 2008 Posts: 291
Posted: Sat Aug 30, 2008 12:02 pm Post subject:
Yeah, that's a valid attempt, though you've used regex twice, so your count will be 39 (the two patterns below, 20 + 19 characters). Good luck.
| Code: | (^\w+ \w+|-\d+-\d+$)
(198[7-9]|199[0-2]) |
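For reference, the count of 39 appears to be the combined length of the two patterns. Here is a minimal sketch to check this, assuming the score is simply the number of characters in each pattern string (the variable names p1 and p2 are only for illustration):

```autohotkey
; Patterns copied verbatim from the two regex calls above.
p1 := "(^\w+ \w+|-\d+-\d+$)"
p2 := "(198[7-9]|199[0-2])"
; Assumption: the score is the plain character count of each pattern.
MsgBox % StrLen(p1) . " + " . StrLen(p2) . " = " . (StrLen(p1) + StrLen(p2))
```

This should show 20 + 19 = 39, which matches the count given above.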

Serenity
Joined: 07 Nov 2004 Posts: 1271
Posted: Sat Aug 30, 2008 12:16 pm Post subject:
| Code: | loop, parse, haystack, `n
{
if regexmatch( a_loopfield, "(^\w+ \w+ )(198[7-9]|199[0-2])(\-\d+-\d+$)" )
list := a_loopfield . "`n" . list
}
msgbox % list |

polyethene
Joined: 11 Aug 2004 Posts: 5248 Location: UK
Posted: Sat Aug 30, 2008 6:44 pm Post subject:
| Code: | h =
(
Horace Martin 1980-04-20
Lyndsey Wilkerson 1986-02-12
Clarissa Kuster 1991-05-28
Tamika Minnie 1973-11-14
Shania Jerome 1979-08-30
Rylee Millhouse 1984-04-21
Orrell Zundel 1988-01-24
Daniel Kim 1973-10-10
Hatty Franks 1987-07-24
Pene Woodworth 1990-02-12
)
h := RegExReplace(h, "\D+\b19(?!8[7-9]|9[0-2])[\d-]+")
MsgBox, %h% |
| Code: | t = %A_Temp%\g
URLDownloadToFile, *0 http://www.google.co.uk/, %t%
FileRead, h, %t%
FileDelete, %t%
RegExMatch(h, "i)<img\b[^>]*\bsrc=(""|')([^\-1]+?)(?-2)", h)
MsgBox, %h2% |
Can I ask why you put unnecessary quote tags around your message like Skan often does? On my screen it makes text hard to read, so I only went by the code examples as a guideline. _________________ GitHub • Scripts • IronAHK • Contact by email not private message. |

heresy
Joined: 11 Mar 2008 Posts: 291
Posted: Sun Aug 31, 2008 6:16 am Post subject:
| Titan wrote: | Can I ask why you put unnecessary quote tags around your message like Skan often does? On my screen it makes text hard to read, so I only went by the code examples as a guideline. |
I was trying to improve readability by splitting the questions into quote boxes. I didn't realize it could look like that; I'll reformat it. Anyway, your second regex doesn't match the expected output.

Krogdor
Joined: 18 Apr 2008 Posts: 1390 Location: The Interwebs
Posted: Sun Aug 31, 2008 7:30 am Post subject:
#1:
| Code: | Haystack=
(
Horace Martin 1980-04-20
Lyndsey Wilkerson 1986-02-12
Clarissa Kuster 1991-05-28
Tamika Minnie 1973-11-14
Shania Jerome 1979-08-30
Rylee Millhouse 1984-04-21
Orrell Zundel 1988-01-24
Daniel Kim 1973-10-10
Hatty Franks 1987-07-24
Pene Woodworth 1990-02-12
)
Loop, Parse, Haystack, `n
If (RegExMatch(A_LoopField,"19(8[7-9]|9[0-2])"))
Output .= A_LoopField "`n"
MsgBox % Output |
Total of 17.
#2:
| Code: | URLDownloadToFile, http://www.google.com/ncr, %A_Temp%\g_index.htm
FileRead, Haystack, %A_Temp%\g_index.htm
RegExMatch(Haystack,"ue=(.+?)/"".+110 src=""(.+?)""",Output)
MsgBox % Output1 Output2 |
Total of 30. |

polyethene
Joined: 11 Aug 2004 Posts: 5248 Location: UK
Posted: Sun Aug 31, 2008 9:30 am Post subject:
| heresy wrote: | your 2nd regex doesn't match to expected output |
I get the correct output. google.com/ncr redirects me to .co.uk, so I probably don't have the same source as you.

heresy
Joined: 11 Mar 2008 Posts: 291
Posted: Sun Aug 31, 2008 12:09 pm Post subject:
@ Krogdor
Hey, you've provoked the regex genius.
@ Titan
I know you're the one who could take every spot in the hall of fame for regex stuff,
but I was talking about the start of the URL, i.e. whether it is http://www.google.com or http://www.google.co.uk
| me myself wrote: | Expected Output :
| Code: | http://www.google.com/intl/en_ALL/images/logo.gif |
|

polyethene
Joined: 11 Aug 2004 Posts: 5248 Location: UK
Posted: Sun Aug 31, 2008 12:37 pm Post subject:
The image URI is relative, so prepending a string to it has nothing to do with regex. You could easily pull "www.google.co.uk" from another part of the HTML source, but there is no guarantee that it is the host of the resource in question; my copy could have come from a proxy or been downloaded directly from an IP address.
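A minimal sketch of that separation: the regex only captures the relative src value, and the host is prepended afterwards as an ordinary string operation. The sample line and the Host value are assumptions standing in for the downloaded page and for whichever server it was actually fetched from.

```autohotkey
; Hypothetical sample line standing in for the downloaded HTML.
Haystack = <img alt="Google" height=110 src="/intl/en_ALL/images/logo.gif" width=276>
; Assumption: the page was requested from this host.
Host := "http://www.google.com"
; The regex captures only what is inside src="...".
if (RegExMatch(Haystack, "i)<img\b[^>]*\bsrc=""([^""]+)""", m))
{
    ; Prepend the host only when the captured URI is relative (starts with "/").
    MsgBox % (SubStr(m1, 1, 1) = "/" ? Host : "") . m1
}
```

If the page came from a proxy or a redirect, Host would have to be set accordingly; the pattern itself does not change.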

Serenity
Joined: 07 Nov 2004 Posts: 1271
Posted: Sun Aug 31, 2008 1:27 pm Post subject:
| Krogdor wrote: |
#2:
| Code: | URLDownloadToFile, http://www.google.com/ncr, %A_Temp%\g_index.htm
FileRead, Haystack, %A_Temp%\g_index.htm
RegExMatch(Haystack,"ue=(.+?)/"".+110 src=""(.+?)""",Output)
MsgBox % Output1 Output2 |
Total of 30. |
This returns blank for me. I wonder if it's a locale thing.
For some reason AHK won't let me use (110 src="(.+?)") or (110 src="([\w\D]+?)") in a script. I've run into this before when trying to match a " character in a regex.
| Code: | URLDownloadToFile, http://www.google.com/ncr, %A_Temp%\g_index.htm
FileRead, Haystack, %A_Temp%\g_index.htm
RegExMatch( Haystack, "(110 src="([\w\D]+?)")", m ) ; (110 src="(.+?)")
msgbox % "http://www.google.com" . m2 |

Krogdor
Joined: 18 Apr 2008 Posts: 1390 Location: The Interwebs
Posted: Sun Aug 31, 2008 7:56 pm Post subject:
| Code: | URLDownloadToFile, http://www.google.com/ncr, %A_Temp%\g_index.htm
FileRead, Haystack, %A_Temp%\g_index.htm
RegExMatch( Haystack, "(110 src=""([\w\D]+?)"")", m ) ; (110 src="(.+?)")
msgbox % "http://www.google.com" . m2 |
You need to put two quotes in a row to escape them inside a quoted string. |
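A small sketch of the quoting rule, using a hypothetical sample line instead of the downloaded page. The second call is an alternative not used in this thread: it relies on the PCRE escape \x22 (the character code of the double quote), so no literal quote has to appear inside the AHK string.

```autohotkey
; Hypothetical sample line standing in for the downloaded HTML.
Haystack = <img height=110 src="/intl/en_ALL/images/logo.gif">
; Option 1: double each quote inside the quoted pattern string.
RegExMatch(Haystack, "110 src=""(.+?)""", a)
; Option 2 (alternative): let the regex engine match the quote via \x22.
RegExMatch(Haystack, "110 src=\x22(.+?)\x22", b)
MsgBox % a1 . "`n" . b1
```

Both calls should capture /intl/en_ALL/images/logo.gif from the sample line. Note that each doubled quote is still two characters in the source, which matters if the pattern length is being counted.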