 |
AutoHotkey Community Let's help each other out
|
peteryy
Joined: 12 Mar 2009 Posts: 16
|
Posted: Sat Jul 04, 2009 4:24 am Post subject: simple problem with RegExMatch() or InStr() |
|
|
I'm new to AutoHotkey. I need to copy every pic ID and its price out of a file of around 500KB and save them somewhere else, and it would take me about half a day to search, copy and paste manually. I was trying to get the ID number, or to find IDs by price, from the file, but I failed to create a pattern to search with. Can anyone please help?
This is the content of the file that I want to search:
| Quote: | <meta http-equiv=refresh content=4>
<PRE id="TRANS_DATA">
<TR><TD colspan=7 class='x2'>Race 1<TR><TD colspan=7 class='x3'>unconfirmed orders<TR><a href="javascript:mr(663362,'PIC','AU','SG')"><TD>1<TD class='x5'>4<TD>12<TD>0<TD>4.65<TD>44/0<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(664258,'PIC','AU','SG')"><TD>1<TD class='x5'>4<TD>10<TD>10<TD>4.75<TD>51/15<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(664413,'PIC','AU','SG')"><TD>1<TD class='x5'>4<TD>22<TD>22<TD>4.80<TD>42/12<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(664616,'PIC','AU','SG')"><TD>1<TD class='x5'>4<TD>17<TD>17<TD>4.90<TD>51/15<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(663995,'PIC','AU','SG')"><TD>1<TD class='x5'>4<TD>0<TD>15<TD>4.90<TD>0/10<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(663136,'PIC','AU','SG')"><TD>1<TD class='x5'>5<TD>30<TD>0<TD>4.50<TD>25/0<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(663779,'PIC','AU','SG')"><TD>1<TD class='x5'>5<TD>0<TD>13<TD>4.75<TD CLASS=RD>0/6<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(664412,'PIC','AU','SG')"><TD>1<TD class='x5'>5<TD>22<TD>22<TD>4.80<TD>44/12<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(665290,'PIC','AU','SG')"><TD>1<TD class='x5'>5<TD>12<TD>12<TD>4.80<TD CLASS=RD>8/7<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(665051,'PIC','AU','SG')"><TD>1<TD class='x5'>5<TD>17<TD>17<TD>4.85<TD CLASS=RD>11/6<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(664615,'PIC','AU','SG')"><TD>1<TD class='x5'>5<TD>22<TD>22<TD>4.90<TD>58/15<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(663980,'PIC','AU','SG')"><TD>1<TD class='x5'>5<TD>0<TD>15<TD>4.90<TD>0/10<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(664841,'PIC','AU','SG')"><TD>1<TD class='x5'>5<TD>26<TD>26<TD>5.00<TD>55/15<TD><font class='del2'>Del</font></TD></a><TR><a 
href="javascript:mr(663580,'PIC','AU','SG')"><TD>1<TD class='x5'>6<TD>13<TD>13<TD>4.10<TD>56/15<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(662945,'PIC','AU','SG')"><TD>1<TD class='x5'>6<TD>10<TD>10<TD>4.10<TD>54/15<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(663123,'PIC','AU','SG')"><TD>1<TD class='x5'>6<TD>30<TD>0<TD>4.50<TD>27/0<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(663554,'PIC','AU','SG')"><TD>1<TD class='x5'>6<TD>12<TD>0<TD>4.65<TD>44/0<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(664171,'PIC','AU','SG')"><TD>1<TD class='x5'>6<TD>4<TD>4<TD>4.75<TD>52/15<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(663775,'PIC','AU','SG')"><TD>1<TD class='x5'>6<TD>0<TD>13<TD>4.75<TD CLASS=RD>0/6<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(664397,'PIC','AU','SG')"><TD>1<TD class='x5'>6<TD>16<TD>16<TD>4.80<TD>40/12<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(665235,'PIC','AU','SG')"><TD>1<TD class='x5'>6<TD>12<TD>12<TD>4.80<TD CLASS=RD>18/7<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(665074,'PIC','AU','SG')"><TD>1<TD class='x5'>6<TD>14<TD>14<TD>4.85<TD CLASS=RD>8/6<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(664620,'PIC','AU','SG')"><TD>1<TD class='x5'>6<TD>14<TD>14<TD>4.90<TD>59/15<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(663988,'PIC','AU','SG')"><TD>1<TD class='x5'>6<TD>0<TD>15<TD>4.90<TD>0/10<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(664835,'PIC','AU','SG')"><TD>1<TD class='x5'>6<TD>34<TD>34<TD>5.00<TD>57/15<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(663397,'PIC','AU','SG')"><TD>1<TD class='x5'>7<TD>8<TD>8<TD>4.10<TD>60/15<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(662942,'PIC','AU','SG')"><TD>1<TD class='x5'>7<TD>12<TD>12<TD>4.10<TD>60/15<TD><font 
class='del2'>Del</font></TD></a> |
This is my test code:
| Code: | filedelete, test.txt
Loop,5
{
Loop, read, pc.txt, var
{
FileRead,var1,pc.txt
StringReplace, var1, var1, </TD>,`n , All
start1:=InStr(var1,"java",0,start1)+14
end1:=InStr(var1,",'EAT",0,start1)
StringMid,var1,var1,% start1,% end1-start1
FileAppend, %var1%`n, Test.txt
;count += 1
;msgbox <%start1%> <%end1%>
}
} |
And lastly, this is the result that I managed to produce:
| Quote: | 665043
equiv=refresh content=4>
<PRE id="TRANS_DATA">
<TR><TD colspan=7 class='x2'>Race 1<TR><TD colspan=7 class='x3'>unconfirmed orders<TR><a href="javascript:mr(663362
663362
664258
664413
664616
663995
663136
663779
664412
665290
665051
664615
663980
664841
663580
662945
663123
663554
664171
663775
664397
665235
665074
664620
663988
664835
663397
662942
663181
663457
664229
663771
664420
665141
|
Now my problem is that my code results in an endless loop and accidentally outputs non-ID data...
Can anyone please help me solve this?
Please... I don't want to waste so much time on just search, copy and paste. |
animeaime
Joined: 04 Nov 2008 Posts: 1045
|
Posted: Sat Jul 04, 2009 4:45 am Post subject: |
|
|
Dumb question, but can you give a "snip" (i.e. example) of a pic ID? By using this example, you can extract the pattern.
For example, I think this is a pic ID, but I want to be sure.
| Code: | | "javascript:mr(664258,'PIC','AU','SG')" |
In the above example, you want the 664258, right?
_________________
As always, if you have any further questions, don't hesitate to ask.
Add OOP to your scripts via the Class Library. Check out my scripts. |
animeaime
Joined: 04 Nov 2008 Posts: 1045
|
Posted: Sat Jul 04, 2009 5:00 am Post subject: |
|
|
OK, below is my initial attempt. Please give feedback on what it does right and wrong, so I can adjust the code. Of course, you can adjust the code as you see fit. To test it, I copied the contents of the "quote block" (the massive listing of code) to "pc.txt". I then ran the code below, and it created a list of 27 pic IDs in "test.txt".
| Code: | ReadFile := "pc.txt"
WriteFile := "test.txt"
;reads the text to match from <ReadFile>
FileRead, text, %ReadFile%
FileDelete, %WriteFile%
start := 1
;finds all occurrences of "javascript:mr(number,'PIC'"
; and appends the number, followed by a newline (\n) to the specified <WriteFile>
while pos := RegExMatch(text, "javascript:mr\((?<PicID>\d++),'PIC'", match, start)
{
    FileAppend, %MatchPicID%`n, %WriteFile%
    start := pos + StrLen(match)
}
MsgBox, Done |
peteryy
Joined: 12 Mar 2009 Posts: 16
|
Posted: Sat Jul 04, 2009 5:40 am Post subject: |
|
|
Thanks to animeaime, haha...
It works! And thanks for the fast response...
So wonderful.
Sorry for another stupid question, but can you explain this part for me?
(text, "javascript:mr\((?<PicID>\d++),'PIC'", match, start)
And is PicID the same as %MatchPicID%?
I ask because I'm quite confused by RegExMatch's Perl-style expressions.
Thanks again |
peteryy
Joined: 12 Mar 2009 Posts: 16
|
Posted: Sat Jul 04, 2009 5:45 am Post subject: |
|
|
| How about if I want to find only the pic ID numbers whose price is 4.90? |
animeaime
Joined: 04 Nov 2008 Posts: 1045
|
Posted: Sat Jul 04, 2009 6:13 am Post subject: |
|
|
So, it does the job? Good.
Ok. here's the breakdown.
When working with regular expressions, you are looking for a pattern. By using this pattern, you find what you seek. So, from your examples, it seems you were looking for "javascript:mr(" followed by a number, followed by ",'PIC'". For example, "javascript:mr(664258,'PIC'" would be one such match.
Once you figure out the pattern, you want to figure out what you want to capture, if anything. You wanted the pic ID, the number, so we want to capture it. Also, you need to figure out what remains the same, and what changes. In this case, only the number changes, the rest of it remains the same. So, in the resulting RegEx, everything but the number is literal text.
| Code: | ;the RegEx I used (everything except the (?<PicID>\d++) group is literal text)
"javascript:mr\((?<PicID>\d++),'PIC'" |
When working with regular expressions, don't forget that some characters need to be escaped, because they have special meaning to the RegEx engine. Check out the RegEx - Quick Reference for a full list of these "metacharacters".
For example, an open parenthesis has special meaning, so to match a literal opening parenthesis, we need to precede it with a backslash ("\").
| Code: | ;the RegEx I used
"javascript:mr\((?<PicID>\d++),'PIC'" |
OK, hopefully you are still following me. If not, feel free to ask questions on anything I mention.
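Before moving on, the escaping point can be checked in isolation. Below is a minimal sketch, assuming AutoHotkey v1 syntax (the same as the script above); the sample string and the variable names are only there for illustration.

```autohotkey
; Minimal sketch, AutoHotkey v1 syntax assumed; the sample string is illustrative.
sample := "javascript:mr(664258,'PIC','AU','SG')"

; "\(" matches a literal "(", so the pattern lines up with the text.
posEscaped := RegExMatch(sample, "mr\(\d+")

; An unescaped "(" opens a capture group instead. This group is never closed,
; so the pattern cannot be compiled: RegExMatch returns an empty string and
; ErrorLevel holds the compile-error message.
posUnescaped := RegExMatch(sample, "mr(\d+")
compileError := ErrorLevel

MsgBox, Escaped: %posEscaped%`nUnescaped: %posUnescaped%`nErrorLevel: %compileError%
```

The first call should report where "mr(" starts in the sample, while the second one shows an empty result together with the error text.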
Now, after handling the literal text (the part that doesn't change), we need to handle the part that does - in this case, the number. So, where the number was in the example, "javascript:mr(664258,'PIC'", we need to replace it with a RegEx that matches "a number". Since the number is a decimal number (base 10) and can be "any length" (for all I know), I decided on this RegEx, "(?<PicID>\d++)". This RegEx will match one or more digits and store them in the named group, "PicID".
The "(?<PicID>" starts a named-capture group. This allows storing a value which can later be retrieved - see the use of "MatchPicID" in the code. "PicID" is the group name, and "Match" is the "UnquotedOutputVar" I specified when calling RegExMatch; the captured text ends up in a variable whose name is the two joined together, "MatchPicID".
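To make the naming concrete, here is a small sketch (AutoHotkey v1 syntax assumed, sample string made up for illustration) that shows what ends up in "Match" and in "MatchPicID" after a single call:

```autohotkey
; Minimal sketch, AutoHotkey v1 syntax assumed; the sample string is illustrative.
sample := "javascript:mr(664258,'PIC','AU','SG')"

if (RegExMatch(sample, "javascript:mr\((?<PicID>\d++),'PIC'", Match))
{
    ; Match      -> the whole matched text: javascript:mr(664258,'PIC'
    ; MatchPicID -> only the named group:   664258
    MsgBox, Whole match: %Match%`nPicID group: %MatchPicID%
}
```

If the output variable were called "Found" instead of "Match", the group would be read from "FoundPicID".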
The "\d++" matches one or more digits. "\d" means "a digit" (0-9). The "+" after means "one or more". The next "+" makes it possessive - this means that no backtracking will be done. Since the next character is a comma, there is no reason to backtrack. If you don't understand what I mean by backtracking, I refer you to an AWESOME RegEx tutorial which is what I studied when I first learned RegExes. The link for possessive is to the page in the tutorial for possessive quantifiers.
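If the tutorial is not at hand, the difference can be seen with a tiny experiment. This is only a sketch under AutoHotkey v1 syntax, and the input "12345" is an illustrative value, not something from your file:

```autohotkey
; Minimal sketch, AutoHotkey v1 syntax assumed; "12345" is an illustrative input.

; Greedy "\d+" first takes all five digits, then backtracks and gives the "5"
; back so that the trailing literal 5 can still match.
greedyPos := RegExMatch("12345", "\d+5")

; Possessive "\d++" also takes all five digits but never gives any back,
; so nothing is left for the trailing 5 and the match fails.
possessivePos := RegExMatch("12345", "\d++5")

; Shows Greedy: 1 and Possessive: 0
MsgBox, Greedy: %greedyPos%`nPossessive: %possessivePos%
```

In the pic ID RegEx the character after the digits is a comma, which "\d" can never match, so the possessive version finds exactly the same matches as a plain "\d+"; it only skips backtracking attempts that could never succeed.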
So, merging "what stays the same" with "what changes", I ended up with the RegEx I used. As stated before, if you have any questions, feel free to ask. I would also recommend checking out the tutorial I mentioned. I would say it's a must-have for anyone serious about learning regular expressions. |
animeaime
Joined: 04 Nov 2008 Posts: 1045
|
Posted: Sat Jul 04, 2009 6:14 am Post subject: |
|
|
| peteryy wrote: | | How about if I want to find only the pic ID numbers whose price is 4.90? |
Where is the price found in the input? Can you give an example? In other words, how can you tell the price for a pic? |
peteryy
Joined: 12 Mar 2009 Posts: 16
|
Posted: Sat Jul 04, 2009 6:32 am Post subject: |
|
|
Haha... I'm still digesting it...
I can't believe you responded so fast...
| Quote: | | <TR><TD colspan=7 class='x2'>Race 1<TR><TD colspan=7 class='x3'>unconfirmed orders<TR><a href="javascript:mr(663362,'PIC','AU','SG')"><TD>1<TD class='x5'>4<TD>12<TD>0<TD>4.65<TD>44/0<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(664258,'PIC','AU','SG')"><TD>1<TD class='x5'>4<TD>10<TD>10<TD>4.75<TD>51/15<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(664413,'PIC','AU','SG')"><TD>1<TD class='x5'>4<TD>22<TD>22<TD>4.80<TD>42/12<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(664616,'PIC','AU','SG')"><TD>1<TD class='x5'>4<TD>17<TD>17<TD>4.90<TD>51/15<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(663995,'PIC','AU','SG')"><TD>1<TD class='x5'>4<TD>0<TD>15<TD>4.90<TD>0/10<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(663136,'PIC','AU','SG')"><TD>1<TD class='x5'>5<TD>30<TD>0<TD>4.50<TD>25/0<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(663779,'PIC','AU','SG')"><TD>1<TD class='x5'>5<TD>0<TD>13<TD>4.75<TD CLASS=RD>0/6<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(664412,'PIC','AU','SG')"><TD>1<TD class='x5'>5<TD>22<TD>22<TD>4.80<TD>44/12<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(665290,'PIC','AU','SG')"><TD>1<TD class='x5'>5<TD>12<TD>12<TD>4.80<TD CLASS=RD>8/7<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(665051,'PIC','AU','SG')"><TD>1<TD class='x5'>5<TD>17<TD>17<TD>4.85<TD CLASS=RD>11/6<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(664615,'PIC','AU','SG')"><TD>1<TD class='x5'>5<TD>22<TD>22<TD>4.90<TD>58/15<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(663980,'PIC','AU','SG')"><TD>1<TD class='x5'>5<TD>0<TD>15<TD>4.90<TD>0/10<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(664841,'PIC','AU','SG')"><TD>1<TD class='x5'>5<TD>26<TD>26<TD>5.00<TD>55/15<TD><font class='del2'>Del</font></TD></a><TR><a 
href="javascript:mr(663580,'PIC','AU','SG')"><TD>1<TD class='x5'>6<TD>13<TD>13<TD>4.10<TD>56/15<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(662945,'PIC','AU','SG')"><TD>1<TD class='x5'>6<TD>10<TD>10<TD>4.10<TD>54/15<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(663123,'PIC','AU','SG')"><TD>1<TD class='x5'>6<TD>30<TD>0<TD>4.50<TD>27/0<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(663554,'PIC','AU','SG')"><TD>1<TD class='x5'>6<TD>12<TD>0<TD>4.65<TD>44/0<TD><font class='del2'>Del</font></TD></a><TR><a href="javascript:mr(664171,'PIC','AU','SG')"><TD>1<TD class='x5'>6<TD>4<TD>4<TD>4.75<TD>52/15<TD><font class='del2'>Del</font></TD></a> |
|
animeaime
Joined: 04 Nov 2008 Posts: 1045
|
Posted: Sat Jul 04, 2009 6:43 am Post subject: |
|
|
OK, I'm not sure if this simplification leads to false positives, but here's the new version. This version assumes that a price, "#.##", will follow every pic ID, and that the first number in the form "#.##" after the pic ID is its price. If these assumptions are not valid, I can modify it accordingly. The changes from the previous version are the added ".*?(?<Price>\d\.\d{2})" at the end of the RegEx and the added "%MatchPrice%" in the FileAppend line.
| Code: | ReadFile := "pc.txt"
WriteFile := "test.txt"
;reads the text to match from <ReadFile>
FileRead, text, %ReadFile%
FileDelete, %WriteFile%
start := 1
/*
finds all occurrences of "javascript:mr(number,'PIC'"
followed by a price (#.##) zero or more characters away
For each pic ID / price, the following is appended
"PicID Price`n" - the PicID, a space, the price, and the newline character, "`n"
*/
while pos := RegExMatch(text, "javascript:mr\((?<PicID>\d++),'PIC'.*?(?<Price>\d\.\d{2})", match, start)
{
    FileAppend, %MatchPicID% %MatchPrice%`n, %WriteFile%
    start := pos + StrLen(match)
}
MsgBox, Done |
The output (test.txt):
| Code: | 663362 4.65
664258 4.75
664413 4.80
664616 4.90
663995 4.90
663136 4.50
663779 4.75
664412 4.80
665290 4.80
665051 4.85
664615 4.90
663980 4.90
664841 5.00
663580 4.10
662945 4.10
663123 4.50
663554 4.65
664171 4.75
663775 4.75
664397 4.80
665235 4.80
665074 4.85
664620 4.90
663988 4.90
664835 5.00
663397 4.10
662942 4.10 |
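For the original question (only the pic IDs priced at 4.90), one possible approach is to keep the same RegEx and compare the captured price inside the loop. The following is a sketch rather than a solution tested on the full 500KB file; it assumes AutoHotkey v1 syntax, and "WantedPrice", "FilteredFile" and "test_490.txt" are hypothetical names introduced for the example.

```autohotkey
; Minimal sketch, AutoHotkey v1 syntax assumed.
; WantedPrice, FilteredFile and "test_490.txt" are hypothetical names for this example.
ReadFile := "pc.txt"
FilteredFile := "test_490.txt"
WantedPrice := "4.90"

;reads the text to match from <ReadFile>
FileRead, text, %ReadFile%
FileDelete, %FilteredFile%
start := 1
result := ""

;same RegEx as above: the pic ID, then the first "#.##" that follows it
while pos := RegExMatch(text, "javascript:mr\((?<PicID>\d++),'PIC'.*?(?<Price>\d\.\d{2})", match, start)
{
    ;keep the ID only when its captured price equals the wanted price
    if (MatchPrice = WantedPrice)
        result .= MatchPicID "`n"
    start := pos + StrLen(match)
}

;writes all collected IDs in one go
FileAppend, %result%, %FilteredFile%
MsgBox, Done
```

The comparison is done after the match rather than by writing "4\.90" into the RegEx, because with ".*?" in front of it the pattern could run past a record's own price and pair the ID with a later 4.90.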
peteryy
Joined: 12 Mar 2009 Posts: 16
|
Posted: Sat Jul 04, 2009 6:46 am Post subject: |
|
|
It's working, it's working!
Thanks for giving me a tutorial to follow.
Now I get what you mean by (?<PicID>\d++)...
Haha...
Thank you very much.
You saved my life.
I need to put more time into understanding regular expressions.
I don't know how to say it...
Thanks |
animeaime
Joined: 04 Nov 2008 Posts: 1045
|
Posted: Sat Jul 04, 2009 6:52 am Post subject: |
|
|
You're welcome, glad to help. Hope you check out the tutorial - it's an even bigger life saver than me. |
peteryy
Joined: 12 Mar 2009 Posts: 16
|
Posted: Sat Jul 04, 2009 6:55 am Post subject: |
|
|
Okay, I will stick to it... I promise. Thanks again |