Guest
Posted: Mon Feb 11, 2008 3:31 am Post subject:
| Anonymous wrote: | When searching the forums for regexreplace I get every thread PhiLho has ever posted to, since the word regexreplace is part of his sig. Is there some way of excluding personal sigs from forum search, so this doesn't happen? Phil is invaluable as an instructor here but it's screwing up my search. |
Actually I can use "-Philho" as part of my search string, but that's not exactly what I wanted.
 |
Heretic Guest
Posted: Wed Mar 12, 2008 11:17 pm Post subject: URL Extraction problem
i'm trying to extract red colored URLs only from the source below
| Code: | | target="_blank">http://pics.site.com/archive/random/fh283fh/fh283fhpl.jpg</a><br>Link URL: <a href="/j.php?id=839523&u=http%3A%2F%2Fwww.site.com%2Flink.php%3Fref%3fhwio3fih3" target="_blank">http://www.site.com/link.php?ref=fhwio3fih3</a> <br><br> |
and below is what i've tried.
| Code: | FileRead, stripurl, htmlsource.html
intFoundNum := 0
infFoundPos := 1
loop{
intFoundPos := RegExMatch(stripurl, "target=""_blank"">([^<]*)</a><br>Link[^<]*target=""_blank"">([^<]*)</a>", urlstripped[%A_index%] ,intFoundPos )
if(intFoundPos=0){
break
}else{
intFoundNum++
}
intFoundPos+=strLen( urlstripped[%A_index%] )
}
Loop %intFoundNum%{
FirstURL :=urlstripped[%A_Index%]1
SecondURL :=urlstripped[%A_Index%]2
}
msgbox %FirstURL%
msgbox %SecondURL% |
It returns nothing.
Where did I go wrong?
I'm new to AutoHotkey and don't know RegEx well either.
All I want to do is extract the two URLs in the source.
Can anyone help me, please?
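Two things in the code above stand out. The start position is initialised as infFoundPos but used as intFoundPos, so the search position is never set to 1. Also, the [^<]* between "Link" and the second target= cannot cross the <a href=...> tag, because that tag begins with "<". Here is a minimal sketch of one possible fix, not a definitive one. The sample string is shortened, and the names html, needle, url and found are made up for illustration.

```autohotkey
; Sample input modeled on the post; the href value is shortened (assumption).
html := "target=""_blank"">http://pics.site.com/a.jpg</a><br>Link URL: <a href=""/j.php?id=1"" target=""_blank"">http://www.site.com/link.php?ref=x</a> <br><br>"

; .*? (lazy) is allowed to cross the <a href=...> tag; [^<]* is not.
needle := "target=""_blank"">([^<]*)</a><br>Link.*?target=""_blank"">([^<]*)</a>"

pos := 1      ; search start; the same variable is updated inside the loop
found := ""
Loop
{
    pos := RegExMatch(html, needle, url, pos)
    if (pos = 0)
        break
    ; url1 and url2 hold the two captured groups of the current match
    found .= url1 . "`n" . url2 . "`n"
    ; continue searching after the whole match
    pos += StrLen(url)
}
MsgBox, %found%
```

If the real HTML spans several lines, the s) option may also be needed so that .*? can cross line breaks.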
 |
Razlin
Joined: 05 Nov 2007 Posts: 436 Location: canada
Posted: Mon Apr 07, 2008 3:46 pm Post subject:
I'm trying to find anything that is NOT 02402.
I have a document with some errors, e.g.:
ERROR:02402: tadadada File ERROR:02402: asf asdf fsdafas
ERROR:02402: tadadada File ERROR:02402: asf asdf fsdafas
ERROR:02402: tadadada File ERROR:02402: asf asdf fsdafas
ERROR:08888: tadadada File ERROR:02402: asf asdf fsdafas
ERROR:02402: tadadada File ERROR:02402: asf asdf fsdafas
ERROR:02402: tadadada File ERROR:02402: asf asdf fsdafas
ERROR:02402: tadadada File ERROR:02402: asf asdf fsdafas
Notice the 08888, for example (it could be any number).
How does one search for anything EXCEPT a particular number?
Here are some attempts:
- ERROR:{[^0]{[^2]}{[^4]}{[^0]{[^2]}
- ERROR:{^[0]{^[2]}{^[4]}{^[0]{^[2]}
and
- ERROR {{^02402]} << doesnt find error 00004
I'm trying to find anything that is NOT 02402.
Is that possible?
 |
Oberon
Joined: 18 Feb 2008 Posts: 456
Posted: Mon Apr 07, 2008 5:52 pm Post subject:
Use a negative lookahead, e.g. ...(?!02402).+
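To make the lookahead concrete, here is a minimal sketch. It assumes every code is followed by a colon, as in the sample lines above. The names text, codes and m are made up for illustration. (?!02402:) asserts that the text right after "ERROR:" is not "02402:", and (\d+) then captures whatever number is actually there.

```autohotkey
; Two sample lines in the shape shown in the question (shortened).
text := "ERROR:02402: tadadada File ERROR:02402: asf`nERROR:08888: tadadada File ERROR:02402: asf"

pos := 1
codes := ""
Loop
{
    ; Match "ERROR:" only when it is NOT followed by "02402:".
    pos := RegExMatch(text, "ERROR:(?!02402:)(\d+)", m, pos)
    if (pos = 0)
        break
    codes .= m1 . "`n"     ; m1 is the captured error number
    pos += StrLen(m)       ; continue after this match
}
MsgBox, Codes other than 02402:`n%codes%
```

Including the colon inside the lookahead keeps a longer number that merely starts with 02402 from being skipped by accident.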
 |
Wouther
Joined: 01 May 2007 Posts: 79 Location: The Netherlands
Posted: Mon Apr 07, 2008 7:25 pm Post subject:
Hey all,
I understand basic RegEx, but I really don't know why this does not work:
| Code: | code =
(
...Website here...
Current documents:
<ul>
<li>[date/time] <a href="content/xxxxxx">title</a></li>
<li>[date/time] <a href="content/xxxxxx">title</a></li>
<li>[date/time] <a href="content/xxxxxx">title</a></li>
<li>[date/time] <a href="content/xxxxxx">title</a></li>
<li>[date/time] <a href="content/xxxxxx">title</a></li>
<li>[date/time] <a href="content/xxxxxx">title</a></li>
</ul>
...Website here...
)
Foo("Current documents:", "</ul>")
Foo(param1, param2)
{
global
RegExMatch(code, ".*\Q" . param1 . "\E(.*)\Q" . param2 . "\E.*", Sub)
MsgBox, % Sub1
} |
It's a piece from a website's source code. I think it has something to do with the following I found on this topic:
| PhiLho wrote: | I use the surrounding tags, because HTML has other non-significant numbers:
| Code: | page =
(
</tr>
<tr id="C1WebGrid1_R2" bgcolor="#F2F2F2">
<td><font color="Black">
<a id="C1WebGrid1_R2_Hyperlink1" NAME="Hyperlink1" tabIndex="103" href="trgovanje.aspx?hartijaID=1105" target="_parent">EDPL-R-A</a>
</font></td><td><font color="Black">ELEKTRODISTRIBUCIJA AD PALE</font></td><td><font color="Black">13.11.2006</font></td><td><font color="Black">3,5300</font></td><td><font color="Black">3,5200</font></td><td><font color="Black">0</font></td><td><font color="Black">3,5100</font></td><td><font color="Black">3,6000</font></td>
</tr>
)
newPos := 1
Loop
{
fp := RegExMatch(page, "<font color=""Black"">([\d.,]+)</font>", number, newPos)
If (fp = 0)
Break
newPos := fp + StrLen(number)
results := results . number1 . "`n"
}
MsgBox %results%
|
|
So, the main goal is to retrieve that piece (between "Current documents:" and "</ul>") so I can use it in the rest of the script. Any ideas?
P.S. I tried something with the "m" option but it didn't work either...
 |
Wouther
Joined: 01 May 2007 Posts: 79 Location: The Netherlands
Posted: Tue Apr 08, 2008 2:39 pm Post subject:
OK, this is a bit confusing...
I think the problem has something to do with newlines, because this code won't work (the file contains exactly the same text as the variable):
| Code: | code =
(
...Website here...
Current documents:
<ul>
<li>[date/time] <a href="content/xxxxxx">title</a></li>
<li>[date/time] <a href="content/xxxxxx">title</a></li>
<li>[date/time] <a href="content/xxxxxx">title</a></li>
<li>[date/time] <a href="content/xxxxxx">title</a></li>
<li>[date/time] <a href="content/xxxxxx">title</a></li>
<li>[date/time] <a href="content/xxxxxx">title</a></li>
</ul>
...Website here...
)
FileRead, code, code.txt
MsgBox, % code
Foo("<ul>", "</ul>")
Foo(param1, param2)
{
global
RegExMatch(code, ".*\Q" . param1 . "\E(.*)\Q" . param2 . "\E.*", Sub)
MsgBox, % Sub1
} |
Disabling the "FileRead" line makes it work (so using the variable with exactly the same content as the file works...). Why?
 |
HugoV
Joined: 27 May 2007 Posts: 617
Posted: Tue Apr 08, 2008 3:12 pm Post subject:
I just tested it and it works in both cases for me. Check the original file, is it perhaps UTF-8?
 |
Wouther
Joined: 01 May 2007 Posts: 79 Location: The Netherlands
Posted: Tue Apr 08, 2008 6:26 pm Post subject:
| HugoV wrote: | I just tested it and it works in both cases for me. |
Strange... it doesn't work here. I reinstalled the latest version of ahk so it's not that.
| Quote: | Check the original file, is it perhaps UTF-8? |
I'm not sure... I downloaded a website with URLDownloadToFile and used FileRead, if that matters.
Edit: I forgot to mention: thanks for your help!
 |
HugoV
Joined: 27 May 2007 Posts: 617
Posted: Tue Apr 08, 2008 7:14 pm Post subject:
try this, works for me:
| Code: | | RegExMatch(code, "s).*\Q" . param1 . "\E(.*)\Q" . param2 . "\E.*", Sub) |
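The s) (DotAll) option in the line above lets "." match newline characters. A likely explanation for the difference between the variable and the file, offered as an assumption rather than a confirmed diagnosis: the downloaded file uses `r`n line endings, which "." does not match by default, while a continuation section joins its lines with a lone `n, which "." does match. The sketch below uses a made-up sample string and hypothetical output names (plain, dotall) to show the effect.

```autohotkey
; Sample text with `r`n line endings, as a downloaded file typically has.
code := "Current documents:`r`n<ul>`r`n<li>one</li>`r`n</ul>`r`nfooter"

; Without s): "." cannot cross the `r`n line breaks, so nothing is captured.
RegExMatch(code, "\Q<ul>\E(.*)\Q</ul>\E", plain)

; With s): "." also matches newline characters, so the capture spans lines.
RegExMatch(code, "s)\Q<ul>\E(.*?)\Q</ul>\E", dotall)

MsgBox, % "without s): [" . plain1 . "]`nwith s): [" . dotall1 . "]"
```

The lazy (.*?) stops at the first closing tag, which matters when a page contains more than one list.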
 |
 |
Ozay
Joined: 23 Oct 2007 Posts: 15
Posted: Tue Apr 15, 2008 8:55 pm Post subject:
I have some data which has been OCR'd into text files, and it looks like this:
| Code: | 30 00.01.04 RECORD OF REVISIONS
40 00.01.02 LIST OF CHAPTERS (TABLE OF CON
TENTS)
50 00.02.00 01 GENERAL INFORMATION
60 99.02.00 INTRODUCTION
1.60 99.03.01 KA 138 NAV TRANSFER RELAY
INSTALLATION DNG (REV 0)
9.50 99.08.02 429 SIGNAL LEVELS (TABLE
5-2)
9.60 99.08.02 RS232 SIGNAL LEVELS (TABLE
53) .
9.70 99.08.01 OVERHAUL |
Each line should begin with:
## ##.##.## (continues with text)
OR
#.## ##.##.## (more text)
OR
##.## ##.##.## (another section of text)
Sometimes the text will terminate and continue on the next line, and is either text or a number.
Either way, I need a way to automatically go through all the files and correct this. I have been able to get the extra lines which start with text to be fixed, but those with numbers are giving me problems:
| Code: | dataA := RegExReplace(data,"m)\R([a-zA-Z_()*`,])"," $1")
and
dataA := RegExReplace(data,"m)\R([^\d{2}|\d.\d{2}|\d{2}.\d{2}])"," $1") |
Both seem to work on the test data I am working with except on lines with numbers.
For example the lines:
40 00.01.02 LIST OF CHAPTERS (TABLE OF CON
TENTS)
9.50 99.08.02 429 SIGNAL LEVELS (TABLE
5-2)
Should be:
40 00.01.02 LIST OF CHAPTERS (TABLE OF CONTENTS)
9.50 99.08.02 429 SIGNAL LEVELS (TABLE 5-2)
Thanks!
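One note on the second attempt: everything inside [^...] is a character class, so \d{2}|\d.\d{2} there is read as a set of single characters, not as alternatives. A negative lookahead can express "a line break that is not followed by a line header" instead. This is a minimal sketch under that assumption. The names data, header and fixed are made up, and the header pattern is only inferred from the three formats listed above.

```autohotkey
; Sample lines taken from the post, joined with `n.
data := "40 00.01.02 LIST OF CHAPTERS (TABLE OF CON`nTENTS)`n9.50 99.08.02 429 SIGNAL LEVELS (TABLE`n5-2)`n9.70 99.08.01 OVERHAUL"

; A line header: 1-2 digits, an optional ".dd", whitespace, then dd.dd.dd
header := "\d{1,2}(?:\.\d{2})?\s+\d{2}\.\d{2}\.\d{2}\b"

; Replace each line break that is NOT followed by a header with a space.
fixed := RegExReplace(data, "\R(?!" . header . ")", " ")
MsgBox, %fixed%
```

This always joins with a space, so a word split mid-way comes out as "CON TENTS" rather than "CONTENTS". Telling the two cases apart would need a word list or a manual pass.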
 |
Ozay
Joined: 23 Oct 2007 Posts: 15
Posted: Thu Apr 17, 2008 5:39 pm Post subject:
Did I scare everyone off this thread? I tried to explain my problem as well as I could; do I need to clarify further?
This has been plaguing me for weeks, and I thought I would put it to the forums to see if I could get some ideas on how to accomplish it.
Thanks!
 |
Oberon
Joined: 18 Feb 2008 Posts: 456
Posted: Thu Apr 17, 2008 7:15 pm Post subject:
| Ozay wrote: | I have some data which has been OCR'd into text files. which looks like this:... |
If your query is anything more than a short question about regex or pcre in AutoHotkey you should create a new thread.
 |
biatche
Joined: 23 Feb 2008 Posts: 59
Posted: Mon May 12, 2008 7:02 am Post subject:
say i have this
testvar = random_key=random_settings hello=hi
and random would really mean it could be anything
I wish to use regexmatch to get random_key into a var, ok, it might not get it into a var, but i suppose i could at least get the position
FoundPos := RegExMatch("%testvar%", "^.*=")
Apparently this does not seem to work.
(btw, what command should i use to get random_key into a variable? StringLeft?)
 |
Wouther
Joined: 01 May 2007 Posts: 79 Location: The Netherlands
Posted: Mon May 12, 2008 10:29 am Post subject:
You can save parts of a string with RegexMatch like this:
| Code: | testvar = random_key=random_settings hello=hi
RegexMatch(testvar, "^(.*?)=", Sub)
MsgBox, random_key = %Sub1% |
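Two details explain why the original attempt failed. Inside an expression, "%testvar%" in quotes is a literal string, so the variable name has to be passed without quotes or percent signs. And ^.*= is greedy, so it runs to the last "=" instead of the first. As a small extension of the code above, this sketch captures the value as well. The output name kv is made up, and it assumes keys and values contain no spaces.

```autohotkey
testvar = random_key=random_settings hello=hi

; [^=\s]+ stops at the first "=", and \S* takes the value up to the next space.
RegExMatch(testvar, "^([^=\s]+)=(\S*)", kv)

; kv1 holds the key and kv2 holds the value of the first pair.
MsgBox, key = %kv1%`nvalue = %kv2%
```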