dwaynek
Joined: 20 Dec 2005 Posts: 37
|
Posted: Mon May 14, 2007 5:12 pm Post subject: extract an email address from a web page |
|
|
How would I find and then extract an email address from a web page?
Last edited by dwaynek on Tue May 15, 2007 4:17 pm; edited 1 time in total |
|
Helpy Guest
|
Posted: Mon May 14, 2007 5:17 pm Post subject: |
|
|
| With regular expressions. I believe this has been shown already in this forum. |
|
Titan
Joined: 11 Aug 2004 Posts: 5068 Location: imaginationland
|
Posted: Mon May 14, 2007 7:40 pm Post subject: |
|
|
e.g.
| Code: | RegExMatch(text, "\b[\w\.]+@(?:\w+\.)*\w+", email)
MsgBox, %email% |
|
dwaynek
Joined: 20 Dec 2005 Posts: 37
|
Posted: Mon May 14, 2007 11:00 pm Post subject: |
|
|
| Titan wrote: | e.g.
| Code: | RegExMatch(text, "\b[\w\.]+@(?:\w+\.)*\w+", email)
MsgBox, %email% |
|
I tried this, Titan, but I got two weird results.
When I tried it on a web page that had graphics, the result was empty.
When I tried it on text only, I got an error message saying the variable had an illegal character. |
|
engunneer
Joined: 30 Aug 2005 Posts: 6772 Location: Pacific Northwest, US
|
Posted: Tue May 15, 2007 12:16 am Post subject: |
|
|
You can't extract an email address from an image.
What error message did you get? (Hint: press Ctrl+C in the error message window, then paste it into the forum using [quote](paste here)[/quote] or [code] [/code].) |
|
dwaynek
Joined: 20 Dec 2005 Posts: 37
|
Posted: Tue May 15, 2007 1:18 am Post subject: |
|
|
Figured it out.
This was my script:
| Code: | #f4::
Send, ^c
RegExMatch(%clipboard%, "\b[\w\.]+@(?:\w+\.)*\w+", email)
MsgBox, %email%
return |
It didn't report an error on reload, only on execution.
Apparently I should have used clipboard instead of %clipboard%.
Now it works fine.
Thanks, y'all! |
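For reference, the fix described above can be sketched as follows. Function parameters in AutoHotkey are expressions, so the variable is passed by name without percent signs; inside an expression, %clipboard% is treated as a double dereference (the clipboard's contents get used as a variable name), which would explain the "illegal character" error. Emptying the clipboard first, the ClipWait call, and the "not found" messages are additions that were not in the original script.

```autohotkey
#f4::
    clipboard =              ; empty the clipboard so ClipWait can detect new data
    Send, ^c
    ClipWait, 2              ; wait up to 2 seconds for the copied text to arrive
    if ErrorLevel
    {
        MsgBox, Nothing was copied.
        return
    }
    ; Pass the variable name itself (no percent signs) to the function.
    if (RegExMatch(clipboard, "\b[\w\.]+@(?:\w+\.)*\w+", email))
        MsgBox, %email%
    else
        MsgBox, No email address found.
return
```

Select the text on the page, then press Win+F4. Only the first address in the selection is reported.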
|
darchon Guest
|
Posted: Tue May 15, 2007 3:30 pm Post subject: |
|
|
I'm back.
Well, it seems that
\b[\w\.]+@(?:\w+\.)*\w+
is not enough to extract the email address I need. The address I want to capture has this format:
hous-326882907@craigslist.org
but for some reason I only get: 326882907@craigslist.org
I tried: \b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
but that matched nothing. |
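A likely explanation for both results, as a sketch rather than a definitive diagnosis: the class [\w\.] in the first pattern does not include the hyphen, so the match can only begin after it; the second pattern lists only uppercase letters, and RegExMatch is case-sensitive unless the i) option is given. The sample text and the variable names email1 to email3 below are made up for illustration.

```autohotkey
text := "reply to: hous-326882907@craigslist.org"   ; sample text, for illustration only

; Original pattern: "-" is not in [\w\.], so the match starts after the hyphen.
RegExMatch(text, "\b[\w\.]+@(?:\w+\.)*\w+", email1)

; Adding "-" to the character class lets the local part contain hyphens.
RegExMatch(text, "\b[\w.-]+@(?:\w+\.)*\w+", email2)

; The uppercase-only pattern can match lowercase text once the i) option is added.
RegExMatch(text, "i)\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", email3)

MsgBox, %email1%`n%email2%`n%email3%
```

The first line of the message box should show the truncated address reported above, and the other two the full address.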
|
dwaynek
Joined: 20 Dec 2005 Posts: 37
|
Posted: Tue May 15, 2007 3:32 pm Post subject: |
|
|
| Sorry, that last message above (posted as darchon) is from me. |
|
dwaynek
Joined: 20 Dec 2005 Posts: 37
|
Posted: Tue May 15, 2007 3:35 pm Post subject: |
|
|
OK, I think I found a regex that works:
[\w-_.]+@(?:\w+(?::\d+)?\.){1,3}(?:\w+\.?){1,2} |
|
Grumpy Guest
|
Posted: Tue May 15, 2007 5:21 pm Post subject: |
|
|
That's the problem with e-mail addresses: many legal formats are often overlooked by simplistic regular expressions...
Somebody wrote an RE that takes all possible formats into account, and it was 5000 chars long...
For example, "John Smith"@[151.45.44.1] is legal, I think. |
|
atnbueno
Joined: 24 Mar 2007 Posts: 26
|
|
|