AutoHotkey Community Let's help each other out
daonlyfreez
Joined: 16 Mar 2005 Posts: 949 Location: Berlin
|
Posted: Thu Jul 30, 2009 11:29 am Post subject: Email validity check - isValidEmail() [function] |
|
|
I came across this article on validating email addresses, and decided to convert the code into AHK.
Here it is. Please test.
| Code: | #SingleInstance force
/*
Valid Email RegEx?
http://www.pgregg.com/projects/php/code/showvalidemail.php
http://www.pgregg.com/projects/php/code/validate_email.inc.phps
*/
emailTest =
(LTrim % Join`n
name.lastname@domain.com|true
.@|false
a@b|false
@bar.com|false
@@bar.com|false
a@bar.com|true
aaa.com|false
aaa@.com|false
aaa@.123|false
aaa@[123.123.123.123]|true
aaa@[123.123.123.123]a|false
aaa@[123.123.123.333]|false
a@bar.com.|false
a@bar|false
a-b@bar.com|true
+@b.c|false
+@b.com|true
a@-b.com|false
a@b-.com|false
-@..com|false
-@a..com|false
a@b.co-foo.uk|true
"hello my name is"@stutter.com|true
"Test \"Fail\" Ing"@example.com|true
valid@special.museum|true
invalid@special.museum-|false
shaitan@my-domain.thisisminekthx|false
test@...........com|false
foobar@192.168.0.1|false
"Abc\@def"@example.com|true
"Fred Bloggs"@example.com|true
"Joe\\Blow"@example.com|true
"Abc@def"@example.com|true
customer/department=shipping@example.com|true
$A12345@example.com|true
!def!xyz%abc@example.com|true
_somename@example.com|true
Test \\'.chr(10).' Folding \\'.chr(10).' Whitespace@example.com|true
HM2Kinsists@(that comments are allowed)this.is.ok|true
user%uucp!path@somehost.edu|true
)
Loop, Parse, emailTest, `n
{
StringSplit, emailTestArray, A_LoopField, |
isit := isValidEmail(emailTestArray1)
;If (emailTestArray2 != (isit ? "true" : "false")) ; mismatch between expected and reported result
MsgBox,, Testing,
(LTrim
Email: %emailTestArray1%
Should be: %emailTestArray2%
Is reported as: %isit%
)
}
MsgBox Done testing
Return
isValidEmail(emailstr)
{
; Remove leading/trailing whitespace (AutoTrim)
emailstr = %emailstr%
; Get length (after trimming, so the offsets used below stay correct)
emailstr_len := StrLen(emailstr)
; Make lowercase
StringLower, emailstr, emailstr
; Split it up into before and after the @ symbol
StringGetPos, atPos, emailstr, @, R
If ErrorLevel
Return false ; no @
StringLeft, local_part, emailstr, %atPos%
StringRight, domain_part, emailstr, % emailstr_len - atPos - 1
; Sanitize quoted parts: first escaped pairs (backslash + any character), then quoted strings
local_part := RegExReplace(local_part, "\\.", "_")
local_part := RegExReplace(local_part, """[^""]+""", ".")
; Comments ( this is a comment ) are permitted in domain parts
domain_part := RegExReplace(domain_part, "\([^()]*\)", "")
; Make sure there are no more @ (we sanitized valid ones above)
If InStr(local_part, "@")
Return false ; too many @
; Check that the username is >= 1 char
If StrLen(local_part) = 0
Return false ; username missing
; Split the domain part into the dotted parts
StringSplit, domain_components, domain_part, `.
; Check there are at least 2
If domain_components0 < 2
Return false ; not enough domain components
; Check each domain part to ensure it doesn't start or end with a bad char
Loop %domain_components0%
{
domain_component := domain_components%A_Index%
If (StrLen(domain_component) > 0)
{
StringLeft, firstChar, domain_component, 1
StringRight, lastChar, domain_component, 1
If RegExMatch(firstChar, "[\.-]") Or RegExMatch(lastChar, "[\.-]")
Return false ; wrong start/end character in domain component
}
Else
Return false ; domain component missing
}
; Check the last domain component has 2-6 chars (.uk to .museum)
domain_last := domain_components%domain_components0%
If (StrLen(domain_last) < 2 Or StrLen(domain_last) > 6)
Return false ; TLD too large or small
; Check for valid chars - Domains can only have A-Z, 0-9, ., and the - chars,
; or be in the form [123.123.123.123]
If RegExMatch(domain_part, "^\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]$")
{
If ip2long(domain_part) != 0
Return true ; ip
Else
Return false ; ip error
}
If RegExMatch(domain_part, "^[a-z0-9\.-]+$")
Return true ; domain
; If we get here then it didn't pass
Return false ; end of function
}
ip2long(ip)
{ ; http://www.cflib.org/udf/ip2long
ip := RegExReplace(ip, "\[|\]", "") ; remove xtra chars
StringSplit, iparr, ip, `.
If (iparr0 != 4)
Return False
If (iparr1 > 255 Or iparr2 > 255 Or iparr3 > 255 Or iparr4 > 255)
Return False
Else
Return (iparr1*256**3) + (iparr2*256**2) + (iparr3*256) + iparr4 ; ** is the power operator; ^ would be bitwise XOR
}
|
Edit: Clarified demo a bit more (I hope), and some changes.
Last edited by daonlyfreez on Thu Jul 30, 2009 6:55 pm; edited 2 times in total
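For unattended runs, a harness that tallies mismatches may be handier than one message box per address. The following is only a sketch: RunEmailTests() and the stand-in validator hasAtSign() are hypothetical names that are not part of the script above, and the dynamic call %validatorName%() assumes AutoHotkey 1.0.47.06 or later. Pass "isValidEmail" as the first argument to check the real function.

```autohotkey
; Hypothetical harness: collects mismatches instead of showing one MsgBox per case.
tests =
(LTrim % Join`n
a@bar.com|true
aaa.com|false
@bar.com|false
)
failed := RunEmailTests("hasAtSign", tests)
MsgBox Mismatches:`n%failed%
Return

RunEmailTests(validatorName, tests)
{
    report =
    Loop, Parse, tests, `n
    {
        If (A_LoopField = "")
            Continue
        ; Split at the LAST pipe, so the address itself may contain "|"
        StringGetPos, pipePos, A_LoopField, |, R
        If ErrorLevel
            Continue ; no expected value on this line
        StringLeft, email, A_LoopField, %pipePos%
        StringTrimLeft, expected, A_LoopField, % pipePos + 1
        ; Call the validator by name and normalise 1/0 to true/false
        actual := %validatorName%(email) ? "true" : "false"
        If (actual != expected)
            report .= email " (expected " expected ", got " actual ")`n"
    }
    Return report
}

; Stand-in validator for the demo only (assumption, not the real check)
hasAtSign(s)
{
    Return InStr(s, "@") > 0
}
```

With the stand-in validator, only `@bar.com` should end up in the report, because it contains an @ but is expected to be invalid.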
|
SoggyDog
Joined: 02 May 2006 Posts: 783 Location: Greeley, CO
|
n-l-i-d Guest
|
Posted: Thu Jul 30, 2009 6:29 pm Post subject: |
|
|
That means the testing went well, as long as no other message box shows up before that one. The idea is to change the input.
|
daonlyfreez
Joined: 16 Mar 2005 Posts: 949 Location: Berlin
|
Posted: Thu Jul 30, 2009 6:55 pm Post subject: |
|
|
I changed the demo a bit. You'll now see a message box on every check.
|
SoggyDog
Joined: 02 May 2006 Posts: 783 Location: Greeley, CO
|
n-l-i-d Guest
|
Posted: Sat Oct 10, 2009 9:14 pm Post subject: |
|
|
I just found out that the RegEx by arpad3, named on the site, also works.
So, here is an alternative function.
The regex line should be one line (wordwrap!):
| Quote: | | static regex := "is) ... " |
| Code: | isValidEmail(emailstr)
{
/* THIS NEEDS TO BE UNCOMMENTED, AND TRANSFORMED INTO ONE LINE!!!
static regex := "is)^(?:""(?:\\\\.|[^""])*""|[^@]+)@(?=[^()]*(?:\([^)]*\)
[^()]*)*\z)(?![^ ]* (?=[^)]+(?:\(|\z)))(?:(?:[a-z\d() ]+(?:[a-z\d() -]*[()a-
z\d])?\.)+[a-z\d]{2,6}|\[(?:(?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|
2[0-4]\d|25[0-4])\]) *\z"
*/
If RegExMatch(emailstr, regex)
Return true
Else
Return false
} |
|
fincs
Joined: 05 May 2007 Posts: 1162 Location: Seville, Spain
|
Posted: Sat Oct 10, 2009 9:35 pm Post subject: |
|
|
| n-l-i-d wrote: | (code snip)
|
It can be shortened to this working snippet of code:
| Code: | isValidEmail(emailstr){
static regex := "is)^(?:""(?:\\\\.|[^""])*""|[^@]+)@(?=[^()]*(?:\([^)]*\)"
. "[^()]*)*\z)(?![^ ]* (?=[^)]+(?:\(|\z)))(?:(?:[a-z\d() ]+(?:[a-z\d() -]*[()a-"
. "z\d])?\.)+[a-z\d]{2,6}|\[(?:(?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|"
. "2[0-4]\d|25[0-4])\]) *\z"
return RegExMatch(emailstr, regex) != 0
} |
|
n-l-i-d Guest
|
Posted: Sat Oct 10, 2009 9:41 pm Post subject: |
|
|
Duh. Ok, I thought you couldn't concatenate that way when initializing variables in functions, but I'm wrong again.
I found another source of even more thoughts on perfecting this: RFC-compliant email address validator. There are also more testcases there.
So if you really need to be sure about your email addresses and don't want any false positives or false negatives, the function could (and should) be tested and refined further.
|
n-l-i-d Guest
|
Posted: Sat Oct 10, 2009 9:56 pm Post subject: |
|
|
Your version didn't work.
I needed to initialize the regex separately. I don't know whether this still has the advantage of loading the variable only once.
| Code: |
isValidEmail(emailstr){
static regex
regex := "is)^(?:""(?:\\\\.|[^""])*""|[^@]+)@(?=[^()]*(?:\([^)]*\)"
. "[^()]*)*\z)(?![^ ]* (?=[^)]+(?:\(|\z)))(?:(?:[a-z\d() ]+(?:[a-z\d() -]*[()a-"
. "z\d])?\.)+[a-z\d]{2,6}|\[(?:(?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|"
. "2[0-4]\d|25[0-4])\]) *\z"
return RegExMatch(emailstr, regex) != 0
} |
|
fincs
Joined: 05 May 2007 Posts: 1162 Location: Seville, Spain
|
Posted: Sun Oct 11, 2009 3:30 pm Post subject: |
|
|
| n-l-i-d wrote: | Your version didn't work
I needed to separately init the regex. I don't know if this still has the advantage of loading the variable only once. |
No, it's a typo (quotes need to be escaped via "" inside strings). Corrected version:
| Code: |
isValidEmail(emailstr){
static regex := "is)^(?:""""(?:\\\\.|[^""""])*""""|[^@]+)@(?=[^()]*(?:\([^)]*\)"
. "[^()]*)*\z)(?![^ ]* (?=[^)]+(?:\(|\z)))(?:(?:[a-z\d() ]+(?:[a-z\d() -]*[()a-"
. "z\d])?\.)+[a-z\d]{2,6}|\[(?:(?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|"
. "2[0-4]\d|25[0-4])\]) *\z"
return RegExMatch(emailstr, regex) != 0
} |
|
n-l-i-d Guest
|
Posted: Sun Oct 11, 2009 4:19 pm Post subject: |
|
|
Sorry, but that is not the case. I already escaped the quotes, you escape them again...
If I add a MsgBox to show me the regex, I get this:
| Code: | is)^(?:""(?:\\\\.|[^""])*""|[^@]+)@(?=[^()]*(?:\([^)]*\)" . "[^()]*)*\z)(?![^ ]* (?=[^)]+(?:\(|\z)))(?:(?:[a-z\d() ]+(?:[a-z\d() -]*[()a-" . "z\d])?\.)+[a-z\d]{2,6}|\[(?:(?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|" . "2[0-4]\d|25[0-4])\]) *\z
|
| Code: | isValidEmail("someone@somewhere.com")
isValidEmail(emailstr){
static regex := "is)^(?:""""(?:\\\\.|[^""""])*""""|[^@]+)@(?=[^()]*(?:\([^)]*\)"
. "[^()]*)*\z)(?![^ ]* (?=[^)]+(?:\(|\z)))(?:(?:[a-z\d() ]+(?:[a-z\d() -]*[()a-"
. "z\d])?\.)+[a-z\d]{2,6}|\[(?:(?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|"
. "2[0-4]\d|25[0-4])\]) *\z"
msgbox % regex
return RegExMatch(emailstr, regex) != 0
} |
It only works if I don't use concatenation when initializing the variable:
| Code: | is)^(?:"(?:\\\\.|[^"])*"|[^@]+)@(?=[^()]*(?:\([^)]*\)[^()]*)*\z)(?![^ ]* (?=[^)]+(?:\(|\z)))(?:(?:[a-z\d() ]+(?:[a-z\d() -]*[()a-z\d])?\.)+[a-z\d]{2,6}|\[(?:(?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|2[0-4]\d|25[0-4])\]) *\z
|
| Code: | isValidEmail("someone@somewhere.com")
isValidEmail(emailstr){
static regex
regex := "is)^(?:""(?:\\\\.|[^""])*""|[^@]+)@(?=[^()]*(?:\([^)]*\)"
. "[^()]*)*\z)(?![^ ]* (?=[^)]+(?:\(|\z)))(?:(?:[a-z\d() ]+(?:[a-z\d() -]*[()a-"
. "z\d])?\.)+[a-z\d]{2,6}|\[(?:(?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|"
. "2[0-4]\d|25[0-4])\]) *\z"
msgbox % regex
return RegExMatch(emailstr, regex) != 0
} |
So I'm now quite positive that concatenation does not work when initializing a static variable on its declaration line inside a function.
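On the question of loading the pattern only once: as far as I know, AutoHotkey builds of this era accept only a literal value on the static declaration line, which would explain the literal `" . "` in the MsgBox output, while a plain `regex := ...` assignment runs again on every call. A possible middle ground, sketched below, is to declare the static variable empty and fill it on the first call only. looksLikeEmail() is a hypothetical name, and the short pattern is a simplified stand-in for illustration, not the arpad3 regex.

```autohotkey
looksLikeEmail(emailstr)
{
    static regex ; empty on the first call, then keeps its value between calls
    If (regex = "")
    {
        ; Ordinary assignment, so the concatenation is evaluated normally.
        ; Simplified stand-in pattern (assumption), NOT the arpad3 regex.
        regex := "i)^[^@\s]+@"
            . "(?:[a-z\d-]+\.)+[a-z]{2,}$"
    }
    Return RegExMatch(emailstr, regex) != 0
}
```

The long pattern from the posts above could be dropped into the same guarded assignment, so it is built once and reused afterwards.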
|
PaulGregg Guest
|
Posted: Mon Jul 26, 2010 12:27 am Post subject: Re: Email validity check - isValidEmail() [function] |
|
|
Cool - I just came across this.
For some of the commenters, you may have missed the point of my *article
(* not really an article - I just wanted a methodical way of testing regexes that people insisted were great for email address validation).
Anyway, my point is not that you should strive for that holy grail of a regex, but that they all have failings, and to even get close to a working regex you end up with something so complex that you can never maintain it going forward.
Point of note: the arpad3 regex was written by arpad specifically to pass my tests - not to be an email address validator.
My conclusion is that, for this specific purpose, you should use a clearly documented step-by-step function to validate the email "rules". You won't thank me today, but you will in 5 years' time when ICANN allows people to register their own TLDs for $500,000.
Regards,
PG |
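The fixed 2-6 character TLD limit in the step-by-step function is the kind of rule that ages badly once new TLDs appear. One way to keep it adjustable is sketched here under assumptions: isPlausibleTld() and its maxLen parameter are hypothetical and not part of the original script, and the default of 63 is the maximum length of a single DNS label.

```autohotkey
; Hypothetical helper: TLD check with the upper length bound as a parameter.
isPlausibleTld(tld, maxLen = 63)
{
    len := StrLen(tld)
    If (len < 2 Or len > maxLen)
        Return false ; too short or too long
    ; Letters, digits and inner hyphens only; a hyphen may not be first or last
    Return RegExMatch(tld, "i)^[a-z\d](?:[a-z\d-]*[a-z\d])?$") != 0
}
```

With the default limit, `isPlausibleTld("thisisminekthx")` passes, so the matching test case from the first post would no longer be rejected on length alone. Calling `isPlausibleTld(domain_last, 6)` keeps the old behaviour.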
|
daonlyfreez
Joined: 16 Mar 2005 Posts: 949 Location: Berlin
|
Posted: Mon Jul 26, 2010 4:11 pm Post subject: Re: Email validity check - isValidEmail() [function] |
|
|
| PaulGregg wrote: | (quote snip) |
Hi Paul,
Great that you responded.
Good to know that the arpad3 regex is not meant to be used that way. I thought I had found the 'shortest' regex version, but I guess that doesn't count.
I almost literally transcoded your original into AutoHotkey. Are there any specific parts that might need to be altered (apart from the future "private" TLDs)?
Greetings,
daonlyfreez