AutoHotkey Community


All times are UTC [ DST ]




Posted: July 30th, 2009, 12:29 pm

Joined: March 16th, 2005, 10:33 pm
Posts: 969
Location: Frisia
I came across this article on validating email addresses, and decided to convert the code into AHK.

Here it is. Please test.

Code:
#SingleInstance force

/*

Valid Email RegEx?

http://www.pgregg.com/projects/php/code/showvalidemail.php
http://www.pgregg.com/projects/php/code/validate_email.inc.phps

*/

emailTest =
(LTrim % Join`n
name.lastname@domain.com|true
.@|false
a@b|false
@bar.com|false
@@bar.com|false
a@bar.com|true
aaa.com|false
aaa@.com|false
aaa@.123|false
aaa@[123.123.123.123]|true
aaa@[123.123.123.123]a|false
aaa@[123.123.123.333]|false
a@bar.com.|false
a@bar|false
a-b@bar.com|true
+@b.c|false
+@b.com|true
a@-b.com|false
a@b-.com|false
-@..com|false
-@a..com|false
a@b.co-foo.uk|true
"hello my name is"@stutter.com|true
"Test \"Fail\" Ing"@example.com|true
valid@special.museum|true
invalid@special.museum-|false
shaitan@my-domain.thisisminekthx|false
test@...........com|false
foobar@192.168.0.1|false
"Abc\@def"@example.com|true
"Fred Bloggs"@example.com|true
"Joe\\Blow"@example.com|true
"Abc@def"@example.com|true
customer/department=shipping@example.com|true
$A12345@example.com|true
!def!xyz%abc@example.com|true
_somename@example.com|true
Test \\'.chr(10).' Folding \\'.chr(10).' Whitespace@example.com|true
HM2Kinsists@(that comments are allowed)this.is.ok|true
user%uucp!path@somehost.edu|true
)


Loop, Parse, emailTest, `n
{
  StringSplit, emailTestArray, A_LoopField, |
  isit := isValidEmail(emailTestArray1)
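  ; isit is 1 or 0, so turn the expected "true"/"false" text into 1 or 0 as well
  expected := (emailTestArray2 = "true")
  ; Uncomment the next line to get a message box only for mismatches
  ;If (isit != expected)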
    MsgBox,, Testing,
    (LTrim
     Email: %emailTestArray1%
     
     Should be: %emailTestArray2%
     
     Is reported as: %isit%
    )
}

MsgBox Done testing

Return

isValidEmail(emailstr)
{
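  ; Remove leading/trailing whitespace (relies on AutoTrim, which is on by default)
  emailstr = %emailstr%
  ; Get the length after trimming, so the offsets used below match the trimmed string
  emailstr_len := StrLen(emailstr)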
  ; Make lowercase
  StringLower, emailstr, emailstr
  ; Split it up into before and after the @ symbol
  StringGetPos, atPos, emailstr, @, R
  If ErrorLevel
    Return false ; no @
  StringLeft, local_part, emailstr, %atPos%
  StringRight, domain_part, emailstr, % emailstr_len - atPos - 1
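  ; Sanitize quoted parts: turn backslash escapes (\" \@ \\ ...) into "_",
  ; then collapse each "quoted string" into a single dot
  local_part := RegExReplace(local_part, "\\.", "_")
  local_part := RegExReplace(local_part, """[^""]+""", ".")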
  ; Comments ( this is a comment ) are permitted in domain parts
  domain_part := RegExReplace(domain_part, "\([^()]*\)", "")
  ; Make sure there are no more @ (we sanitized valid ones above)
  If InStr(local_part, "@")
    Return false ; too many @
  ; Check that the username is >= 1 char
  If StrLen(local_part) = 0
    Return false ; username missing
  ; Split the domain part into the dotted parts
  StringSplit, domain_components, domain_part, `.
  ; Check there are at least 2
  If domain_components0 < 2
    Return false ; not enough domain components
  ; Check each domain part to ensure it doesn't start or end with a bad char
  Loop %domain_components0%
  {
    domain_component := domain_components%A_Index%
    If (StrLen(domain_component) > 0)
    {
      StringLeft, firstChar, domain_component, 1
      StringRight, lastChar, domain_component, 1
      If RegExMatch(firstChar, "[\.-]") Or RegExMatch(lastChar, "[\.-]")
        Return false ; wrong start/end character in domain component
    }
    Else
      Return false ; domain component missing
  }
  ; Check the last domain component has 2-6 chars (.uk to .museum)
  domain_last := domain_components%domain_components0%
  If (StrLen(domain_last) < 2 Or StrLen(domain_last) > 6)
    Return false ; TLD too large or small
  ; Check for valid chars - Domains can only have A-Z, 0-9, ., and the - chars,
  ; or be in the form [123.123.123.123]
  If RegExMatch(domain_part, "^\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]$")
  {
    If (ip2long(domain_part) != 0)
      Return true ; ip
    Else
      Return false ; ip error
  }
  If RegExMatch(domain_part, "^[a-z0-9\.-]+$")
    Return true ; domain
  ; If we get here then it didn't pass
  Return false ; end of function
}

ip2long(ip)
{  ; http://www.cflib.org/udf/ip2long
  ip := RegExReplace(ip, "\[|\]", "") ; remove xtra chars
  StringSplit, iparr, ip, `.
  If (iparr0 != 4)
    Return False
  If (iparr1 > 255 Or iparr2 > 255 Or iparr3 > 255 Or iparr4 > 255)
    Return False
  Else
    Return (iparr1*256**3) + (iparr2*256**2) + (iparr3*256) + iparr4 ; ** is power (^ would be bitwise XOR)
}


Edit: Clarified the demo a bit more (I hope) and made some other changes.



Last edited by daonlyfreez on July 30th, 2009, 7:55 pm, edited 2 times in total.
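For anyone who wants to poke at the logic outside AutoHotkey, here is a rough Python port of the step-by-step function above. It is a sketch, not PG's PHP or the AHK code itself: `is_valid_email` and `ip2long` just mirror the AHK names, and the escape-handling step assumes the intent is to replace a backslash plus whatever character follows it. Because `^` is bitwise XOR in AutoHotkey (the power operator is `**`), the sketch packs the octets with bit shifts instead.

```python
import re


def ip2long(ip: str) -> int:
    """Pack a dotted IPv4 address (surrounding [ ] allowed) into one integer; 0 on error.

    Shifts are used for the packing; in AutoHotkey ^ is bitwise XOR, not a power.
    """
    parts = ip.strip("[]").split(".")
    if len(parts) != 4 or not all(re.fullmatch(r"[0-9]{1,3}", p) for p in parts):
        return 0
    octets = [int(p) for p in parts]
    if any(octet > 255 for octet in octets):
        return 0
    return (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]


def is_valid_email(email: str) -> bool:
    """Step-by-step check mirroring isValidEmail() from the first post."""
    email = email.strip().lower()              # trim, then lowercase
    at = email.rfind("@")                      # split at the LAST @
    if at < 0:
        return False                           # no @
    local, domain = email[:at], email[at + 1:]
    local = re.sub(r"\\.", "_", local)         # neutralize backslash escapes (assumed intent)
    local = re.sub(r'"[^"]+"', ".", local)     # collapse each "quoted string" to a dot
    domain = re.sub(r"\([^()]*\)", "", domain) # drop (comments) from the domain
    if not local or "@" in local:
        return False                           # empty local part or stray @
    labels = domain.split(".")
    if len(labels) < 2:
        return False                           # need at least name.tld
    for label in labels:
        if not label or label.startswith("-") or label.endswith("-"):
            return False                       # empty label or leading/trailing hyphen
    if not 2 <= len(labels[-1]) <= 6:
        return False                           # TLD length outside .uk .. .museum
    if re.fullmatch(r"\[[0-9]{1,3}(?:\.[0-9]{1,3}){3}\]", domain):
        return ip2long(domain) != 0            # bracketed IPv4 literal
    return re.fullmatch(r"[a-z0-9.-]+", domain) is not None
```

For example, `is_valid_email('"Abc@def"@example.com')` passes because the quoted part is collapsed to a dot before the stray-@ check runs.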

Posted: July 30th, 2009, 4:46 pm

Joined: May 2nd, 2006, 11:16 pm
Posts: 800
Location: Greeley, CO
Sample Usage?

All I get is "Done testing".

_________________
SoggyDog
Dwarf Fortress:
"The most intriguing game I've ever played."


Posted: July 30th, 2009, 7:29 pm
That means the testing went well, as long as no other MessageBox showed up before that one. The idea is to change the input.

:wink:
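The idea behind the demo (stay quiet unless a case disagrees with its expected value) can be sketched as a small table-driven harness. This is a Python sketch, not the script from the first post: `run_cases` and `naive_validator` are hypothetical names, and the naive validator is only a stand-in so the report has something to show.

```python
from typing import Callable, List, Tuple

# Same "address|expected" layout as the emailTest block in the first post.
CASES = """\
a@bar.com|true
a@b|false
+@b.com|true
aaa@.com|false
"""


def run_cases(table: str, validate: Callable[[str], bool]) -> List[Tuple[str, bool, bool]]:
    """Return (address, expected, actual) for every row where validate() disagrees."""
    mismatches = []
    for line in table.splitlines():
        if not line.strip():
            continue  # ignore blank rows
        # Split on the LAST "|" so the address itself may contain one.
        address, _, expected_text = line.rpartition("|")
        expected = expected_text.strip().lower() == "true"
        actual = bool(validate(address))
        if actual != expected:
            mismatches.append((address, expected, actual))
    return mismatches


def naive_validator(address: str) -> bool:
    """Hypothetical stand-in: an @ plus a dot somewhere after the last @."""
    return "@" in address and "." in address.rsplit("@", 1)[-1]


if __name__ == "__main__":
    # No MISMATCH lines before "Done testing" means every row matched.
    for address, expected, actual in run_cases(CASES, naive_validator):
        print(f"MISMATCH {address}: expected {expected}, got {actual}")
    print("Done testing")
```

With the naive stand-in, only `aaa@.com` is reported, which is exactly the kind of case the real validator has to reject.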


Posted: July 30th, 2009, 7:55 pm

Joined: March 16th, 2005, 10:33 pm
Posts: 969
Location: Frisia
I changed the demo a bit. You'll now see a messagebox on every check.



Posted: July 30th, 2009, 8:45 pm

Joined: May 2nd, 2006, 11:16 pm
Posts: 800
Location: Greeley, CO
I get it now; I just didn't spend enough time with it earlier.

Thanks.



Posted: October 10th, 2009, 10:14 pm
I just found out that the arpad3 RegEx listed on that site also works.

So, here is an alternative function.

The regex must be on a single line (the forum word-wraps it):

Quote:
static regex := "is) ... "


Code:
isValidEmail(emailstr)
{
/* THIS NEEDS TO BE UNCOMMENTED, AND TRANSFORMED INTO ONE LINE!!!
static regex := "is)^(?:""(?:\\\\.|[^""])*""|[^@]+)@(?=[^()]*(?:\([^)]*\)
[^()]*)*\z)(?![^ ]* (?=[^)]+(?:\(|\z)))(?:(?:[a-z\d() ]+(?:[a-z\d() -]*[()a-
z\d])?\.)+[a-z\d]{2,6}|\[(?:(?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|
2[0-4]\d|25[0-4])\]) *\z"
*/
If RegExMatch(emailstr, regex)
  Return true
Else
  Return false
}
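The "one line" requirement is about the engine receiving a single pattern string, not about how the source is laid out. As a Python sketch of the same arpad3 pattern (not AutoHotkey, and not arpad's own code), adjacent string literals are joined at compile time, so the pattern can be wrapped freely and is still compiled only once, at import. Two assumptions: Python's `re` spells PCRE's `\z` (absolute end of string) as `\Z`, and the AHK `""` escapes become plain `"` inside single-quoted raw strings.

```python
import re

# arpad3's pattern as quoted in this thread, split across adjacent string literals.
# Python joins adjacent literals at compile time, so re.compile() receives ONE string.
# PCRE's \z is written \Z here (absolute end of string in Python's re).
ARPAD3 = re.compile(
    r'^(?:"(?:\\\\.|[^"])*"|[^@]+)@(?=[^()]*(?:\([^)]*\)'
    r'[^()]*)*\Z)(?![^ ]* (?=[^)]+(?:\(|\Z)))(?:(?:[a-z\d() ]+(?:[a-z\d() -]*[()a-'
    r'z\d])?\.)+[a-z\d]{2,6}|\[(?:(?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|'
    r'2[0-4]\d|25[0-4])\]) *\Z',
    re.IGNORECASE | re.DOTALL,  # the "i" and "s" options of the AHK "is)" prefix
)


def is_valid_email_regex(email: str) -> bool:
    """True if the whole address matches the pattern (compiled once, at import)."""
    return ARPAD3.match(email) is not None
```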


Posted: October 10th, 2009, 10:35 pm

Joined: May 5th, 2007, 7:24 pm
Posts: 1240
Location: Seville, Spain
n-l-i-d wrote:
(code snip)


It can be shortened to this working snippet of code:
Code:
isValidEmail(emailstr){
    static regex := "is)^(?:""(?:\\\\.|[^""])*""|[^@]+)@(?=[^()]*(?:\([^)]*\)"
    . "[^()]*)*\z)(?![^ ]* (?=[^)]+(?:\(|\z)))(?:(?:[a-z\d() ]+(?:[a-z\d() -]*[()a-"
    . "z\d])?\.)+[a-z\d]{2,6}|\[(?:(?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|"
    . "2[0-4]\d|25[0-4])\]) *\z"
    return RegExMatch(emailstr, regex) != 0
}

_________________
fincs
Highly recommended: AutoHotkey_L (see why) (all my code snippets require it)
Formal request to polyethene - I support the unity of the AutoHotkey community
Get SciTE4AutoHotkey v3.0.00 (Release Candidate)
[My project list]


Posted: October 10th, 2009, 10:41 pm
Duh. OK, I thought you couldn't concatenate that way when initializing variables in functions, but I'm wrong again. :)

I found another source with even more thoughts on perfecting this: RFC-compliant email address validator. It also has more test cases.

So if you really need to be sure about your email addresses and want to avoid false positives and false negatives, the function should be tested and refined further.


Posted: October 10th, 2009, 10:56 pm
Your version didn't work :?

I had to initialize the regex separately. I don't know whether this still has the advantage of assigning the variable only once.

Code:
isValidEmail(emailstr){
  static regex
  regex := "is)^(?:""(?:\\\\.|[^""])*""|[^@]+)@(?=[^()]*(?:\([^)]*\)"
    . "[^()]*)*\z)(?![^ ]* (?=[^)]+(?:\(|\z)))(?:(?:[a-z\d() ]+(?:[a-z\d() -]*[()a-"
    . "z\d])?\.)+[a-z\d]{2,6}|\[(?:(?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|"
    . "2[0-4]\d|25[0-4])\]) *\z"
  return RegExMatch(emailstr, regex) != 0
}


Posted: October 11th, 2009, 4:30 pm

Joined: May 5th, 2007, 7:24 pm
Posts: 1240
Location: Seville, Spain
n-l-i-d wrote:
Your version didn't work :?

I needed to separately init the regex. I don't know if this still has the advantage of loading the variable only once.


No, it's a typo (quotes need to be escaped via "" inside strings). Corrected version:

Code:
isValidEmail(emailstr){
  static regex := "is)^(?:""""(?:\\\\.|[^""""])*""""|[^@]+)@(?=[^()]*(?:\([^)]*\)"
    . "[^()]*)*\z)(?![^ ]* (?=[^)]+(?:\(|\z)))(?:(?:[a-z\d() ]+(?:[a-z\d() -]*[()a-"
    . "z\d])?\.)+[a-z\d]{2,6}|\[(?:(?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|"
    . "2[0-4]\d|25[0-4])\]) *\z"
  return RegExMatch(emailstr, regex) != 0
}

Posted: October 11th, 2009, 5:19 pm
Sorry, but that is not the case. I had already escaped the quotes, and you escaped them again...

If I add a MsgBox to show me the regex, I get this:

Code:
is)^(?:""(?:\\\\.|[^""])*""|[^@]+)@(?=[^()]*(?:\([^)]*\)" . "[^()]*)*\z)(?![^ ]* (?=[^)]+(?:\(|\z)))(?:(?:[a-z\d() ]+(?:[a-z\d() -]*[()a-" . "z\d])?\.)+[a-z\d]{2,6}|\[(?:(?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|" . "2[0-4]\d|25[0-4])\]) *\z


Code:
isValidEmail("someone@somewhere.com")

isValidEmail(emailstr){
  static regex := "is)^(?:""""(?:\\\\.|[^""""])*""""|[^@]+)@(?=[^()]*(?:\([^)]*\)"
    . "[^()]*)*\z)(?![^ ]* (?=[^)]+(?:\(|\z)))(?:(?:[a-z\d() ]+(?:[a-z\d() -]*[()a-"
    . "z\d])?\.)+[a-z\d]{2,6}|\[(?:(?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|"
    . "2[0-4]\d|25[0-4])\]) *\z"
    msgbox % regex
  return RegExMatch(emailstr, regex) != 0
}


It only works if I don't use concatenation when initializing the variable:

Code:
is)^(?:"(?:\\\\.|[^"])*"|[^@]+)@(?=[^()]*(?:\([^)]*\)[^()]*)*\z)(?![^ ]* (?=[^)]+(?:\(|\z)))(?:(?:[a-z\d() ]+(?:[a-z\d() -]*[()a-z\d])?\.)+[a-z\d]{2,6}|\[(?:(?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|2[0-4]\d|25[0-4])\]) *\z


Code:
isValidEmail("someone@somewhere.com")

isValidEmail(emailstr){
  static regex
  regex := "is)^(?:""(?:\\\\.|[^""])*""|[^@]+)@(?=[^()]*(?:\([^)]*\)"
    . "[^()]*)*\z)(?![^ ]* (?=[^)]+(?:\(|\z)))(?:(?:[a-z\d() ]+(?:[a-z\d() -]*[()a-"
    . "z\d])?\.)+[a-z\d]{2,6}|\[(?:(?:1?\d\d?|2[0-4]\d|25[0-4])\.){3}(?:1?\d\d?|"
    . "2[0-4]\d|25[0-4])\]) *\z"
    msgbox % regex
  return RegExMatch(emailstr, regex) != 0
}


So I'm now quite sure that concatenation does not work when initializing static variables in functions :?
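Whichever way the extra quotes get in, doubling them changes what the pattern means: `""` asks for two quote characters in a row, so the quoted-string branch can no longer match a normal quoted local part. Most test addresses still pass only because the `[^@]+` branch catches them. A small Python sketch (not AutoHotkey) that isolates the local-part alternative of arpad3's pattern shows where the difference surfaces; the fixed `@example.com` domain is an assumption to keep it short and is not part of arpad3's regex.

```python
import re

# Local-part alternative of arpad3's pattern, with each quote written once.
LOCAL_PART = r'(?:"(?:\\\\.|[^"])*"|[^@]+)'
# The same text with every quote doubled, i.e. "" where " was meant.
LOCAL_PART_OVER_ESCAPED = LOCAL_PART.replace('"', '""')

# A fixed domain keeps the sketch short; it is not part of arpad3's pattern.
GOOD = re.compile("^" + LOCAL_PART + r"@example\.com\Z", re.IGNORECASE)
OVER_ESCAPED = re.compile("^" + LOCAL_PART_OVER_ESCAPED + r"@example\.com\Z", re.IGNORECASE)


def matches(pattern: "re.Pattern[str]", address: str) -> bool:
    return pattern.match(address) is not None


if __name__ == "__main__":
    for address in ('"Fred Bloggs"@example.com', '"Abc@def"@example.com'):
        print(f"{address}: good={matches(GOOD, address)} "
              f"over-escaped={matches(OVER_ESCAPED, address)}")
```

`"Fred Bloggs"@example.com` matches both patterns, while `"Abc@def"@example.com` only matches the correctly escaped one, because its quoted part contains an @.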


Posted: July 26th, 2010, 1:27 am
daonlyfreez wrote:
I came across this article on validating email addresses, and decided to convert the code into AHK.
...


Cool - I just came across this.

To some of the commenters: you may have missed the point of my *article
(* not really an article; I just wanted a methodical way of testing regexes that people insisted were great for email address validation).

Anyway, my point is not that you should strive for that holy-grail of a regex, but that they all have failings and to even get close to a working regex you end up with something so complex that you can never maintain it going forward.
Point of note: the arpad3 regex was written by arpad specifically to pass my tests, not to be an email address validator.

My conclusion is that, for this specific purpose, you should use a clearly documented step-by-step function to validate the email "rules". You won't thank me today, but you will in 5 years' time when ICANN allows people to register their own TLDs for $500,000.

Regards,
PG
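One way to read the "clearly documented step-by-step" advice is to keep each rule as a named check in a single list, so a limit such as the TLD length can be changed in one place when the TLD landscape shifts. The Python sketch below is a hypothetical illustration of that structure, not PG's code: `RULES`, `MAX_TLD_LEN` and `first_failed_rule` are made-up names, and it covers only a subset of the checks from the first post (no IP literals, quoted strings or character-set check).

```python
from typing import Callable, List, Optional, Tuple

# Limit used in this thread (.uk to .museum); change it here when TLD rules change.
MAX_TLD_LEN = 6

# Each rule: (human-readable description, check(local_part, domain_part) -> bool)
Rule = Tuple[str, Callable[[str, str], bool]]


def _labels(domain: str) -> List[str]:
    return domain.split(".")


RULES: List[Rule] = [
    ("local part is not empty", lambda local, domain: len(local) > 0),
    ("domain has at least two labels", lambda local, domain: len(_labels(domain)) >= 2),
    ("no empty domain labels", lambda local, domain: all(_labels(domain))),
    ("no label starts or ends with '-'",
     lambda local, domain: not any(label.startswith("-") or label.endswith("-")
                                   for label in _labels(domain))),
    (f"TLD is 2-{MAX_TLD_LEN} characters",
     lambda local, domain: 2 <= len(_labels(domain)[-1]) <= MAX_TLD_LEN),
]


def first_failed_rule(email: str) -> Optional[str]:
    """Return the description of the first rule the address breaks, or None if all pass."""
    local, at, domain = email.strip().lower().rpartition("@")
    if not at:
        return "address contains an @"
    for description, check in RULES:
        if not check(local, domain):
            return description
    return None
```

Returning the failed rule's description instead of a bare true/false also makes it obvious which rule to revisit when a legitimate address is rejected.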


Posted: July 26th, 2010, 5:11 pm

Joined: March 16th, 2005, 10:33 pm
Posts: 969
Location: Frisia
PaulGregg wrote:
...


Hi Paul,

Cool that you responded. 8)

Good to know that the arpad3 regex is not meant to be used that way. I thought I had found the 'shortest' regex version, but I guess that doesn't count.

I ported your original to AutoHotkey almost line by line. Are there any specific parts that might need to be altered (apart from the future "private" TLDs)?

Greetings,

daonlyfreez

