Regexmatch Help, FileRead vs Internal Assignment


16 replies to this topic

#1 Rathgar2

Rathgar2
  • Members
  • 59 posts

Posted 22 May 2012 - 11:41 PM

First Off, Thanx much @RaptorX and @engunneer for their patience and insight on my previous thread.

I am trying to write a set of hotkeys to streamline and automate a data-entry task. We keep customer records in a web-based CRM program called ONYX, and we often need to send this data to another webform. This requires that we hand-key all this data a second time, which is "High Tomfoolery" IMHO. So, using my newly learned skills with AHK_L (AutoHotkey110703_Install.exe) and RegExMatch, I have written a script that should do the job. First it highlights and copies the customer record from ONYX and appends it to a tasty txt file. From there it parses the text, runs RegExMatch for the different strings, and stores them in vars for later output into a webform. I have got it running but am stymied in a few spots. I have attached the AHK file as well as a mocked-up data source to play with.

Issue #1: My customer's first and last name will always be located before the needle value "(ID:" that you see on the 1st line of ONYXRaw1.txt. I cannot figure out how to make the script backtrack from that needle "anchor" to collect the names.

Issue #2: Sometimes the customer has 2 address lines, and I cannot get my address-parsing loop to work right. Since it works 100% correctly in another AHK script (doing a different job), I am a bit stumped as to what I have done wrong.

Issue #3: The telephone number parses correctly when I manually set the values inside the script (see the purple section in the CTRL+SHIFT+0 (Zero) hotkey, MACRO#2). But when I use FileRead to load the same data from the txt file, it fails (see the red section).

; CTRL+SHIFT+o (o as in Oscar). Start in Outlook Shared ROLE account for WGU Regs. Opens email, strips the empty spaces and white spaces, parses the data for student info, stores that info to VAR set, overpastes info back into Outlook and saves it.
^+o::
KeyWait, o
Clipboard =
;WinActivate, ahk_id %Onyx1%
;Sleep 500
;MouseClick, Left, 154, 183, , ,D
;Sleep 250
;MouseClick, Left, 1381, 524, , ,U
;Sleep 500
;Send ^c  ; Highlight All and Copy
;ClipWait
;ONYXRaw := Clipboard
;FileDelete, ONYXRaw1.txt
;FileAppend, %ONYXRaw%, ONYXRaw1.txt ;create raw txt file with the ONYX Info
;Sleep 250
;MouseClick, Left, 660, 500
FileRead, ONYXdata, ONYXRaw1.txt  ; (red section) data loaded from the txt file
; This section catches & stores the address first, it can tell if there are 2 or 3 address lines.
Loop, parse, ONYXdata, `n, `r
{
  if inStr(A_LoopField, "Home Address")
    nexlineIsAddress := 1
  else if (nexlineIsAddress)
  {
    address := A_LoopField
    nextlineMightBeAddress2 := 1
    address2 := ""
    nexlineIsAddress := 0
    continue
  }
  else if   RegExMatch(A_LoopField,"^(.*) ([A-Z]{2}), (\d{5})",_)
  {
    City :=   _1, State :=   _2, ZipCode :=   _3
    nextlineMightBeAddress2 := 0
  }
  else if (nextlineMightBeAddress2)
  {
    _address2 := A_LoopField
    nextlineMightBeAddress2 := 0
  }
}
; This next section catches & stores the First Name, Last Name, ONYX email address, Student ID#, and Telephone Number.
Loop, parse, ONYXdata, `n, `r
{
#name    := "Name:\s+?(?P<fName>\w+)\W+?(?P<lName>.*?)\r?\n"
#email   := "Email\s+?(?P<eMail>.*?)\r?\n"
#StudentID := "SSN\s+?(?P<StudentID>.*?)\r?\n"
#phone    := "Home\s+\W?(?P<aCode>\d{3})\W+?(?P<ph1>\d{3})\W+?(?P<ph2>\d{4})\r?\n"  ; (red section)
;#phone    := "Home ((?P<aCode>\d{3})\W+?(?P<ph1>\d{3})\W+?(?P<ph2>\d{4})\r?\n"

Regexmatch(ONYXdata, #name, _)
Regexmatch(ONYXdata, #email, _)
Regexmatch(ONYXdata, #StudentID, _)
Regexmatch(ONYXdata, #phone, _)  ; (red section)
}
msgbox % "Data Captured to Vars from ONYX `n"
       . "Name:`t`t`t" _fName "`n"
       . "Last Name:`t`t" _lName "`n"
       . "Phone Number:`t`t(" _aCode ") " _ph1 "-" _ph2 "`n"  ; (red section)
       . "Email Address:`t`t" _eMail "`n`n"
       . "Address Line 1:`t`t" address "`n"
       . "Address Line 2:`t`t" address2 "`n"
       . "City:`t`t" City "`n"
       . "State:`t`t" State "`n"
       . "Zip Code:`t`t" ZipCode "`n"
       . "Student ID#: `t" _StudentID "`n`n"

Return

; CTRL+SHIFT+0 (Zero) Test Macro for Parsing Loop (purple section: data set inside the script)
^+0::
KeyWait, 0
ONYXData=
(
 Casey Allerman (ID:20442433 ~ D3 Mailer: 636049)DetailsExternal ContactsInternal ContactsOrganization ChartReportsStudent Profile   
 
Customer Details              
 

To search for a customer, click the search button. To add a new customer, click the add button.
Individual 
Email mgalloner913@gmailer.com 
Company Summit Hallsey High School 

Partner Information 
Partner Access 

Details 
Pref. Language ENG 
SSN 111-22-3344 
Sync with PSP  
 Telephone 
Business (501) 555-7127 
Home (501) 555-7222 

Home Address 
11998 Swirling Brook Ln 
TEST Apt 2x 
Frenchy, TX 70886-1745 
 
)
; This section catches & stores the address first, it can tell if there are 2 or 3 address lines.
Loop, parse, ONYXdata, `n, `r
{
  if inStr(A_LoopField, "Home Address")
    nexlineIsAddress := 1
  else if (nexlineIsAddress)
  {
    address := A_LoopField
    nextlineMightBeAddress2 := 1
    address2 := ""
    nexlineIsAddress := 0
    continue
  }
  else if   RegExMatch(A_LoopField,"^(.*) ([A-Z]{2}), (\d{5})",_)
  {
    City :=   _1, State :=   _2, ZipCode :=   _3
    nextlineMightBeAddress2 := 0
  }
  else if (nextlineMightBeAddress2)
  {
    _address2 := A_LoopField
    nextlineMightBeAddress2 := 0
  }
}
; This next section catches & stores the First Name, Last Name, ONYX email address, Student ID#, and Telephone Number.
Loop, parse, ONYXdata, `n, `r
{
#name    := "Name:\s+?(?P<fName>\w+)\W+?(?P<lName>.*?)\r?\n"
#email   := "Email\s+?(?P<eMail>.*?)\r?\n"
#StudentID := "SSN\s+?(?P<StudentID>.*?)\r?\n"
#phone    := "Home\s+\W?(?P<aCode>\d{3})\W+?(?P<ph1>\d{3})\W+?(?P<ph2>\d{4})\r?\n"  ; (purple section)

Regexmatch(ONYXdata, #name, _)
Regexmatch(ONYXdata, #email, _)
Regexmatch(ONYXdata, #StudentID, _)
Regexmatch(ONYXdata, #phone, _)  ; (purple section)
}
msgbox % "Data Captured to Vars from ONYX `n"
       . "Name:`t`t`t" _fName "`n"
       . "Last Name:`t`t" _lName "`n"
       . "Phone Number:`t`t(" _aCode ") " _ph1 "-" _ph2 "`n"  ; (purple section)
       . "Email Address:`t`t" _eMail "`n`n"
       . "Address Line 1:`t`t" address "`n"
       . "Address Line 2:`t`t" address2 "`n"
       . "City:`t`t" City "`n"
       . "State:`t`t" State "`n"
       . "Zip Code:`t`t" ZipCode "`n"
       . "Student ID#: `t" _StudentID "`n`n"

Return

; CTRL+SHIFT+o (o as in Oscar) MACRO#1
; CTRL+SHIFT+0 (Zero) Test Macro for Parsing Loop MACRO#2

Can anyone help me out here?
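A possible explanation for Issue #3, offered as a sketch rather than a confirmed diagnosis: a continuation section in AutoHotkey v1 drops trailing spaces and tabs from each line by default and joins the lines with `n, while FileRead returns the file exactly as stored. The mock data shows a space after each phone number, and the #phone needle ends in \d{4}\r?\n, so it needs the line break directly after the last digit. The names fromFile, inScript, strict and tolerant below are hypothetical, and the two sample strings only imitate what each source is assumed to deliver.

```autohotkey
; Sketch (AutoHotkey v1.1). Assumption: the txt file keeps a trailing space and CRLF on each line.
fromFile := "Business (501) 555-7127 `r`nHome (501) 555-7222 `r`n"
; Assumption: the continuation section trimmed the trailing space and joined lines with `n.
inScript := "Business (501) 555-7127`nHome (501) 555-7222`n"

; Needle from the post: the line break must follow the 4th digit immediately.
strict   := "Home\s+\W?(?P<aCode>\d{3})\W+?(?P<ph1>\d{3})\W+?(?P<ph2>\d{4})\r?\n"
; Same needle, but allowing spaces or tabs before the line break.
tolerant := "Home\s+\W?(?P<aCode>\d{3})\W+?(?P<ph1>\d{3})\W+?(?P<ph2>\d{4})[ \t]*\r?\n"

posScript   := RegExMatch(inScript, strict, s_)    ; expected to match
posFile     := RegExMatch(fromFile, strict, f_)    ; expected to return 0
posTolerant := RegExMatch(fromFile, tolerant, t_)  ; expected to match

MsgBox % "strict, in-script data: " posScript
       . "`nstrict, file data: " posFile
       . "`ntolerant, file data: " posTolerant
       . "`nPhone: (" t_aCode ") " t_ph1 "-" t_ph2
```

If the real file has no trailing spaces, this is not the cause and the difference lies elsewhere.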

#2 0x150--ISO

0x150--ISO
  • Members
  • 657 posts

Posted 23 May 2012 - 01:23 AM

hstack =
(
 Casey Allerman (ID:20442433 ~ D3 Mailer: 636049)DetailsExternal ContactsInternal ContactsOrganization ChartReportsStudent Profile   
 
Customer Details              
 

To search for a customer, click the search button. To add a new customer, click the add button.
Individual 
Email mgalloner913@gmailer.com 
Company Summit Hallsey High School 

Partner Information 
Partner Access 

Details 
Pref. Language ENG 
SSN 111-22-3344 
Sync with PSP  
 Telephone 
Business (501) 555-7127 
Home (501) 555-7222 

Home Address 
11998 Swirling Brook Ln 
TEST Apt 2x 
Frenchy, TX 70886-1745 
 )

ndl := "s)^(.*\w+)\s(\w+)\s\(I.*?(\w+@\w+\.\w+).*Co\w+\s(.*?)`n.*?\.\s\w+\"
	 .  "s(\w+).*N\s([\d-]+).*?\s(?=\()(.*?)`n.*?\s(?=\()(.*?)`n.*s\s(.*)\s"

RegExMatch( hStack, ndl, m )

MsgBox %  "1st Name: " m1 "`nLast Name: " m2 "`n`nEmail: " m3
		 .  "`nCompany: " m4 "`n`nLanguage: " m5 "`nSSN: " m6
		 .  "`nBusiness#: " m7 "`nHome#: " m8 "`n`nAddress:`n" m9
Posted Image

#3 Rathgar2

Rathgar2
  • Members
  • 59 posts

Posted 23 May 2012 - 09:51 PM

Wowzer. That is some Efficient Code! I would dearly love to understand how it works! I'll study it and take it apart and post what I think you've done, but that will take me a while. My first questions on it are: 1. does it matter what you've named the Vars hStack and ndl? 2. I do not see a Loop, Parse, and am not sure why, how is it skipped, does the ndl var already have it parsed?
My first test worked like your screenshot. I then removed the "TEST Apt 2x" from the hStack to test if it would cope with a 2 address line set, and that worked. Next I tested to see if a Hyphenated Last Name would trip it up and it does. When I change Casey Allerman to Casey Allerman-Testyer I get this:
Posted Image
No data, so sad. I wish I understood your code well enough to figure out how to deal with that. What edit would you make so that it could include a hyphenated last name? Also, some records will have a Mr., Ms., Mrs. or Miss that needs to be excluded so that it gets the first name cleanly. Can this also be programmed in? I also tested loading hStack via a FileRead from the text file, and that works like a charm! I'll need to figure out the array it produces so that my next hotkey will put the zip code in the ZipCode field, and I'll need to work out how to get that value out of the var m9. Thanx, your code is rather clever. I feel my ignorance has grown over the last 20 minutes, hehehehe.

#4 0x150--ISO

0x150--ISO
  • Members
  • 657 posts

Posted 23 May 2012 - 11:16 PM

1.does it matter what you've named the Vars hStack and ndl?
2.I do not see a Loop, Parse, and am not sure why, how is it skipped, does the ndl var already have it parsed?

The variable names can be changed to anything. If you're passing whole sections of data to RegEx (i.e. your example is 1 whole section),
just make sure the haystack you pass to RegExMatch uses the same name.

I'm not looping through the string. I'm using RegEx to go through the data 1 step, search and match at a time.
See below for explanation ( attempt anyway ;) ).

This needle fixes the hyphen issue and excludes titles with a period after ( eg. Mr. Mrs. etc ).
ndl := "s)(\w+)\W(\w+)\s\(.*?(\w+@\w+\.\w+).*Co\w+\s(.*?)`n.*?\.\s\w+\s("
	 .  "\w+).*N\s([\d-]+).*?\s(?=\()(.*?)`n.*?\s(?=\()(.*?)`n.*s\s(.*)\s"
I'm now starting to think all spaces should be replaced with non word characters \W for better reliability.
Let me know if you run into any issues.

Here's the breakdown:
s) the DotAll option is used so the . searches that follow don't stop at newlines.
(\w+) match 1st name or 1st group then..
\W look for a non-word character separating the 1st group then.. ( prob. should be \W+ in case of more than 1 non-word character )
(\w+) match 2nd name or 2nd group then..
\s\(.*? look for a space then an open bracket before the word 'ID' then search forward to..
(\w+@\w+\.\w+) the email address or 3rd group: a word, an @ symbol, another word, a period and one more word then..
.*Co\w+\s search forward until the 2 letters Co in Company are found ( case-sensitive, since the i option is not set ), then the rest of the word and a space then..
(.*?) match the Company or 4th group then..
`n.*?\.\s\w+\s newline and forward search until a period, space, word, space then..
(\w+) match the Language or 5th group then...
.*N\s search forward to the last N + space combination in SSN , then..
([\d-]+) match a class including digits and minus signs for the SSN# or 6th group then..
.*?\s search forward until a space then..
(?=\()(.*?)`n if the next character is an open bracket, capture forward until a newline; that's the business phone number or 7th match then..
.*?\s search forward until a space then..
(?=\()(.*?)`n if the next character is an open bracket, capture forward until a newline; that's the home phone number or 8th match then..
.*s\s search forward until an s + space combination then..
(.*)\s match all characters forward until the last whitespace, which is the address or 9th match..
**inhales, starts to breathe again..
Hope that helped :lol:
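To see what the s) option changes in isolation, here is a minimal sketch. The haystack sample and the output variables a and b are hypothetical, and the needle is a cut-down stand-in, not the full one above.

```autohotkey
; Sketch (AutoHotkey v1.1): the same needle with and without the s) DotAll option.
sample := "Email a@b.com`r`nCompany Acme High`r`n"   ; hypothetical haystack with CRLF line ends

; Without s), "." does not match the `r`n line break, so ".*" cannot reach "Company".
plain  := RegExMatch(sample, "Email.*Company\s(\V+)", a)
; With s), "." also matches line-break characters, so the search can cross lines.
dotAll := RegExMatch(sample, "s)Email.*Company\s(\V+)", b)

MsgBox % "Without s): position " plain ", group [" a1 "]`n"
       . "With s): position " dotAll ", group [" b1 "]"
```

The first call should report position 0 with an empty group, and the second should capture the company text, since \V+ takes everything up to the line break.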

#5 sinkfaze

sinkfaze
  • Moderators
  • 6089 posts

Posted 24 May 2012 - 05:32 AM

With all due respect to 0x150||ISO's example, it will probably be better to use a series of regexes to match the data fields and progressively work our way through the dataset:

; assumes the file data was copied to the Clipboard
matches=	; delimited list of tags and regex matches for the tags
(
First Name:@^\W+\K\S+
Last Name:@\s+\K.*(?=\(ID:)
Email:@Email \K\V+
Company:@Company \K\V+
Language:@Pref. Language \K\V+
SSN:@SSN \K\V+
Business#:@Business \K\V+
Home#:@Home \K\V+
Address:@s)Home Address\s+\K.*
)
StringSplit, match, matches, `n, `r	; split each line of matches into their pairs
Pos=1	; assign a starting position to progressively traverse the string
Loop, %	match0	; loop through each pair
{
	StringSplit, part, match%A_Index%, @	; split the pair
	if	Pos :=	RegExMatch(Clipboard,part2,m,Pos+StrLen(m))	; if the regex matches, add to the var
		res .=	(!res ? "" : "`n") part1 (part1="Address:" ? "`n" : " ") m
}
MsgBox %	res
return
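
For getting each field into its own variable instead of one combined string, here is a minimal sketch along similar lines. It is not the code above: it runs each needle independently rather than progressively. The names data and nameNeedle, the output variable names, and the title list (Mr, Mrs, Ms, Miss) are assumptions for illustration, and the sample text is mocked up from the thread's test data.

```autohotkey
; Sketch (AutoHotkey v1.1): one RegExMatch per field, each result in its own variable.
data := " Miss Mr. Casey Allerman-Testyer (ID:20442433 ~ D3 Mailer: 636049)Details `r`n"
      . "Email test.mgalloner913@gmailer.com `r`n"
      . "SSN 111-22-3344 `r`n"
      . "Home (501) 555-7222 `r`n"

; Skip any leading titles, then capture the first name and everything up to "(ID:".
nameNeedle := "^\W*(?:(?:Mr|Mrs|Ms|Miss)\.?\s+)*(?P<First>\S+)\s+(?P<Last>.*?)\s*\(ID:"
RegExMatch(data, nameNeedle, name)   ; fills nameFirst and nameLast

; \K drops the label from the match; \V+ takes the rest of the line.
RegExMatch(data, "Email \K\V+", email)
RegExMatch(data, "SSN \K\V+", ssn)
RegExMatch(data, "Home \K\(\V+", homePhone)   ; the "(" keeps "Home Address" from matching

; The source lines end in a space, so trim each value before use.
email := Trim(email), ssn := Trim(ssn), homePhone := Trim(homePhone)

MsgBox % "First: " nameFirst "`nLast: " nameLast "`nEmail: " email
       . "`nSSN: " ssn "`nHome#: " homePhone
```

Each variable can then be sent to its own webform field. A title without a following space, or one outside the assumed list, would end up in the first name.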


#6 0x150--ISO

0x150--ISO
  • Members
  • 657 posts

Posted 24 May 2012 - 05:36 AM

agreed!

#7 engunneer

engunneer
  • Fellows
  • 9162 posts

Posted 24 May 2012 - 01:32 PM

I would not use (\w+@\w+\.\w+) since I think it would fail to find a first.last@somewhere.co.uk type address. I think you had a good email needle in your other thread; otherwise you can use the gigantic email regex that you can find elsewhere in the forum, which works very well.

#8 0x150--ISO

0x150--ISO
  • Members
  • 657 posts

Posted 24 May 2012 - 04:34 PM

tnx @sinkfaze 4 \V non vertical whitespace char ;)

#9 Rathgar2

Rathgar2
  • Members
  • 59 posts

Posted 24 May 2012 - 04:52 PM

Youz Guyz ROCK! I am going to dive in on this now and see if I can get it up and working. I will watch for the email types that have a period in the 1st half and test that angle (good call engunneer) I sure hope I can pick up the How This Works, because so far it has been super-damn clever! I'll report back after I work through my other tasks.

#10 0x150--ISO

0x150--ISO
  • Members
  • 657 posts

Posted 24 May 2012 - 05:07 PM

@engunneer I'm thinking classes ([\w\W]+@[\w\W]+\.[\w\W]+) should cover it.
Based on the local section allowing ASCII characters RFC 5322 Section 3.2.3 and Unicode RFC 6531,
and the domain name section consisting of letters, digits, hyphens and dots.

#11 engunneer

engunneer
  • Fellows
  • 9162 posts

Posted 24 May 2012 - 05:11 PM

isn't [\w\W] the same as *?
sinkfaze's \K\V+ seems like a great trick since the format of the file is well known. You only need the fancier regexes to pull data out of oddly formed data.

(my work email address is of the form first.last@company-name.com, which is a handy test-case)

#12 0x150--ISO

0x150--ISO
  • Members
  • 657 posts

Posted 24 May 2012 - 05:31 PM

isn't [\w\W] the same as *?

hrmm tested that in place of the classes with no luck.
This works .*@.*\..* without testing as a group ;)
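
One note on the needles being compared: [\w\W] matches any character at all, so [\w\W]+@ behaves like .+@ with the s) option and can run across lines. A middle ground is a character class limited to the characters expected in each part. The sketch below is a simplification and is not RFC 5322 compliant; the needle and the variable names tests, needle, found and out are assumptions, and the test addresses follow the forms mentioned in the thread.

```autohotkey
; Sketch (AutoHotkey v1.1): a simplified email needle built from explicit character classes.
tests  := "first.last@company-name.com|first.last@somewhere.co.uk|Test.1_mgalloner913@gmailer-inc.com"
; Local part: word chars, dot, plus, hyphen. Domain: one or more dot-separated labels.
needle := "[\w.+-]+@[\w-]+(?:\.[\w-]+)+"

out := ""
Loop, Parse, tests, |
{
    ; Wrap each address in a line shaped like the ONYX data before searching.
    RegExMatch("Email " A_LoopField " `r`nCompany X", needle, found)
    out .= A_LoopField " -> " found "`n"
}
MsgBox % out
```

Because none of the classes include whitespace, the match cannot spill past the end of the address.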

#13 Rathgar2

Rathgar2
  • Members
  • 59 posts

Posted 24 May 2012 - 11:43 PM

OK, well I am still doing something wrong, and I am sure it is me being boneheaded here but this is the ndl code now with the above edits and then below that is what I get:
; CTRL+SHIFT+0 (Zero) Test Macro for Parsing Loop
^+0::
KeyWait, 0
hStack=
(
 Miss Mr. Casey AllKennish (ID:20442433 ~ D3 Mailer: 636049)DetailsExternal ContactsInternal ContactsOrganization ChartReportsStudent Profile   
 
Customer Details              
 

To search for a customer, click the search button. To add a new customer, click the add button.
Individual 
Email test.mgalloner913@gmailer.com 
Company Summit Hallsey High School 

Partner Information 
Partner Access 

Details 
Pref. Language ENG 
SSN 111-22-3344 
Sync with PSP  
 Telephone 
Business (501) 555-7127 
Home (501) 555-7222 

Home Address 
11998 Swirling Brook Ln 
Frenchy, TX 70886-1745 
 
)
; This section catches & stores customer info.
ndl := "s)(\w+)\W(\w+)\s\(.*?(.*@.*\..*).*Co\w+\s(.*?)`n.*?\.\s\w+\s("
    .  "\w+).*N\s([\d-]+).*?\s(?=\()(.*?)`n.*?\s(?=\()(.*?)`n.*s\s(.*)\s"

RegExMatch( hStack, ndl, m )

MsgBox %  "1st Name: " m1 "`nLast Name: " m2 "`n`nEmail: " m3
       .  "`nCompany: " m4 "`n`nLanguage: " m5 "`nSSN: " m6
       .  "`nBusiness#: " m7 "`nHome#: " m8 "`n`nAddress:`n" m9
       
Return
Posted Image
As you can see, it is all FUBAR now and I despair of ever understanding why this does not work. Can you see what I have done wrong? I have not yet tried sinkfaze's approach and will do so next.

#14 0x150--ISO

0x150--ISO
  • Members
  • 657 posts

Posted 25 May 2012 - 03:05 AM

I have not yet tried sinkfaze's approach and will do so next.

I could fix the needle but I think sinkfaze's example adheres to the general rule of RegEx, do not use it to parse!
Plus it will be much easier for you to add your zip code.

#15 Rathgar2

Rathgar2
  • Members
  • 59 posts

Posted 25 May 2012 - 07:51 PM

Thanx sinkfaze, I tried your code. I hotkeyed it and added a FileRead to load the Clipboard with my raw test data:
; CTRL+SHIFT+9 Test
^+9::
KeyWait, 9
; assumes the file data was copied to the Clipboard
FileRead, Clipboard, ONYXRaw1.txt
matches=   ; delimited list of tags and regex matches for the tags
(
BLAH BLAH BLAH CODE CODE CODE
}
MsgBox %   res
Return

and it seems to get everything, although sometimes I get a msg box with the results doubled and it looks like this:
Posted Image
I added Test.1_mgalloner913@gmailer-inc.com like this b/c of users' penchant for using all manner of . _ & - in their emails, and the code seems to cope with those just fine. It also deals with hyphenated last names and doesn't bat an eyelash. Que bueno! I do not think this will matter as long as I can properly vomit out the stored vars into the new form. Closing the office early and will not get back to this again till Wednesday, but I wanted to thank you for this approach. Next I will have to figure out how to call on each var (like how to get the name from your First Name:@^\W+\K\S+ code) so I can deploy it to a webform. I will also need to figure out how it stores the City, State and Zip so those can be separated. This totally gets me ahead of the game and I thank you!
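
On separating City, State and Zip: in the posted data the comma follows the city ("Frenchy, TX 70886-1745"), whereas the needle in the first post expects it after the state. A minimal sketch that splits an address block on that assumption is below. The variable names addr, street1, street2 and p are hypothetical, and addr stands in for whatever the Address match captured.

```autohotkey
; Sketch (AutoHotkey v1.1): split a captured address block into separate fields.
addr := "11998 Swirling Brook Ln `r`nTEST Apt 2x `r`nFrenchy, TX 70886-1745 `r`n"

street1 := street2 := City := State := Zip := ""
Loop, Parse, addr, `n, `r
{
    line := Trim(A_LoopField)          ; drop the trailing spaces seen in the data
    if (line = "")
        continue
    ; Assumed shape of the last line: "City, ST 12345" or "City, ST 12345-6789".
    if (RegExMatch(line, "^(.+?),\s*([A-Z]{2})\s+(\d{5}(?:-\d{4})?)$", p))
        City := p1, State := p2, Zip := p3
    else if (street1 = "")
        street1 := line                ; first non-empty line is address line 1
    else
        street2 := line                ; an optional second address line
}

MsgBox % "Address 1: " street1 "`nAddress 2: " street2
       . "`nCity: " City "`nState: " State "`nZip: " Zip
```

With only one street line, street2 simply stays empty, which covers the 2-line and 3-line cases from Issue #2.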