Parsing data from a line with multiple spaces

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
boiler
Posts: 16972
Joined: 21 Dec 2014, 02:44

Re: Parsing data from a line with multiple spaces

Post by boiler » 06 Apr 2021, 16:54

This puts the parsing and displaying of the data into functions, and it demonstrates it working with each of the three possible formats of the data:

Text1 =
(

 Job Name                     Job Address                   Contact Name           Phone Number

 XXX-February2021-215         555 main ST                     Moe Howard             (555) 555-4444
 howl st-WIS

                              any city, wi 53005
)

Text2 =
(

 Job Name                     Job Address                   Contact Name           Phone Number

 XXX-February2021-215         555 main ST                     Moe Howard             (555) 555-4444
 howl st-WIS
 another job name line

                              any city, wi 53005
)

Text3 =
(

 Job Name                     Job Address                   Contact Name           Phone Number

 XXX-February2021-215         555 main ST                     Moe Howard             (555) 555-4444
 howl st-WIS
 another job name line
 and another

                              any city, wi 53005
)

JobData1 := GetJobData(Text1)
DisplayJobData(JobData1)

JobData2 := GetJobData(Text2)
DisplayJobData(JobData2)

JobData3 := GetJobData(Text3)
DisplayJobData(JobData3)

return

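GetJobData(text) {
	; Drop the surrounding blank lines, then split (line 1 = header, line 3 = data row)
	lines := StrSplit(Trim(text, "`n`r`t "), "`n", "`r")
	for num, line in lines
	{
		if (num = 3) {
			; Fields in the data row are separated by runs of 2 or more spaces
			RegExMatch(Trim(line), "(.*?) {2,}(.*?) {2,}(.*?) {2,}(.*)", m)
			jobName := m1
			jobAddress := m2
			contactName := m3
			phoneNumber := m4
		}
		; Extra job-name lines that only exist in the 7- and 8-line layouts
		if (lines.Count() = 7 && num = 4) || (lines.Count() = 8 && num = 4) || (lines.Count() = 8 && num = 5)
			jobName .= "`n" Trim(line)
		; The line just above the blank line before the city is always part of the job name
		if (num = lines.MaxIndex() - 2)
			jobName .= "`n" Trim(line)
		; The last line is the city/state/ZIP, which belongs to the address
		if (num = lines.MaxIndex())
			jobAddress .= "`n" Trim(line)
	}
	return {JobName: jobName, JobAddress: jobAddress, ContactName: contactName, PhoneNumber: phoneNumber}
}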

DisplayJobData(data) {
	MsgBox, % "Job Name:`n" data.JobName "`n`n"
			. "Job Address:`n" data.JobAddress "`n`n"
			. "Contact Name: " data.ContactName "`n`n"
			. "Phone Number: " data.PhoneNumber
}
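The `lines.Count()` checks above are tied to exactly the three layouts shown (6, 7, or 8 lines after trimming). If the layout is always header, blank line, data row, any number of job-name continuation lines, blank line, city line (an assumption based only on the three samples in this post), the continuation lines could be collected with a counter instead. This is a sketch, not tested against real files; `GetJobDataAnyLength` is a hypothetical name.

```autohotkey
; Hypothetical variant of GetJobData (not from the thread). Assumes the trimmed
; text is always: header, blank line, data row, zero or more job-name
; continuation lines, blank line, city line.
GetJobDataAnyLength(text) {
	lines := StrSplit(Trim(text, "`n`r`t "), "`n", "`r")
	last := lines.MaxIndex()
	; Line 3 is the data row; fields are separated by runs of 2+ spaces
	RegExMatch(Trim(lines[3]), "(.*?) {2,}(.*?) {2,}(.*?) {2,}(.*)", m)
	jobName := m1, jobAddress := m2, contactName := m3, phoneNumber := m4
	; Lines 4 through last-2 continue the job name (line last-1 is the blank line)
	i := 4
	while (i <= last - 2)
		jobName .= "`n" Trim(lines[i++])
	; The last line is the city/state/ZIP part of the address
	jobAddress .= "`n" Trim(lines[last])
	return {JobName: jobName, JobAddress: jobAddress, ContactName: contactName, PhoneNumber: phoneNumber}
}
```

It returns the same object shape as GetJobData, so DisplayJobData can show its result unchanged.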

Penguin
Posts: 94
Joined: 26 Feb 2016, 16:02

Re: Parsing data from a line with multiple spaces

Post by Penguin » 07 Apr 2021, 07:39

Thanks so much. I'm going to work with this and see how it does.

boiler
Posts: 16972
Joined: 21 Dec 2014, 02:44

Re: Parsing data from a line with multiple spaces

Post by boiler » 07 Apr 2021, 10:39

No problem. If you find cases that it does not handle, you could post those and it should be a pretty quick modification to the code to handle them as well.

Penguin
Posts: 94
Joined: 26 Feb 2016, 16:02

Re: Parsing data from a line with multiple spaces

Post by Penguin » 14 Apr 2021, 14:56

I'm having an issue with this text. "little Acres -" is part of the job name, but it's showing up in the job address:
Text1 =
(

Job Name                       Job Address              Contact Name                    Phone Number

BRN RAL    little Acres -      DNNPATCH LN              No Name                              (919) 999-9999

Exception

                               WILLOW DDDG, NC 27592
)

boiler
Posts: 16972
Joined: 21 Dec 2014, 02:44

Re: Parsing data from a line with multiple spaces

Post by boiler » 14 Apr 2021, 15:13

It's because there are multiple spaces within the job name itself. By quoting your post, I can see the original spacing, and there are 4 spaces in there. We can require the separator between the name field and the address field to be at least 5 spaces, as below, and then it works. We could make the minimum even larger to be safe, but at some point it would exceed the narrowest real gap between those two fields. It looks like it might be safe to change the 5 to something like 8, but it works with 5 in this case:

Text1 =
(
 
Job Name                       Job Address              Contact Name                    Phone Number
 
BRN RAL    little Acres -      DNNPATCH LN              No Name                              (919) 999-9999
 
Exception
 
                               WILLOW DDDG, NC 27592
)
JobData1 := GetJobData(Text1)
DisplayJobData(JobData1)

GetJobData(text) {
	lines := []
	lines := StrSplit(Trim(text, "`n`r`t "), "`n", "`r")
	for num, line in lines
	{
		if (num = 3) {
			RegExMatch(Trim(line), "(.*?) {5,}(.*?) {2,}(.*?) {2,}(.*)", m)
			jobName := m1
			jobAddress := m2
			contactName := m3
			phoneNumber := m4
		}
		if (lines.Count() = 7 && num = 4) || (lines.Count() = 8 && num = 4) || (lines.Count() = 8 && num = 5)
			jobName .= "`n" Trim(line)
		if (num = lines.MaxIndex() - 2)
			jobName .= "`n" Trim(line)
		if (num = lines.MaxIndex())
			jobAddress .= "`n" Trim(line)
	}
	return {JobName: jobName, JobAddress: jobAddress, ContactName: contactName, PhoneNumber: phoneNumber}
}

DisplayJobData(data) {
	MsgBox, % "Job Name:`n" data.JobName "`n`n"
			. "Job Address:`n" data.JobAddress "`n`n"
			. "Contact Name: " data.ContactName "`n`n"
			. "Phone Number: " data.PhoneNumber
}

Penguin
Posts: 94
Joined: 26 Feb 2016, 16:02

Re: Parsing data from a line with multiple spaces

Post by Penguin » 14 Apr 2021, 15:18

Thank you, I'm testing this against a bunch of files, so I'll know if there are other issues.

Is there any way you can explain the regexmatch expression? I'm lost.

Penguin
Posts: 94
Joined: 26 Feb 2016, 16:02

Re: Parsing data from a line with multiple spaces

Post by Penguin » 14 Apr 2021, 15:54

I just found one that has 4 spaces between the first line of the job name and the job address. I'm wondering if we can use the character position of each header to find the start and end of each section?

boiler
Posts: 16972
Joined: 21 Dec 2014, 02:44

Re: Parsing data from a line with multiple spaces

Post by boiler » 14 Apr 2021, 16:01

Penguin wrote: Is there any way you can explain the regexmatch expression? I'm lost.
Sure. Before it evaluates the data line, the Trim statement trims off any spaces from the beginning or end of the line, so we're starting and ending with visible characters. Then the line is broken down by the RegEx like this (I put quotes around them so that the leading space is shown in those that have one):

"(.*?)" - The . says to match any character. The * says to match it 0 or more times. The ? says to be "ungreedy" in applying the preceding wildcard, which we do because we want it to stop as soon as it's fulfilled in producing an overall match per the entire expression (i.e., we don't want it grabbing all the characters in the line). The parentheses around it identify this as our first "capturing subpattern", which will be our first field.

" {5,}" - The first character in this piece of the pattern is a space. The {5,} says to match the preceding character only if it occurs 5 or more times (a second number after the comma would put an upper limit on the range, so {5,8} would mean match only if there are from 5 to 8 of the preceding character). This is how we determine that this is a delimiter (separator) between fields: there are at least 5 spaces (it used to be 2 or more, but we had to increase it to 5 because of the newest case you just posted).

"(.*?)" - This is the same as the first item in that it defines the match for our second field.

" {2,}" - This is the same as the second item in that it determines that there is a delimiter between the fields, which is two or more spaces in a row in this case.

"(.*?)" - Another field captured.

" {2,}" - Another delimiter.

"(.*)" - The last field captured.

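To make the effect of the delimiter width concrete, this small sketch runs the original 2-space pattern and the revised 5-space pattern against the BRN RAL line from this thread (with the spacing shown in the quoted version). The output variable names a and b are just for illustration.

```autohotkey
line := "BRN RAL    little Acres -      DNNPATCH LN              No Name                              (919) 999-9999"

; Original rule: any run of 2+ spaces separates fields. The 4 spaces inside the
; job name already qualify, so the job name is split in two:
; a1 = "BRN RAL", a2 = "little Acres -", a3 = "DNNPATCH LN",
; a4 = "No Name" followed by its padding and the phone number
RegExMatch(line, "(.*?) {2,}(.*?) {2,}(.*?) {2,}(.*)", a)

; Revised rule: the first separator must be 5+ spaces, so the 4-space gap is skipped:
; b1 = "BRN RAL    little Acres -", b2 = "DNNPATCH LN", b3 = "No Name", b4 = "(919) 999-9999"
RegExMatch(line, "(.*?) {5,}(.*?) {2,}(.*?) {2,}(.*)", b)
```

With the 2-space rule, every field after the job name shifts one capture to the right, and the contact name and phone number end up together in the last capture.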
boiler
Posts: 16972
Joined: 21 Dec 2014, 02:44

Re: Parsing data from a line with multiple spaces

Post by boiler » 14 Apr 2021, 16:06

Penguin wrote: I just found one that has 4 spaces between the first line of the job name and the job address. I'm wondering if we can go off the char position of the header for start and end of each section??
I was going to do that, but it didn't seem like it would be consistent. For example, it doesn't look like the phone number always lines up beneath the header "Phone Number", and there may have been other cases like that. But perhaps they always start within a certain range. If you can determine the character position that each field will always start within, then it can be changed to parse them that way instead.
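One way to tolerate a value that starts a little to the left of its header, as described above, is to snap each header column back to the start of the phrase it lands in. This is a hypothetical helper (`SnapToFieldStart` is not from the thread), and it assumes that words inside a field are separated by single spaces, that fields are separated by two or more, and that a value may begin before its header but the previous value never runs past it.

```autohotkey
; Hypothetical helper: given a data row and the column where a header starts,
; return the column where that field's value actually starts.
SnapToFieldStart(row, col) {
	; A column that falls on a space is already in the gap between two fields
	if (SubStr(row, col, 1) = " ")
		return col
	; Otherwise the column landed inside a value that started early: back up to
	; just after the last run of 2+ spaces before it (the greedy .* finds the last run)
	if (col > 1 && RegExMatch(SubStr(row, 1, col - 1), ".*  ", m))
		return StrLen(m) + 1
	; No earlier gap: the value starts at the beginning of the row
	return 1
}
```

In the header-based version later in this thread, it could be applied to each start position before SubStr is called, for example `Start4 := SnapToFieldStart(line.2, Start4)`. If instead a value can run past the next header, this heuristic would move the boundary the wrong way.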

Penguin
Posts: 94
Joined: 26 Feb 2016, 16:02

Re: Parsing data from a line with multiple spaces

Post by Penguin » 15 Apr 2021, 07:55

I have looked at a bunch of my file examples and I think we should change direction. The header row is consistent in every file I've looked at. If we find the start location of each header item, we could grab any text that is between the first header and the second, even if the text doesn't start at the exact starting point of a header. I don't think we need to be concerned with the other lines of text; it's just this first line of data.

Also, thank you so much for taking the time to explain the regex code. I really need to brush up on this.



Text1 =
(

Job Name                       Job Address              Contact Name                    Phone Number

BRN RAL    little Acres -      DNNPATCH LN              No Name                              (919) 999-9999

)

boiler
Posts: 16972
Joined: 21 Dec 2014, 02:44

Re: Parsing data from a line with multiple spaces

Post by boiler » 15 Apr 2021, 08:23

This should do it:

Text1 =
(
 
Job Name                       Job Address              Contact Name                    Phone Number
 
BRN RAL    little Acres -      DNNPATCH LN              No Name                              (919) 999-9999

)
JobData1 := GetJobData(Text1)
DisplayJobData(JobData1)
return

GetJobData(text) {
	; Keep only the non-blank lines, so line.1 is the header and line.2 is the data row
	loop, parse, text, `n, `r
		if (Trim(A_LoopField) != "")
			data .= A_LoopField "`n"
	line := StrSplit(data, "`n")
	; Each field starts in the same column as its header label
	Start1 := InStr(line.1, "Job Name")
	Start2 := InStr(line.1, "Job Address")
	Start3 := InStr(line.1, "Contact Name")
	Start4 := InStr(line.1, "Phone Number")
	; Take the text from one header's column up to the next header's column, then trim the padding
	jobName := Trim(SubStr(line.2, Start1, Start2 - Start1))
	jobAddress := Trim(SubStr(line.2, Start2, Start3 - Start2))
	contactName := Trim(SubStr(line.2, Start3, Start4 - Start3))
	; The last field runs to the end of the line
	phoneNumber := Trim(SubStr(line.2, Start4))
	return {JobName: jobName, JobAddress: jobAddress, ContactName: contactName, PhoneNumber: phoneNumber}
}

DisplayJobData(data) {
	MsgBox, % "Job Name:`n" data.JobName "`n`n"
			. "Job Address:`n" data.JobAddress "`n`n"
			. "Contact Name: " data.ContactName "`n`n"
			. "Phone Number: " data.PhoneNumber
}
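Since the header row is consistent, the column starts could also be derived from the header itself instead of hardcoding the four labels. This is a minimal sketch, assuming every header label contains at most single spaces and adjacent labels are separated by two or more spaces; `GetColumnsFromHeader` and `SplitByColumns` are hypothetical names, not part of the code above.

```autohotkey
; Hypothetical: find each header label and the column where it starts.
; A label is a run of non-space text in which words are joined by single spaces.
GetColumnsFromHeader(header) {
	cols := []
	pos := 1
	while (pos := RegExMatch(header, "\S+(?: \S+)*", m, pos)) {
		cols.Push({Name: m, Start: pos})
		pos += StrLen(m)   ; continue searching after this label
	}
	return cols
}

; Hypothetical: cut a data row at the header columns, keyed by header label
SplitByColumns(header, row) {
	cols := GetColumnsFromHeader(header)
	fields := {}
	for i, col in cols {
		if (i < cols.Length())   ; up to the next header's column
			fields[col.Name] := Trim(SubStr(row, col.Start, cols[i + 1].Start - col.Start))
		else                     ; the last field runs to the end of the row
			fields[col.Name] := Trim(SubStr(row, col.Start))
	}
	return fields
}
```

With this, the body of GetJobData could reduce to `fields := SplitByColumns(line.1, line.2)`, after which `fields["Job Name"]`, `fields["Phone Number"]`, and so on hold the trimmed values. Like the version above, it still assumes each value starts at or after its header's column.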
