First, forgive me, but I don't know the best way to ask my question, so we're going to go on a roller coaster ride together.
I am working on a project that is mostly done: I am pulling text data from a PDF. No problems with that in general. My current process is to copy the entire PDF and then pull the pertinent data with RegEx. Due to an NDA I can't show official examples, but the concept is a group of invoices in a single PDF. With this in mind, my process is to use a regex to find the invoice number for invoice 1 and store the location of the match in a variable; on the next loop iteration the same regex starts from that location and finds the next invoice number in order. Code example (this is within a loop):
Code:
; Assumes POSITION1 := 1 and BOOKING_NUMBER := "" were set before the loop.
; BOOKING_NUMBER receives the whole match; BOOKING_NUMBER1 receives the captured booking number.
POSITION1 := RegExMatch(PDF_DATA, ".*Booking Number: (.*)\RDesc:", BOOKING_NUMBER, POSITION1 + StrLen(BOOKING_NUMBER))
if (POSITION1 = 0)
    break ; no more booking numbers in PDF_DATA
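To make the idea concrete, here is a minimal, self-contained sketch of that position-tracking loop in AutoHotkey v1. The sample text and the names pos, m and found are made up for illustration, and I am assuming the copied text uses CRLF line endings. It uses [^\r\n]* instead of .* so the capture cannot run past the end of the line.

```autohotkey
; AutoHotkey v1 sketch. The sample text is invented for illustration only.
PDF_DATA := "Booking Number: B100`r`nDesc: first`r`n"
          . "Booking Number: B200`r`nDesc: second`r`n"

pos := 1        ; 1 = start of the haystack
found := ""
; m receives the whole match, m1 the captured booking number.
; RegExMatch returns 0 when nothing more is found, which ends the loop.
while (pos := RegExMatch(PDF_DATA, "i)Booking Number: ([^\r\n]*)\RDesc:", m, pos))
{
    found .= m1 . "`n"
    pos += StrLen(m)    ; resume just past this match so it is not found again
}
MsgBox % found          ; lists B100 and B200, one per line
```

Because the return value of RegExMatch drives the loop condition, no separate check for an empty match is needed.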
My issue stems from a specific invoice format that has the usual headers with normal data, but also has tables in between containing data I need to pull. I can write a loop to pull from the table, but I don't know how to break that loop when it reaches the end of that particular table (the data on the next page is in the same format, so the regex would still match it, but it belongs to a different order).
VERY basic examples would be something like this:
Normal invoice - current process works fine with this:
[page 1]
Invoice number: variable1
Booking number: variable2
Material used: variable3
[page 2]
Invoice number: variable1
Booking number: variable2
Material used: variable3
New invoice, with a table in between each data point. The subvariables share the invoice/booking number of that page:
[page 1]
Invoice number: variable1
Booking number: variable2
Material for item 1: subvariable1
Material for item 2: subvariable2
Material for item 3: subvariable3
Material for item 4: subvariable4
[page 2]
Invoice number: variable1
Booking number: variable2
Material for item 1: subvariable1
Material for item 2: subvariable2
Material for item 3: subvariable3
Material for item 4: subvariable4
I know how to pair the invoice/booking number to the subvariable, but what I don't know is this: if I have a loop that pulls from the material tables, HOW do I break that loop? If I used a simple regex such as Material for item \d+: (.*), it would catch everything from both page 1 and page 2. Some of the invoices do have a line stating "end of booking number xxxx", but not all of them do, so I don't know if it would be the best option. I also don't know how to test whether THAT line comes before the next match of the loop.
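One direction I can imagine, sketched below in AutoHotkey v1, is to cut out each booking's block first and run the material loop only on that block, so the loop ends by itself when the block runs out. Everything here is an assumption for illustration: the sample text, the names headerRe, block, blockStart, blockEnd and result, CRLF line endings, and the idea that a block ends at whichever comes first of the "end of booking number" line, the next booking header, or the end of the text.

```autohotkey
; AutoHotkey v1 sketch. Sample text and all names are invented for illustration.
PDF_DATA := "Invoice number: A1`r`nBooking number: B100`r`n"
          . "Material for item 1: steel`r`nMaterial for item 2: wood`r`n"
          . "End of booking number B100`r`n"
          . "Invoice number: A2`r`nBooking number: B200`r`n"
          . "Material for item 1: glass`r`n"

; The lookbehind keeps an "end of booking number: ..." line from being taken as a header.
headerRe := "i)(?<!end of )Booking number: ([^\r\n]*)"
result := ""
pos := 1
while (pos := RegExMatch(PDF_DATA, headerRe, hdr, pos))
{
    blockStart := pos + StrLen(hdr)
    ; Positions of the two possible terminators after this header (0 = not found).
    endPos  := RegExMatch(PDF_DATA, "i)end of booking number", unused, blockStart)
    nextPos := RegExMatch(PDF_DATA, headerRe, unused, blockStart)
    ; Default to the end of the text, then take the earliest terminator found.
    blockEnd := StrLen(PDF_DATA) + 1
    if (nextPos && nextPos < blockEnd)
        blockEnd := nextPos
    if (endPos && endPos < blockEnd)
        blockEnd := endPos
    block := SubStr(PDF_DATA, blockStart, blockEnd - blockStart)

    ; The inner loop only sees this booking's block, so it cannot reach the next page.
    mpos := 1
    while (mpos := RegExMatch(block, "i)Material for item (\d+): ([^\r\n]*)", mat, mpos))
    {
        result .= hdr1 . " | item " . mat1 . " | " . mat2 . "`n"
        mpos += StrLen(mat)
    }
    pos := blockStart   ; continue the header search after the current header
}
MsgBox % result
```

With the sample text, B100 is paired with steel and wood and B200 with glass. Comparing endPos against nextPos is also the test for whether the "end of booking number" line comes before the next header: an end line that belongs to a later booking sits after nextPos and is ignored for the current block.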