Posted 11 May 2012 - 08:11 AM
Let's say we have a document like this:
http://www.text-uplo...63943&c=1058001
This text template gets filled in with various text in a number of fixed fields.
How can I compare the template to the filled-in documents based on it using RegEx?
I mean, I need to store the template as a multiline variable, but in a way that is easy to maintain.
Posted 11 May 2012 - 12:55 PM
There is no point in using multiline in this case, as each variable sits on a single line.
Using one regex to compare the whole document is nearly impossible, because you have to put in a "checkpoint" for every field.
Example : point 6. "a period of ____ years,"
To get the content of the ____ you must use a regex like :
RegExMatch(text, "a period of (\w+) years", var)
Imagine doing that for each variable... you'll end up with a very long regex!
It's really easier to store each line in an object/array and then compare the two objects line by line (one for the original, one for the text to compare with).
This way you can even get the lines where the differences are directly! ;-)
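For reference, here is that single checkpoint as a small script. It is only a sketch: the sample sentence is made up, and it assumes AutoHotkey v1, where RegExMatch in its default mode puts the first captured group in var1.

```autohotkey
; Made-up sample text standing in for point 6 of a filled document.
text := "for a period of five years, unless terminated earlier"

; In AutoHotkey v1's default output mode the first subpattern lands in var1.
if RegExMatch(text, "a period of (\w+) years", var)
    MsgBox % "Field content: " var1
else
    MsgBox % "Checkpoint not found"
```

With the sample text above the captured field is "five". Note that \w+ only matches letters, digits and underscores, so a field like "two and a half" would need a wider pattern.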
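A minimal sketch of the line-by-line idea, under a few assumptions: the two sample texts are made up, both have the same number of lines, and StrSplit is available (AutoHotkey v1.1.13 or later).

```autohotkey
; Made-up original and filled text, one field per line.
Original := "line one`nfor a period of ____ years`nline three"
Filled   := "line one`nfor a period of five years`nline three"

; Split on linefeeds; "`r" is trimmed in case the text uses CRLF line endings.
OrigLines := StrSplit(Original, "`n", "`r")
FillLines := StrSplit(Filled, "`n", "`r")

Diff := ""
for index, line in OrigLines
{
    ; == compares case-sensitively; collect every line that is not identical.
    if !(line == FillLines[index])
        Diff .= index ": " FillLines[index] "`n"
}
MsgBox % Diff = "" ? "No differences" : Diff
```

Here only line 2 differs, so the message box lists "2: for a period of five years".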
Posted 11 May 2012 - 08:55 PM
Why don't you just compare each line one by one?
That was my first thought, but it's so hard to maintain, and to actually write the scripts, because I have more than 10 templates!
Posted 12 May 2012 - 07:20 AM
Template:= "
(
THIS PARTNERSHIP AGREEMENT is made this __________ day of ___________, 20__, by and between the following individuals:
___________________________
Address: __________________________
City/State/ZIP:______________________
etc...
)"
Text:= "
(
THIS PARTNERSHIP AGREEMENT is made this 12-th day of May, 2012, by and between the following individuals:
__ John Doe __
Address: __________________________
City/State/ZIP:______________________
etc...
)"
Needle:= "\Q" RegExReplace(Template,"_+","\E|\Q") "\E"
Sort Needle, D| C U
MultiLineData:= RegExReplace(text,Needle,"`n")
MsgBox % MultiLineData
Posted 16 May 2012 - 01:36 PM
Needle:= "\Q" RegExReplace(Template,"_+","\E|\Q") "\E"
Sort Needle, D| C U
MultiLineData:= RegExReplace(text,Needle,"`n")
MsgBox % MultiLineData
Basically, at a specific point of the template text there can be some specific text (specific characters, or nothing at all).
What I find confusing is that both the empty template and the filled template might contain characters that need to be escaped.
How do I overcome this? Or is there no need to?
Posted 17 May 2012 - 07:03 PM
Needle:= "\Q" RegExReplace(Template,"_+","\E|\Q") "\E"
uses consecutive underscores ("_+") as delimiters to split the template into its constant parts. These template parts are enclosed in the escape sequences \Q ... \E. This way, when the template parts are used in a RegEx needle, they are taken literally (there is no need to escape their characters). The template parts are then joined again with a pipe "|" to form a needle like this:
"\Qtemplate_part1\E|\Qtemplate_part2\E|\Qtemplate_part3\E"
This needle is used to match and remove all constant parts of the template.
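To make the needle concrete, here is a small sketch with a one-line template. The template and filled text are made up for illustration, and the Sort step is left out because the parts here do not overlap.

```autohotkey
; Made-up one-line template and a filled copy of it.
Template := "this ____ day of ____, 20__."
Filled   := "this 12-th day of May, 2012."

; Each run of underscores becomes \E|\Q, so every constant part ends up
; wrapped as \Q...\E and the parts are joined with "|".
Needle := "\Q" RegExReplace(Template, "_+", "\E|\Q") "\E"
MsgBox % Needle   ; \Qthis \E|\Q day of \E|\Q, 20\E|\Q.\E

; Replacing every constant part with a linefeed leaves only the field contents.
Fields := RegExReplace(Filled, Needle, "`n")
MsgBox % Fields
```

The second message box shows the three field values (12-th, May, 12) on separate lines, with an empty line before and after them where the first and last constant parts were removed.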
The sorting is meant to ensure that RegEx tries to match the biggest parts first and removes as large a part of the filled form as possible.
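One thing worth checking here: as far as I know, Sort orders the items alphabetically by default, not by length, so a short part can land before a longer part that starts with the same text. If longest-first order is what you want, the F option of Sort lets a comparison function decide. A rough sketch, assuming AutoHotkey v1; the short template and the function name LongestFirst are made up, and the C and U options are left out to keep it simple:

```autohotkey
; Made-up template with two constant parts that start with the same text.
Template := "Name: ____ Name: Mr. ____ end"
Needle := "\Q" RegExReplace(Template, "_+", "\E|\Q") "\E"

; F names a function that compares two items of the list.
Sort, Needle, D| F LongestFirst
MsgBox % Needle

LongestFirst(a, b)
{
    ; A positive result places a after b, so longer items come first.
    return StrLen(b) - StrLen(a)
}
```

Every item carries the same \Q and \E wrapper, so comparing the wrapped lengths gives the same order as comparing the template parts themselves.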
In brief, this approach has the following limitations:
1. The template cannot contain "\E"
2. The filled fields cannot contain an entire template part (in our case " day of " or ", 20")
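The first limitation can probably be worked around by closing the quoted section wherever the template contains "\E", matching those two characters with ordinary escaping, and reopening the quote. This is an untested sketch, not part of the script above; the sample template and the variable name Safe are made up. The second limitation is inherent in the approach and is not addressed here.

```autohotkey
; Made-up template that happens to contain the two characters \E.
Template := "see C:\Extras ____ end"
Filled   := "see C:\Extras some value end"

; Turn each literal \E into: \E (close quote), \\E (escaped backslash + E), \Q (reopen).
; RegExReplace is used because its matching is case-sensitive by default.
Safe := RegExReplace(Template, "\\E", "\E\\E\Q")

Needle := "\Q" RegExReplace(Safe, "_+", "\E|\Q") "\E"
MsgBox % RegExReplace(Filled, Needle, "`n")
```

This relies on backslashes being plain text in the replacement string of RegExReplace, the same behaviour the original "\E|\Q" replacement depends on.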
Posted 18 May 2012 - 11:14 AM
The limitations are crucial; however, this approach is a good starting point, thanks.