hello!
let's say we have a document like this:
http://www.text-uplo...63943&c=1058001
This text template gets filled in with various text in a number of fixed fields.
How can I compare the template against the documents filled in from it, using RegEx?
I mean, I need to store the template as a multiline variable, but in a way that is easy to maintain.
Any hints?
thanks!
multiline regex
Started by azure, May 11 2012 08:11 AM
9 replies to this topic
#1
Posted 11 May 2012 - 08:11 AM
#2
HintProvider
Posted 11 May 2012 - 11:48 AM
hello!
let's say we have a document like this:
http://www.crocko.co... ... mplate.txt
.....
any hint?
hint
You are requesting: template.txt (0.0 MB)
Requested file is deleted.
Hope this helps!
#3
Posted 11 May 2012 - 12:22 PM
link updated
#4
Posted 11 May 2012 - 12:55 PM
Why don't you just compare each line one by one ?
There is no point in using multiline here, as each variable sits on a single line.
Using one regex to compare the whole document is nearly impossible, because you would have to put in a "checkpoint" for every field.
Example: point 6, "a period of ____ years,"
To get the content of the ____ you would need a regex like:
RegExMatch(text, "a period of (\w+) years", var)
Imagine doing that for every variable... you'll end up with a ****ing long regex!
It's much easier to store each line in an object/array and then compare the two objects line by line (one for the original, one for the text to compare against).
That way you can even get the lines where the differences are directly! ;-)
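A minimal sketch of the line-by-line idea above, assuming AutoHotkey v1.1.13 or later (StrSplit is needed). TemplateText, FilledText and the other variable names are made up for illustration; they do not come from the linked template.

```autohotkey
; Sketch only: compare two texts line by line and report the lines that differ.
; TemplateText and FilledText are hypothetical sample strings.
TemplateText := "6. a period of ____ years,`n7. some constant clause"
FilledText   := "6. a period of five years,`n7. some constant clause"

; Split on line feeds and drop any carriage returns
TemplateLines := StrSplit(TemplateText, "`n", "`r")
FilledLines   := StrSplit(FilledText, "`n", "`r")

Report := ""
Loop % TemplateLines.MaxIndex()
{
    ; != ignores case unless StringCaseSense is turned on
    if (TemplateLines[A_Index] != FilledLines[A_Index])
        Report .= "Line " A_Index " differs: " FilledLines[A_Index] "`n"
}

; Pull one field out of a known line, using the RegExMatch pattern from the post
if (RegExMatch(FilledLines[1], "a period of (\w+) years", Years))
    Report .= "Years field: " Years1 "`n"

MsgBox % Report
```

Every line that contains a field will show up as different from the blank template, so the report doubles as a list of the lines that were filled in.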
#5
Posted 11 May 2012 - 08:55 PM
Why don't you just compare each line one by one ?
That was my first thought, but it's so hard to maintain, and to actually write the scripts, because I have more than 10 templates!
#6
Gogo
Posted 12 May 2012 - 07:20 AM
Template:= "
(
THIS PARTNERSHIP AGREEMENT is made this __________ day of ___________, 20__,
by and between the following individuals:
___________________________
Address: __________________________
City/State/ZIP:______________________
etc...
)"
Text:= "
(
THIS PARTNERSHIP AGREEMENT is made this 12-th day of May, 2012,
by and between the following individuals:
__ John Doe __
Address: __________________________
City/State/ZIP:______________________
etc...
)"
Needle:= "\Q" RegExReplace(Template,"_+","\E|\Q") "\E"
Sort Needle, D| C U
MultiLineData:= RegExReplace(text,Needle,"`n")
MsgBox % MultiLineData
#7
Posted 16 May 2012 - 01:36 PM
I am very interested in this. Can you please explain what you do here:
Needle:= "\Q" RegExReplace(Template,"_+","\E|\Q") "\E"
Sort Needle, D| C U
MultiLineData:= RegExReplace(text,Needle,"`n")
MsgBox % MultiLineData
Basically, at a specific point in the template text there can be some specific text (specific characters, or nothing).
What I find confusing is that both the empty template and the filled template might contain characters that need to be escaped.
How do I overcome this? Or is there no need to?
#8
Posted 17 May 2012 - 11:43 AM
anyone????
#9
Gogo
Posted 17 May 2012 - 07:03 PM
The line:
Needle:= "\Q" RegExReplace(Template,"_+","\E|\Q") "\E"
uses consecutive underlines ("_+") as delimiters to split the template into constant parts. These template parts are enclosed in the escape sequences \Q ... \E. This way, when the template parts are used in a RegEx needle they are taken literally (there is no need to escape their characters). The template parts are then concatenated again with a pipe "|" to form a needle like this:
"\Qtemplate_part1\E|\Qtemplate_part2\E|\Qtemplate_part3\E"
This needle is used to match and remove all constant parts of the template.
Sorting is needed to ensure that RegEx tries to match the biggest parts first and removes as large a part of the filled form as possible.
In brief, this approach has the following limitations:
1. The template cannot contain "\E"
2. The filled fields cannot contain an entire template part (in our case " day of " or ", 20")
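For comparison, here is a hedged variant of the same \Q ... \E idea. It is not the needle from the post above: instead of removing the constant parts, it turns each run of underlines into a capture group, so the whole filled text is matched against the template in one go. This is a sketch for AutoHotkey v1.1; Template and Text are shortened sample strings, and Field is a made-up output variable name.

```autohotkey
; Sketch only: build one anchored needle from the template and capture each field.
; Template and Text are shortened, hypothetical sample strings.
Template := "made this __________ day of ___________, 20__, by and between"
Text     := "made this 12-th day of May, 2012, by and between"

; Constant parts stay literal inside \Q...\E; each blank becomes a lazy capture group.
; s) lets . match newlines; ^ and $ force the whole text to fit the template.
Needle := "s)^\Q" RegExReplace(Template, "_+", "\E(.*?)\Q") "\E$"

if (RegExMatch(Text, Needle, Field))
    MsgBox % "Day: " Field1 "`nMonth: " Field2 "`nYear: 20" Field3
else
    MsgBox The text does not fit the template.
```

Limitation 1 still applies here, since a template containing "\E" would end the literal section early. Because the constant parts are matched in order and anchored at both ends, a field that happens to contain a template part is less likely to break the match, although ambiguous cases remain possible.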
#10
Posted 18 May 2012 - 11:14 AM
Thanks for the analysis.
The limitations are crucial, but this approach is a good starting point, thanks.




