multiline regex


9 replies to this topic

#1 azure

  • Members
  • 1203 posts

Posted 11 May 2012 - 08:11 AM

hello!

let's say we have a document like this:
http://www.text-uplo...63943&c=1058001

and this text template gets filled in with various text in a number of fixed fields

how can I compare the empty template against the filled-in documents based on it with RegEx?

I mean, I need to store the template as a multiline variable, but in a way that is easy to maintain

any hint?

thanks!

#2 HintProvider

  • Guests

Posted 11 May 2012 - 11:48 AM

hello!

let's say we have a document like this:
http://www.crocko.co... ... mplate.txt

.....
any hint?


hint

You are requesting: template.txt (0.0 MB)
Requested file is deleted.


Hope this helps!

:D

#3 azure

  • Members
  • 1203 posts

Posted 11 May 2012 - 12:22 PM

link updated

#4 CodeKiller

  • Members
  • 2066 posts

Posted 11 May 2012 - 12:55 PM

Why don't you just compare each line one by one?
There is no point using multiline in this case, as each variable sits on one single line.
Using a regex to compare the whole document is nearly impossible, as you must put a "check point" around every variable.
Example: point 6, "a period of ____ years,"
To get the content of the ____ you must use a regex like:
RegExMatch(text, "a period of (\w+) years", var)

Imagine having to do that for each variable... you'll have a ****ing long regex!
It's really easier to store each line in an object/array and then compare each line of the two objects (one for the original, one for the text to compare with).
And this way you can even directly get the lines where the differences are! ;-)
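
The line-by-line idea can be sketched roughly like this. It is written in Python rather than AutoHotkey only so it can be run anywhere. The function names and the two sample lines are made up for illustration and are not from the actual template.

```python
import re

def differing_lines(template, filled):
    """Return (line_number, template_line, filled_line) for every line that differs."""
    pairs = zip(template.splitlines(), filled.splitlines())
    # zip stops at the shorter text, so extra trailing lines are not reported here.
    return [(n, t, f) for n, (t, f) in enumerate(pairs, start=1) if t != f]

def period_in_years(line):
    """Pull the value out of one known field, mirroring the RegExMatch call above."""
    match = re.search(r"a period of (\w+) years", line)
    return match.group(1) if match else None

# Hypothetical sample text, standing in for the real template and a filled copy.
template = "6. The partnership shall last for a period of ____ years,\n7. Signed: ____"
filled = "6. The partnership shall last for a period of five years,\n7. Signed: ____"

for number, old, new in differing_lines(template, filled):
    print(number, repr(old), "->", repr(new))
print(period_in_years(filled))
```

One thing to watch: \w also matches the underscore, so on an unfilled line the same pattern returns the blank itself (____) instead of failing.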

#5 azure

  • Members
  • 1203 posts

Posted 11 May 2012 - 08:55 PM

Why don't you just compare each line one by one?


that was my first thought, but the scripts are so hard to write and maintain that way, because I have more than 10 templates!

#6 Gogo

  • Guests

Posted 12 May 2012 - 07:20 AM

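; Empty template: runs of underscores mark the fields to be filled in.
Template:= "
(
  THIS PARTNERSHIP AGREEMENT is made this __________ day of  ___________, 20__, by and between the following individuals: 
___________________________	Address: __________________________
City/State/ZIP:______________________ etc...
)"

; A filled-in copy of the same template.
Text:= "
(
  THIS PARTNERSHIP AGREEMENT is made this 12-th day of  May, 2012, by and between the following individuals: 
__ John Doe __	Address: __________________________
City/State/ZIP:______________________ etc...
)"

; Wrap each constant part in \Q...\E (literal match) and join the parts with "|".
Needle:= "\Q" RegExReplace(Template,"_+","\E|\Q") "\E"
; D| = pipe-delimited list, C = case-sensitive, U = remove duplicate parts.
Sort Needle, D| C U
; Replace every constant part with a line break, leaving only the filled-in values.
MultiLineData:= RegExReplace(Text,Needle,"`n")
MsgBox % MultiLineData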


#7 azure

  • Members
  • 1203 posts

Posted 16 May 2012 - 01:36 PM

I am very interested in this, can you please explain what you do here:

Needle:= "\Q" RegExReplace(Template,"_+","\E|\Q") "\E"
Sort Needle, D| C U
MultiLineData:= RegExReplace(text,Needle,"`n")
MsgBox % MultiLineData

basically, at a specific point in the template text there can be a specific text (specific characters, or nothing at all)

what I find confusing is that both the empty template and the filled template might contain characters that need to be escaped

how do I overcome this? or is there no need to?

#8 azure

  • Members
  • 1203 posts

Posted 17 May 2012 - 11:43 AM

anyone????

#9 Gogo

  • Guests

Posted 17 May 2012 - 07:03 PM

The line:
Needle:= "\Q" RegExReplace(Template,"_+","\E|\Q") "\E"
uses runs of consecutive underscores ("_+") as delimiters to split the template into its constant parts. Each part is enclosed in the escape sequences \Q ... \E, so when it is used in the RegEx needle it is matched literally (there is no need to escape its characters). The parts are then joined again with a pipe "|" to form a needle like this:
"\Qtemplate_part1\E|\Qtemplate_part2\E|\Qtemplate_part3\E"
This needle is used to match and remove all constant parts of the template.

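Sorting is meant to make RegEx try the biggest parts first, so that each match removes as big a part of the filled form as possible. Note that Sort without extra options orders the parts alphabetically rather than by length, so a longest-first order is not strictly guaranteed; the U option also drops duplicate parts.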
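
For readers who want to try the idea outside AutoHotkey, here is a rough Python sketch of the same needle. It is a port under assumptions, not the script above: re.escape stands in for \Q ... \E, the parts are explicitly ordered longest first, and the names build_needle and field_values as well as the shortened sample sentence are made up for illustration. Only the template parts go into the pattern. The filled document is just the text being searched, so special characters in it never need escaping.

```python
import re

def build_needle(template):
    """Join the constant parts of the template into one alternation pattern."""
    # Split on runs of underscores; a set drops empty and duplicate parts.
    parts = {p for p in re.split(r"_+", template) if p}
    # Longest first, so a short part cannot win over a longer one that starts the same way.
    ordered = sorted(parts, key=lambda p: (-len(p), p))
    # re.escape makes every part match literally, like \Q...\E in the needle above.
    return "|".join(re.escape(p) for p in ordered)

def field_values(template, filled):
    """Return what is left of the filled text once the constant parts are cut out."""
    pieces = re.split(build_needle(template), filled)
    return [p.strip() for p in pieces if p.strip()]

# Shortened, hypothetical version of the first template sentence.
template = "made this ____ day of ____, 20__, by and between"
filled = "made this 12-th day of May, 2012, by and between"
print(field_values(template, filled))
```

Sorting by length is a deliberate change from the plain alphabetical sort: with alternation the first alternative that matches wins, so the order of the parts decides which one is removed.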

In brief, this approach has the following limitations:
1. The template cannot contain "\E"
2. The filled fields cannot contain an entire template part (in our case " day of " or ", 20")
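
A different technique, not the one described above, is to turn the whole template into a single anchored pattern with one capture group per blank. The Python sketch below assumes that the filled document matches the template character for character outside the blanks; the names template_to_regex and extract_fields are hypothetical. It also checks that a document really follows the template, and it has no "\E" problem because re.escape is used. It does not remove the second limitation: a value that contains the next constant part is still split in the wrong place.

```python
import re

def template_to_regex(template):
    """Compile the template into one pattern with a capture group per blank."""
    parts = re.split(r"_+", template)
    # Literal parts are escaped; each run of underscores becomes a lazy group.
    pattern = "(.*?)".join(re.escape(p) for p in parts)
    return re.compile(pattern, re.DOTALL)

def extract_fields(template, filled):
    """Return the field values in template order, or None if the text does not fit."""
    match = template_to_regex(template).fullmatch(filled)
    if match is None:
        return None
    # Strip padding underscores and spaces, so an unfilled blank becomes "".
    return [value.strip(" _") for value in match.groups()]

# Shortened, hypothetical lines in the style of the template above.
template = "Name: ____ Address: ____"
filled = "Name: __ John Doe __ Address: ________"
print(extract_fields(template, filled))
print(extract_fields(template, "some unrelated text"))
```

Because the pattern is generated from the template text, supporting another template means storing another template string, not writing another regex by hand.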

#10 azure

  • Members
  • 1203 posts

Posted 18 May 2012 - 11:14 AM

thanks for the analysis
the limitations are significant, but this approach is a good starting point, thanks