Hello,
I need help truncating strings. My strings are in a paragraph format like this:
Code:
[INDEX]
1.DATA1 {EVAL NUM} DATA2 {EVAL NUM} 2.DATA1 {EVAL NUM} DATA2 {EVAL NUM} 3.DATA1 {EVAL NUM} DATA2 {EVAL NUM}
The objective is to find the first succession of target values in the NUM variable of the pairs. Once the succession ends, truncate every following pair up to the end of the paragraph, leaving the succession intact so it can be saved to a separate file.
example:
My target range for NUM is 7 to 7.9.
Code:
[INDEX]
1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 7.3} TGG {3.31\21 8.1} 5.UUT {3.33\12 9} TGG {3.31\21 8.5}
after:
Code:
[INDEX]
1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 7.3}
In pair 1, element 2 leaves the target (NUM = 8), but the succession continues, because both NUMs of a pair must leave the target for the succession to end and for truncation to start. In pair 5, both NUMs are outside the target (NUM = 9 and 8.5), so we truncate everything from there to the end of the paragraph.
And if the target is left at the second element of a pair, as in pair 4 (NUM = 8.1), we drop that element but keep the first element of the pair intact.
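A minimal sketch of the rule above in Python, assuming the NUM values have already been pulled out of the paragraph into a flat list (elements 0 and 1 are pair 1, elements 2 and 3 are pair 2, and so on) and that the target is the inclusive range 7 to 7.9. The function name `elements_to_keep` is hypothetical. It returns how many leading elements survive the truncation, or 0 when there is no succession:

```python
def elements_to_keep(nums, lo=7.0, hi=7.9):
    """Return how many leading elements of a paragraph to keep.

    nums holds the NUM values in reading order, two per pair. Elements
    before the first in-target NUM are kept (assumption: only what follows
    the succession is truncated). Returns 0 if no NUM is in [lo, hi].
    """
    keep = 0          # index just past the last in-target element so far
    started = False   # True once the first in-target element is seen
    for start in range(0, len(nums), 2):
        pair = nums[start:start + 2]
        hits = [lo <= n <= hi for n in pair]
        if started and not any(hits):
            break     # both NUMs of this pair left the target: cut here
        for offset, hit in enumerate(hits):
            if hit:
                started = True
                keep = start + offset + 1
    return keep


print(elements_to_keep([7, 8, 7.4, 7.2, 7.1, 7.3, 7.3, 8.1, 9, 8.5]))  # 7
```

Fed the NUMs from the example above, it returns 7, i.e. everything up to and including 4.TAT. Trailing off-target elements (TGG, NUM = 8.1) are dropped simply because the cut is placed after the last in-target element.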
Another example:
Code:
[INDEX]
1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 8.1} TGG {3.31\21 7.3} 5.UUT {3.33\12 9} TGG {3.31\21 8.5}
after:
Code:
[INDEX]
1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 8.1} TGG {3.31\21 7.3}
In other words, only non-matching elements at the very end of the succession are dropped: here the element right before the unmatching pair 5 matches (TGG, NUM = 7.3), so pair 4 is kept whole, including 4.TAT (NUM = 8.1).
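Working on the paragraph text itself keeps the original layout (the [INDEX] line and the line breaks) intact: locate each {...} group with a regular expression, read NUM as its last token, and cut the string right after the closing brace of the last in-target element. This is a Python sketch under the same assumptions (two elements per pair, inclusive 7 to 7.9 target, NUM always the last token inside the braces); `truncate_paragraph` is a hypothetical name:

```python
import re

# One {EVAL NUM} group; group(1) is everything between the braces.
BRACES = re.compile(r"\{([^{}]*)\}")


def truncate_paragraph(text, lo=7.0, hi=7.9):
    """Cut a paragraph right after the last element of its first succession.

    Returns None when no NUM falls inside [lo, hi], so the caller can skip
    the paragraph. Text before the cut is returned unchanged.
    """
    # (NUM, offset just past the closing brace) for every element, in order.
    elements = [(float(m.group(1).split()[-1]), m.end())
                for m in BRACES.finditer(text)]
    cut = None  # offset just past the last in-target element seen so far
    for start in range(0, len(elements), 2):
        pair = elements[start:start + 2]
        hits = [lo <= num <= hi for num, _ in pair]
        if cut is not None and not any(hits):
            break  # both NUMs of this pair are off target: stop here
        for (_, end), hit in zip(pair, hits):
            if hit:
                cut = end
    return None if cut is None else text[:cut]
```

Applied to the second example, the cut lands right after TGG {3.31\21 7.3}, so pair 4 is kept whole; a paragraph with no NUM in range returns None and can simply be skipped.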
Some issues:
-paragraphs vary in length, i.e. a paragraph can contain anywhere from 4 to 300 pairs, and the variables of course vary too
-there are over 50 000 paragraphs like this, separated by 2 newlines.
example:
Code:
PARAGRAPH 1

PARAGRAPH 2

PARAGRAPH 3
So, I guess, truncate everything until 2 newlines are found.
-there might be a second succession (it should be disregarded), but I think this won't be a problem since we delete the rest of the paragraph after the first one.
-the first succession might be found anywhere in the paragraph
-if no succession is found, do not save the paragraph
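Handling the file one paragraph at a time is what stops a cut from running into the rest of the file. A minimal Python sketch, assuming paragraphs are separated by one or more blank lines; it streams the input line by line instead of loading all 50 000 paragraphs at once. `truncate` stands for whatever per-paragraph function you use and is assumed to return the shortened paragraph, or None when there is no succession. All names here are hypothetical:

```python
def iter_paragraphs(lines):
    """Yield paragraphs (joined with "\n", no trailing newline) from lines.

    A paragraph ends at a blank line; several blank lines in a row count
    as one separator, so no empty paragraphs are produced.
    """
    current = []
    for line in lines:
        if line.strip():
            current.append(line.rstrip("\n"))
        elif current:
            yield "\n".join(current)
            current = []
    if current:
        yield "\n".join(current)


def process_file(src_path, dst_path, truncate):
    """Write truncate(paragraph) for each paragraph, skipping None results."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        first = True
        for para in iter_paragraphs(src):
            result = truncate(para)
            if result is None:   # no succession found: do not save
                continue
            if not first:
                dst.write("\n")  # blank line between saved paragraphs
            dst.write(result + "\n")
            first = False
```

Because each paragraph is read and truncated on its own, a cut can never reach past the blank line that ends it, and paragraphs without a succession are never written.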
I have tried stringtrim and stringgetpos. The problem is that the NUM variable is not a constant (it varies from 7 to 7.9, for example). And even when it is constant, I end up deleting the rest of the file instead of just the rest of one paragraph.
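Since NUM changes from element to element, a fixed-string search is the wrong tool; converting NUM to a number and comparing it against both bounds handles 7, 7.25 and 7.9 alike. A short Python sketch of that check, assuming NUM is always the last space-separated token inside the braces (`element_nums` and `in_target` are hypothetical helper names):

```python
import re

# One {EVAL NUM} group; group(1) is everything between the braces.
BRACES = re.compile(r"\{([^{}]*)\}")


def element_nums(text):
    """Return the NUM of every element in the text, as floats, in order.

    Assumes NUM is the last whitespace-separated token inside the braces,
    e.g. "{9.21\\52 7}" -> 7.0.
    """
    return [float(m.group(1).split()[-1]) for m in BRACES.finditer(text)]


def in_target(num, lo=7.0, hi=7.9):
    """Inclusive numeric range check, so no list of exact strings is needed."""
    return lo <= num <= hi


nums = element_nums(r"1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4}")
print(nums)                          # [7.0, 8.0, 7.4]
print([in_target(n) for n in nums])  # [True, False, True]
```

A plain substring search would misfire on this data: looking for "7." skips {9.21\52 7} (no decimal point) and would match the "7." inside a value such as 17.2.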
Any help and sample code would be appreciated,
thanks in advance