AutoHotkey Community Let's help each other out

BioCyborg (Guest)
Posted: Sat Jun 20, 2009 4:48 pm
Post subject: help on truncating strings
Hello,
I need help truncating strings. My data comes in paragraphs formatted like this:
Code:
[INDEX]
1.DATA1 {EVAL NUM} DATA2 {EVAL NUM} 2.DATA1 {EVAL NUM} DATA2 {EVAL NUM} 3.DATA1 {EVAL NUM} DATA2 {EVAL NUM}
The objective is to find the first succession of pairs whose NUM values hit a target number. Once the succession ends, truncate every following pair until the end of the paragraph, leaving the succession intact so it can be saved to a separate file.
Example:
My target range for NUM is 7 to 7.9.
Code:
[INDEX]
1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 7.3} TGG {3.31\21 8.1} 5.UUT {3.33\12 9} TGG {3.31\21 8.5}
After:
Code:
[INDEX]
1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 7.3}
In pair 1, element 2 leaves the target (NUM = 8), but the succession continues: both NUMs in a pair must leave the target before the succession counts as ended and truncation becomes possible. In pair 5, both NUMs are outside the target (NUM = 9 and 8.5), so we truncate everything from there to the end of the paragraph.
And if the target is left in the second element of pair 4 (NUM = 8.1), we keep the first element of that pair intact and drop the second.
Another example:
Code:
[INDEX]
1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 8.1} TGG {3.31\21 7.3} 5.UUT {3.33\12 9} TGG {3.31\21 8.5}
After:
Code:
[INDEX]
1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 8.1} TGG {3.31\21 7.3}
If the non-matching pair is preceded by a matching element from the last pair, don't include the last element that matches.
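The truncation rule, as read from the two worked examples above, can be sketched in AutoHotkey v1 syntax as follows. `KeepSuccession` is a hypothetical helper, not something from the thread. It assumes every item has the shape `NAME {EVAL NUM}`, that the succession starts at pair 1 as in both examples, and that a pair stays in the succession while at least one of its two NUMs is within the range.

```autohotkey
; Hypothetical helper (not from the thread). Assumes items look like "NAME {EVAL NUM}".
; Returns the leading pairs of body up to, but not including, the first pair
; whose two NUM values are both outside [low, high].
KeepSuccession(body, low, high)
{
    ; one item: any text, then {...} whose last token is the NUM
    static item := "[^{}]*\{[^{}]*\s([\d.]+)\}"
    kept := "", pos := 1, drop := 0
    Loop
    {
        ; m = whole pair, m2 = first NUM, m3 = second item, m4 = second NUM
        found := RegExMatch(body, "(" item ")(" item ")", m, pos)
        If (!found)
            break
        in1 := (m2 >= low && m2 <= high)
        in2 := (m4 >= low && m4 <= high)
        If (!in1 && !in2)
            break                          ; both NUMs left the range: succession is over
        kept .= m
        drop := in2 ? 0 : StrLen(m3)       ; second item out of range: candidate for trimming
        pos := found + StrLen(m)
    }
    ; if the last kept pair ended on an out-of-range item, cut that item off
    return SubStr(kept, 1, StrLen(kept) - drop)
}

body := "1.CAT {9.21\52 7} GT {1.21\51 8} 2.UUT {3.33\12 9} TGG {3.31\21 8.5}"
MsgBox % KeepSuccession(body, 7, 7.9)      ; shows: 1.CAT {9.21\52 7}
```

A succession that begins later in the paragraph is not handled by this sketch. An empty result means nothing matched from pair 1 on, so the caller can skip saving that paragraph.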
Some issues:
- Paragraphs vary in length, i.e. there can be anywhere from 4 to 300 pairs, and the variables of course vary too.
- There are over 50,000 paragraphs like this, separated by 2 newlines.
Example:
Code:
PARAGRAPH 1
PARAGRAPH 2
PARAGRAPH 3
So truncate everything, I guess, until 2 newlines are found.
- There might be second successions (they should be disregarded), but I think this won't be a problem since we delete the rest of the paragraph at the first one.
- The first succession might be found anywhere in the paragraph.
- If no successions are found, do not save.
I have tried StringTrimLeft/StringTrimRight and StringGetPos. The problem is that the NUM variable is not a constant (it varies from 7 to 7.9, for example). Even when it is constant, I end up deleting the rest of the file instead of just one paragraph.
Any help and sample code would be appreciated.
Thanks in advance.
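Because NUM varies, a fixed-string search cannot locate it; one option is to extract the number and compare it numerically against the range. A minimal sketch in AutoHotkey v1 syntax, where `NumInRange` is a hypothetical helper and NUM is assumed to be the last token before the closing brace:

```autohotkey
; Hypothetical helper (not from the thread): pull the NUM out of one item
; such as "GT {1.21\51 7.4}" and report whether it lies within [low, high].
NumInRange(item, low, high)
{
    ; n1 receives the number that sits right before the closing brace
    If (!RegExMatch(item, "\s([\d.]+)\}", n))
        return false
    return (n1 >= low && n1 <= high)
}

MsgBox % NumInRange("GT {1.21\51 7.4}", 7, 7.9)    ; shows 1
MsgBox % NumInRange("TGG {3.31\21 8.1}", 7, 7.9)   ; shows 0
```

Working on one paragraph at a time, rather than on the whole file, keeps a truncation from running past the paragraph boundary.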

JDN
Joined: 24 Mar 2004 Posts: 299
Posted: Sat Jun 20, 2009 7:49 pm
I tried to read your post, but I'm sorry to say that I just couldn't understand what you wanted.
Any chance you could rewrite the post and make it a lot simpler?

[VxE]
Joined: 07 Oct 2006 Posts: 3254 Location: Simi Valley, CA
Posted: Sat Jun 20, 2009 7:56 pm
Code:
; test data: paragraphs are separated by two blank lines (`n`n`n)
testtext =
(
[INDEX1]
1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 7.3} TGG {3.31\21 8.1} 5.UUT {3.33\12 9} TGG {3.31\21 8.5}


[INDEX2]
1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 8.1} TGG {3.31\21 7.3} 5.UUT {3.33\12 9} TGG {3.31\21 8.5}
)
; fileread, testtext, datafile.txt
StringReplace, testtext, testtext, `r,, all ; remove CR's
StringReplace, testtext, testtext, `n`n`n,¿, all ; insert delimiter at paragraph breaks
; ask for the range once, before the loop, rather than once per paragraph
InputBox, range, Please enter a range, Range = Low ~ High,,,,,,,, 7 ~ 7.9
StringSplit, range, range, ~, %A_Space%
InRange(0, range1, range2) ; set the range for this function
Loop, Parse, testtext, ¿ ; parse by delimiter
{
    pos := InStr(A_LoopField, "1.") - 1 ; position of first data item
    StringTrimLeft, lines, A_LoopField, %pos% ; the 'meat' of the paragraph
    StringLeft, paragraph, A_LoopField, %pos% ; paragraph header ?
    Loop, Parse, lines, } ; each 'pair' is separated by a "}" but also contains a "}"
    {
        If !A_LoopField
            break ; if an empty item appears
        i := Mod(A_Index,2) ; there are two numbers of interest
        num%i% := SubStr(A_LoopField, 1+InStr(A_LoopField, " ", 0, 0))
        If !i ; for the second item in the pair
            If (Valid := InRange(num0) || InRange(num1))
            { ; if either number is in range, append the pair to output
                paragraph .= prev "}" A_LoopField "}"
                pnum := num0 ; remember previous item
                plen := StrLen(A_LoopField)+1
            }
            Else Break
        Else prev := A_LoopField ; store the first item in the pair
    }
    If !InRange(pnum) ; remove previous item if it doesn't meet requirements
        StringTrimRight, paragraph, paragraph, %plen%
    If !Valid ; if there was a pair of items that caused truncation
        outputtext .= paragraph "`n`n`n"
}
msgbox % outputtext
; filedelete, outputfile.txt
; fileappend, %outputtext%, outputfile.txt

; InRange(n) tests n against the stored range; passing l and h stores a new range
InRange(n, l=-0x7FFFFFFF, h=0x7FFFFFFF)
{
    static low, hi
    If (l!=-0x7FFFFFFF)
        low := l+0
    If (h!=0x7FFFFFFF)
        hi := h+0
    return n >= low && n <= hi
}

BioCyborg (Guest)
Posted: Mon Jun 22, 2009 2:17 pm
Thanks, the code works fine. But there is a little bug when no successions are found: it still saves the index to the output file.
Example:
Code:
[H-TYPE213]
[Sector 3212B]
1.UUU {7.71\52 9} TGTT {7.11\51 9.1} 2.UUT {3.31\12 8} AAG {1.91\51 1.2}
Then after:
Code:
[H-TYPE213]
[Sector 3212B]
It is trivial, but it is still annoying.
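One way to avoid saving a bare header is to check, just before appending, that the truncated paragraph still contains at least one data item. A minimal sketch in AutoHotkey v1 syntax; `HasData` is a hypothetical helper, and treating any `{...}` block as a data item is an assumption about the file format:

```autohotkey
; Hypothetical guard (not from the thread): true only if the text
; still contains at least one data item of the form {...}
HasData(paragraph)
{
    return RegExMatch(paragraph, "\{[^{}]*\}") > 0
}

kept := "[H-TYPE213]`n[Sector 3212B]`n"        ; header only, no succession was found
If (HasData(kept))
    outputtext .= kept "`n`n`n"
MsgBox % "Saved length: " StrLen(outputtext)   ; 0, nothing was appended
```

In [VxE]'s script this check could be combined with the existing condition, for example `If (!Valid && HasData(paragraph))`. Note also that `pnum` and `plen` in that script keep their values from the previous paragraph, so clearing them at the top of the outer loop may be worth considering.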