help on truncating strings

 
BioCyborg
Guest





Posted: Sat Jun 20, 2009 4:48 pm    Post subject: help on truncating strings

Hello,

I need help truncating strings. My data is in paragraph format, like this:

Code:

[INDEX]

1.DATA1 {EVAL NUM} DATA2 {EVAL NUM} 2.DATA1 {EVAL NUM} DATA2 {EVAL NUM} 3.DATA1 {EVAL NUM} DATA2 {EVAL NUM}


The objective is to find the first succession of pairs whose NUM value falls in a target range. Once the succession ends, truncate every following pair up to the end of the paragraph, leaving the succession intact so it can be saved to a separate file.

Example:
My target range for NUM is 7 to 7.9.

Code:
[INDEX]

1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 7.3} TGG {3.31\21 8.1} 5.UUT {3.33\12 9} TGG {3.31\21 8.5}


after:
Code:

[INDEX]

1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 7.3}


In pair 1, element 2 leaves the target range (NUM = 8), but the succession continues: both NUMs in a pair must be outside the range for the succession to end and for truncation to start. In pair 5 both NUMs are outside the range (NUM = 9 and 8.5), so everything from there to the end of the paragraph is truncated.
And if the second element of the last pair in the succession is outside the range, as in pair 4 (NUM = 8.1), only the first element of that pair is kept.

Another example:
Code:
[INDEX]

1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 8.1} TGG {3.31\21 7.3} 5.UUT {3.33\12 9} TGG {3.31\21 8.5}


after:

Code:
[INDEX]

1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 8.1} TGG {3.31\21 7.3}


If the unmatching pair is preceded by a matching element in the previous pair (here the second element of pair 4, NUM = 7.3), do not truncate that element; only the unmatching pair and everything after it are removed.
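
One way to read the rule from the two examples, as a minimal sketch: walk the NUM values two at a time, remember the last element that was in range, and stop at the first pair where neither NUM is in range. The function name KeepCount and the comma-separated input format are made up for illustration; the real data would first have to be reduced to its NUM values.

```autohotkey
; First example from this post: NUM values, two per pair.
MsgBox % KeepCount("7,8,7.4,7.2,7.1,7.3,7.3,8.1,9,8.5", 7, 7.9)

; Hypothetical helper: returns how many elements (not pairs) to keep,
; or 0 when no NUM is in range at all.
KeepCount(nums, low, high)
{
   keep := 0                                ; index of the last in-range element so far
   StringSplit, n, nums, `,, %A_Space%      ; n0 = item count, n1..nN = the values
   Loop, % n0 // 2
   {
      i := A_Index * 2 - 1                  ; first element of this pair
      j := i + 1                            ; second element of this pair
      a := n%i%
      b := n%j%
      inA := (a >= low && a <= high)
      inB := (b >= low && b <= high)
      If (inA || inB)
         keep := inB ? j : i                ; succession continues; note its last in-range element
      Else If keep
         Break                              ; both NUMs out of range after a succession: stop here
   }
   Return keep
}
```

For the first example this gives 7 (pairs 1 to 3 plus the first element of pair 4) and for the second example 8. Pairs that are out of range before the succession starts are skipped over rather than treated as the end, which is an assumption based on "the first succession might be found anywhere".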

Some issues:
- Paragraphs vary in length: a paragraph can hold anywhere from 4 to 300 pairs, and of course the values vary too.
- There are over 50,000 paragraphs like this, separated by 2 blank lines.
Example:
Code:
PARAGRAPH 1


PARAGRAPH 2


PARAGRAPH 3


So I guess truncation should run until 2 blank lines are found.
- There might be second successions (these should be disregarded), but I don't think this will be a problem since the rest of the paragraph is deleted after the first one.
- The first succession might be found anywhere in the paragraph.
- If no succession is found, do not save the paragraph.

I have tried StringTrimLeft/StringTrimRight and StringGetPos. The problem is that NUM is not a constant (it varies from 7 to 7.9, for example). Even with a constant NUM, I end up deleting the rest of the file instead of just one paragraph.
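
A possible way around the varying NUM, as a minimal sketch: instead of searching for a fixed string with StringGetPos, a regular expression can pull the number out of each element, and the range test then becomes a plain numeric comparison. The sample element and the variable names are illustrative only, and the pattern assumes NUM is always the last token before the closing brace.

```autohotkey
; Hypothetical sample element; NUM is the last token before "}".
element := "TAG {1.21\51 7.2}"

; \s       the space in front of NUM
; ([\d.]+) the number itself, captured into m1
; \}       the closing brace of the element
If (RegExMatch(element, "\s([\d.]+)\}", m))
{
   ok := (m1 >= 7 && m1 <= 7.9)   ; numeric comparison against the target range
   MsgBox % "NUM = " . m1 . (ok ? " is" : " is not") . " in range"
}
```

Running the same match on one paragraph at a time, rather than on the whole file, would also keep any trimming from spilling into the following paragraphs.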

Any help and sample code would be appreciated.
Thanks in advance.
JDN



Joined: 24 Mar 2004
Posts: 299

Posted: Sat Jun 20, 2009 7:49 pm    Post subject:

I tried to read your post but I'm sorry to say that I just couldn't understand what you wanted.

Any chance you could rewrite the post and make it a lot simpler?
[VxE]



Joined: 07 Oct 2006
Posts: 3254
Location: Simi Valley, CA

Posted: Sat Jun 20, 2009 7:56 pm    Post subject:

Code:
testtext =
(
[INDEX1]

1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 7.3} TGG {3.31\21 8.1} 5.UUT {3.33\12 9} TGG {3.31\21 8.5}


[INDEX2]

1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 8.1} TGG {3.31\21 7.3} 5.UUT {3.33\12 9} TGG {3.31\21 8.5}
)
; FileRead, testtext, datafile.txt
StringReplace, testtext, testtext, `r,, All ; remove CRs
StringReplace, testtext, testtext, `n`n`n,¿, All ; insert delimiter at paragraph breaks
; Ask for the range once, before the loop, instead of once per paragraph
InputBox, range, Please enter a range, Range = Low ~ High,,,,,,,, 7 ~ 7.9
StringSplit, range, range, ~, %A_Space%
InRange(0, range1, range2) ; store the range inside InRange()
Loop, Parse, testtext, ¿ ; parse by delimiter
{
   pnum := "", plen := 0 ; reset per-paragraph state
   pos := InStr(A_LoopField, "1.") - 1 ; offset of the first data item
   StringTrimLeft, lines, A_LoopField, %pos% ; the 'meat' of the paragraph
   StringLeft, paragraph, A_LoopField, %pos% ; the paragraph header
   Loop, Parse, lines, } ; every element ends with "}", so each field is one element
   {
      If !A_LoopField
         break ; stop at an empty field (end of the data)
      i := Mod(A_Index,2) ; 1 = first element of a pair, 0 = second
      num%i% := SubStr(A_LoopField, 1+InStr(A_LoopField, " ", 0, 0)) ; NUM follows the last space
      If !i ; for the second element in the pair
         If (Valid := InRange(num0) || InRange(num1))
         { ; if either number is in range, append the pair to the output
            paragraph .= prev "}" A_LoopField "}"
            pnum := num0 ; remember the second element's NUM
            plen := StrLen(A_LoopField)+1 ; its length, including the "}"
         }
         Else   Break
      Else   prev := A_LoopField ; store the first element of the pair
   }
   If !InRange(pnum) ; drop the trailing element if it is out of range
      StringTrimRight, paragraph, paragraph, %plen%
   If !Valid ; if there was a pair of items that caused truncation
      outputtext .= paragraph "`n`n`n"
}
MsgBox % outputtext
; FileDelete, outputfile.txt
; FileAppend, %outputtext%, outputfile.txt


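; InRange(n) tests n against a stored range.
; Passing l and/or h (anything but the sentinel defaults) updates the stored range first.
InRange(n, l=-0x7FFFFFFF, h=0x7FFFFFFF)
{
   static low, hi
   If (l!=-0x7FFFFFFF)
      low := l+0
   If (h!=0x7FFFFFFF)
      hi := h+0
   return n >= low && n <= hi
}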
BioCyborg
Guest





Posted: Mon Jun 22, 2009 2:17 pm    Post subject:

Thanks, the code works fine. But there is a little bug when no successions are found: it still saves the index to the output file.

Example:
Code:
[H-TYPE213]
[Sector 3212B]

1.UUU {7.71\52 9} TGTT {7.11\51 9.1} 2.UUT {3.31\12 8} AAG {1.91\51 1.2}


then after:
Code:
[H-TYPE213]
[Sector 3212B]


It is trivial but it is still annoying.
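
One possible fix, as a minimal sketch rather than a patch to the script above: only hand back a paragraph when at least one NUM was in range, and let the caller skip empty results. The function TrimParagraph and its regex-based scan are assumptions made for illustration, not part of [VxE]'s script; the sketch also assumes every element ends in " NUM}" and that out-of-range pairs before the succession do not end the scan.

```autohotkey
sample =
(
[H-TYPE213]
[Sector 3212B]

1.UUU {7.71\52 9} TGTT {7.11\51 9.1} 2.UUT {3.31\12 8} AAG {1.91\51 1.2}
)
result := TrimParagraph(sample, 7, 7.9)
If (result != "")                      ; no succession found: save nothing, not even the header
   outputtext .= result "`n`n`n"
MsgBox % (outputtext = "") ? "nothing to save" : outputtext

; Hypothetical helper: returns the paragraph cut after the last in-range
; element of the first succession, or "" when no NUM is in range.
TrimParagraph(text, low, high)
{
   pos := 1, cut := 0, count := 0, pairHit := 0
   While (found := RegExMatch(text, "\s([\d.]+)\}", m, pos))
   {
      pos := found + StrLen(m)         ; continue searching after this element
      count++
      If (m1 >= low && m1 <= high)
         cut := pos - 1, pairHit := 1  ; position of this element's "}"
      If (Mod(count, 2) = 0)           ; second element closes a pair
      {
         If (!pairHit && cut)          ; whole pair out of range after a succession
            Break
         pairHit := 0
      }
   }
   Return cut ? SubStr(text, 1, cut) : ""
}
```

With the [H-TYPE213] paragraph no NUM falls between 7 and 7.9, so the function returns an empty string and the header is not written. A paragraph whose succession runs to the very end is still returned here, which differs from the script above and may or may not be what is wanted.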