AutoHotkey Community

All times are UTC [ DST ]




Posted: June 20th, 2009, 5:48 pm
Hello,

I need help truncating strings. My data is laid out in paragraphs like this:

Code:
[INDEX]

1.DATA1 {EVAL NUM} DATA2 {EVAL NUM} 2.DATA1 {EVAL NUM} DATA2 {EVAL NUM} 3.DATA1 {EVAL NUM} DATA2 {EVAL NUM}


The objective is to find the first succession of pairs whose NUM value falls within a target range. Once that succession ends, every following pair up to the end of the paragraph should be truncated, leaving the succession intact so it can be saved to a separate file.

example:
My target range for NUM is 7 to 7.9

Code:
[INDEX]

1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 7.3} TGG {3.31\21 8.1} 5.UUT {3.33\12 9} TGG {3.31\21 8.5}


after:
Code:
[INDEX]

1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 7.3}


In pair 1, the second element leaves the target (NUM = 8), but the succession continues: both NUMs in a pair have to be outside the target before the succession counts as ended and truncation can start. In pair 5 both NUMs are outside the target (NUM = 9 and 8.5), so everything from there to the end of the paragraph is truncated.
And if the second element of the last kept pair leaves the target, as in pair 4 (NUM = 8.1), only the first element of that pair is left intact.
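The pair rule above can be restated as a small AutoHotkey sketch. PairContinues() is a hypothetical helper name used only for illustration, and the 7 to 7.9 range is the one from the example:

```autohotkey
low := 7
high := 7.9
MsgBox % PairContinues(7, 8, low, high)     ; pair 1: one NUM in range, shows 1
MsgBox % PairContinues(9, 8.5, low, high)   ; pair 5: both NUMs out of range, shows 0

; Hypothetical helper: returns 1 while at least one NUM of the pair is in range,
; 0 when both are outside it (which is where the succession ends).
PairContinues(numA, numB, low, high)
{
   inA := (numA >= low && numA <= high)
   inB := (numB >= low && numB <= high)
   Return (inA || inB)
}
```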

Another example:
Code:
[INDEX]

1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 8.1} TGG {3.31\21 7.3} 5.UUT {3.33\12 9} TGG {3.31\21 8.5}


after:

Code:
[INDEX]

1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 8.1} TGG {3.31\21 7.3}


If the non-matching pair is preceded by a pair whose last element matches (pair 4 here), that matching element is not cut: the pair stays whole and only the pairs after it are truncated.

Some issues:
-Paragraphs vary in length, i.e. a paragraph can hold anywhere from 4 to 300 pairs, and the values of course vary too.
-There are over 50,000 paragraphs like this, separated by 2 blank lines.
example:
Code:
PARAGRAPH 1


PARAGRAPH 2


PARAGRAPH 3


So I guess everything should be truncated up to the next 2 blank lines.
-There might be a second succession later in the paragraph (it should be disregarded), but I don't think this will be a problem since the rest of the paragraph is deleted after the first one.
-The first succession might be found anywhere in the paragraph.
-If no succession is found, the paragraph should not be saved.

I have tried the StringTrim commands and StringGetPos. The problem is that NUM is not a constant (it varies from 7 to 7.9, for example). Even when it is constant, I end up deleting the rest of the file instead of just one paragraph.

Any help and sample code would be appreciated.
Thanks in advance.


Posted: June 20th, 2009, 8:49 pm
I tried to read your post, but I'm sorry to say that I just couldn't understand what you wanted.

Any chance you could rewrite the post and make it a lot simpler?


Posted: June 20th, 2009, 8:56 pm
Code:
testtext =
(
[INDEX1]

1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 7.3} TGG {3.31\21 8.1} 5.UUT {3.33\12 9} TGG {3.31\21 8.5}


[INDEX2]

1.CAT {9.21\52 7} GT {1.21\51 8} 2.GT {1.11\12 7.4} TAG {1.21\51 7.2} 3.UAAT {1.11\12 7.1} CTTG {0.11\21 7.3}
4.TAT {2.12\12 8.1} TGG {3.31\21 7.3} 5.UUT {3.33\12 9} TGG {3.31\21 8.5}
)
; fileread, testtext, datafile.txt
StringReplace, testtext, testtext, `r,, all ; remove CR's
StringReplace, testtext, testtext, `n`n`n,¿, all ; insert delimiter at paragraph breaks
; ask for the range once, before the loop, instead of once per paragraph
InputBox, range, Please enter a range, Range = Low ~ High,,,,,,,, 7 ~ 7.9
StringSplit, range, range, ~, %A_Space%
InRange(0, range1, range2) ; set the range for this function
Loop, Parse, testtext, ¿ ; parse by delimiter
{
   pos := InStr(A_LoopField, "1.") - 1 ; position of first data item
   StringTrimLeft, lines, A_LoopField, %pos% ; the 'meat' of the paragraph
   StringLeft, paragraph, A_LoopField, %pos% ; paragraph header ?
   Loop, Parse, lines, } ; each 'pair' is separated by a "}" but also contains a "}"
   {
      If !A_LoopField
         break ; if an empty item appears
      i := Mod(A_Index,2) ; there are two numbers of interest
      num%i% := SubStr(A_LoopField, 1+InStr(A_LoopField, " ", 0, 0))
      If !i ; for the second item in the pair
         If (Valid := InRange(num0) || InRange(num1))
         { ; if at least one of the two numbers is in range, append the pair to output
            paragraph .= prev "}" A_LoopField "}"
            pnum := num0 ; remember previous item
            plen := StrLen(A_LoopField)+1
         }
         Else   Break
      Else   prev := A_LoopField ; store the first item in the pair
   }
   If !InRange(pnum) ; remove previous item if it doesn't meet requirements
      StringTrimRight, paragraph, paragraph, %plen%
   If !Valid ; if there was a pair of items that caused truncation
      outputtext .= paragraph "`n`n`n"
}
msgbox % outputtext
; filedelete, outputfile.txt
; fileappend, %outputtext%, outputfile.txt


InRange(n, l=-0x7FFFFFFF, h=0x7FFFFFFF)
{
   static low, hi
   If (l!=-0x7FFFFFFF)
      low := l+0
   If (h!=0x7FFFFFFF)
      hi := h+0
   return n >= low && n <= hi
}
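
A short note on how InRange() is called, as far as can be read from the code itself: a call with three arguments only stores the bounds in the static variables, and later one-argument calls test a number against them. The sketch below repeats the function so it runs on its own; the sample values are only for illustration.

```autohotkey
InRange(0, 7, 7.9)      ; three arguments: store low = 7 and hi = 7.9 (the result for n = 0 is ignored)
MsgBox % InRange(7.4)   ; one argument: test against the stored bounds, shows 1
MsgBox % InRange(8)     ; shows 0

; Same function as in the script above, repeated so this snippet runs standalone.
InRange(n, l=-0x7FFFFFFF, h=0x7FFFFFFF)
{
   static low, hi
   If (l!=-0x7FFFFFFF)   ; a bound is only updated when it differs from the default
      low := l+0
   If (h!=0x7FFFFFFF)
      hi := h+0
   return n >= low && n <= hi
}
```

One consequence of this design is that the two default values themselves cannot be passed as bounds, since they are what marks "leave the stored bound alone".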
Posted: June 22nd, 2009, 3:17 pm
[VxE] wrote:
[code from the previous post, quoted in full]


Thanks, the code works fine, but there is a small bug: when no succession is found, it still saves the index (the paragraph header) to the output file.

Example:
Code:
[H-TYPE213]
[Sector 3212B]

1.UUU {7.71\52 9} TGTT {7.11\51 9.1} 2.UUT {3.31\12 8} AAG {1.91\51 1.2}


then after:
Code:
[H-TYPE213]
[Sector 3212B]


It is trivial but it is still annoying.
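
One way this could be handled, sketched as a standalone function rather than a patch to the script above: count the pairs that were actually kept and return nothing when that count is zero. KeepSuccession() and its parameter names are made up for this sketch. Like the original script, it assumes the data starts at the first "1." in the paragraph and that the succession starts at pair 1, and it only returns paragraphs where something was truncated.

```autohotkey
para =
(
[H-TYPE213]
[Sector 3212B]

1.UUU {7.71\52 9} TGTT {7.11\51 9.1} 2.UUT {3.31\12 8} AAG {1.91\51 1.2}
)
result := KeepSuccession(para, 7, 7.9)
If (result = "")
   MsgBox, No succession found: nothing to save.
Else
   MsgBox % result

; Hypothetical helper: returns the header plus the kept pairs,
; or "" when the paragraph should not be saved.
KeepSuccession(para, low, high)
{
   pos := InStr(para, "1.") - 1          ; header = everything before the first item
   If (pos < 0)
      Return ""
   StringLeft, out, para, %pos%
   StringTrimLeft, body, para, %pos%
   kept := 0                             ; number of pairs appended so far
   cut := 0                              ; set to 1 once a pair with both NUMs out of range is hit
   Loop, Parse, body, }
   {
      If (A_LoopField = "")
         Break
      num := SubStr(A_LoopField, 1 + InStr(A_LoopField, " ", 0, 0))
      ok := (num >= low && num <= high)
      If (Mod(A_Index, 2))               ; first element of a pair: just remember it
      {
         first := A_LoopField
         firstOk := ok
         Continue
      }
      If (!firstOk && !ok)               ; both out of range: the succession ends here
      {
         cut := 1
         Break
      }
      out .= first "}" A_LoopField "}"
      kept += 1
      lastOk := ok
      lastLen := StrLen(A_LoopField) + 1
   }
   If (!kept || !cut)                    ; no pair kept, or nothing was truncated
      Return ""
   If (!lastOk)                          ; drop a trailing second element that is out of range
      StringTrimRight, out, out, %lastLen%
   Return out
}
```

With the paragraph from this post, pair 1 already has both NUMs out of range, so no pair is kept and the function returns an empty string instead of the bare header. The caller can then skip the append to the output text.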

