Jump to content


Photo

Logical sort text-number sequences


  • Please log in to reply
5 replies to this topic

#1 Laszlo

Laszlo
  • Fellows
  • 4713 posts

Posted 17 September 2006 - 01:44 AM

ListView controls can sort columns now "Logically" [AHK v1.0.44.12+]: sequences of digits in the text are treated as true numbers rather than mere characters. It can be used to sort an arbitrary sequence if it is copied in a ListView. Unfortunately, it requires Windows XP or later. For other Windows versions here is a primitive sorting algorithm in pure AHK, which handles the most important case: text followed by a number. The result will be like file1, file2, file11, file100...

The shortest sorting algorithm, I know of, is used. It parses the list and the leftmost entry is appended to the output. This entry is removed from the input and the search-append procedure is repeated, with the second, third... entries from left. The key is the comparison of the entries. It is done in the After(x,y) function, which returns true (1) if x is after y, and 0 otherwise. You can edit this function for other sorting criteria.

In this simple example the PreText(x) function finds the string preceding the first digit. If we compare two strings with the same PreText, we can remove it and compare the rest as numbers. If the PreText values are different, string comparisons are used.
#NoEnv
SetBatchLines -1

a = y0,x2,x10,x100,x1,x3,  ; terminate with delimiter!
MsgBox % Sort(a,",")

Sort(x, delim="") {
   IfEqual delim,, SetEnv delim, `n
   x = %delim%%x%
   Loop {
      Loop Parse, x, %delim%
         If (A_LoopField > "" and (A_Index = 2 or After(y,A_LoopField)))
            y = %A_LoopField%
      z = %z%%y%%delim%
      StringReplace x, x, %delim%%y%%delim%, %delim%
      IfEqual x,%delim%, Return z
   }
}

After(x,y) {
   p := PreText(x)
   If (p <> PreText(y))
      Return x>y
   StringReplace x, x, %p%
   StringReplace y, y, %p%
   Return x>y
}

PreText(x) {
   Loop Parse, x, 0123456789
      Return A_LoopField
}
The sorting algorithm is of quadratic time of the length of the list. With somewhat longer script we can sort faster. With my 2GHz laptop sorting 1000 entries took under 4 seconds with the above code, which should be enough for casual uses. If there is a need, I will post a longer but faster version, but with interpreted code one should not process millions of entries.

If further letters follow the last digit, the After function reverts to string comparison, which results in file10a, file2, file3... It is not clear, what is the right order in this case. One could use the numbers only, requiring somewhat longer code. It is not hard to do some mods.

#2 Laszlo

Laszlo
  • Fellows
  • 4713 posts

Posted 17 September 2006 - 02:49 AM

Here is another simple quadratic time sort algorithm to experiment with, using StringSplit to convert the list to an array. It is a little longer than the parse loop variant above, but slightly faster for LONG entries (10 chars or more). It does not need the trailing delimiter.
Sort(x, delim="") {

   IfEqual delim,, SetEnv delim, `n

   StringSplit x, x, %delim%

   Loop % x0-1 {

      j = %A_Index%

      k = %j%

      Loop % x0-A_Index {

         j++

         If (After(x%k%,x%j%))

            k = %j%

      }

      z := z x%k% delim

      x%k% := x%A_Index%

   }

   Return z x%x0%

}


#3 Chris

Chris
  • Administrators
  • 10727 posts

Posted 17 September 2006 - 10:58 AM

I'd a feeling you were going to take the bait and create a function for quasi-logical sorting. :)

This will be a great resource. Thanks for writing it.

#4 Laszlo

Laszlo
  • Fellows
  • 4713 posts

Posted 17 September 2006 - 04:19 PM

Here is a high speed variant. It can sort a list of 100,000 short entries in about a second by a 2GHz laptop. On a directory listing it is practically instantaneous. First the list is sorted by the built in Sort command, which puts all the entries with the same PreText (text preceding the digits) next to each other. Then the script scans this list, comparing the PreText values of neighbor entries. If they are the same, the entries get collected in the list y. When a different PreText value is found, the list y of the previous entries with the same PreText get sorted, also with AHK's sort command, and it is appended to the output z. This is a numeric sort, it ignores the identical PreText's by specifying the appropriate starting column.
#NoEnv

SetBatchLines -1



a = y0,x2,x10,x100,x1,x3, ; terminate with delimiter!

MsgBox % Sort(a,",")



Sort(x, delim="") {       ; LOGIC SORT, x IS terminated with delimiter!

   IfEqual delim,, SetEnv delim, `n

   Sort x, D%delim%

   Loop Parse, x, %delim%

   {

      If (p = PreText(A_LoopField))

         y = %y%%delim%%A_LoopField%

      Else {

         Sort y, % "N D" delim " P" StrLen(p)+1

         z = %z%%y%%delim%

         p := PreText(A_LoopField)

         y = %A_LoopField%

      }

   }

   StringTrimLeft z, z, 1

   Return z

}



PreText(x) {

   Loop Parse, x, 0123456789

      Return A_LoopField

}


#5 Chris

Chris
  • Administrators
  • 10727 posts

Posted 19 September 2006 - 12:34 AM

:shock: Wow!

#6 PhiLho

PhiLho
  • Fellows
  • 6850 posts

Posted 19 September 2006 - 09:14 AM

I haven't tried to fully understand it yet, but it seems very smart.
I like the PreText function, another clever use of Loop where you loop only once, using its side effect... (the other use is to transform short file name to long one (or reverse)).