Jump to content


Photo

[AHK_L]Rosetta Code - Merge Sort


  • Please log in to reply
11 replies to this topic

#1 rbrtryn

rbrtryn
  • Members
  • 790 posts

Posted 15 October 2011 - 06:34 PM

Most of the Rosetta Code examples for AHK are extremely dated, so I decided update one of the easier ones.

This code implements the Merge Sort.

I'll post this at Rosetta Code, after it's vetted here.

#NoEnv

Test := []
Loop 100 {
    Random n, 1, 1000
    Test.Insert( n )
}
Result := MergeSort( Test )
Loop % Result.MaxIndex() {
    MsgBox, 1, , % Result[A_Index]
    IfMsgBox Cancel
        Break
}
Return


/*
    Function MergeSort
        Sorts an array by first recursively splitting it down to its
        individual elements and then merging those elements in their
        correct order.
       
    Parameters
        Array   The array to be sorted
       
    Returns
        The sorted array
*/
MergeSort( Array )
    {
        ; Return single element arrays
        If ( ! Array.HasKey( 2 ) )
            Return Array
		
        ; Split array into Left and Right halfs
		Left := []
		Right := []
        Middle := Array.MaxIndex() // 2
		While ( Array.MaxIndex() )
			If ( A_Index <= Middle )
				Left.Insert( Array.Remove( 1 ) )
			Else
				Right.Insert( Array.Remove( 1 ) )
       
        ; Recurse into Left and Right
        Left  := MergeSort( Left )
        Right := MergeSort( Right )
       
        Return ___Merge( Left, Right )
    }


/* 
    Function ___Merge
        Merges two sorted arrays so that the result is sorted
       
    Parameters
        Left    The first array
        Right   The second array
       
    Returns
        The sorted merging of Left and Right
*/
___Merge(Left, Right)
    {
        Result := []
       
        ; If all the Right values are greater than all the
        ; Left values, just append Right at the end of Left.
        If ( Left[Left.MaxIndex()] <= Right[1] ) {
            While ( Right.MaxIndex() )
                Left.Insert( Right.Remove( 1 ) )
            Return Left
        }
       
        ; Loop until one of the arrays is empty
        While( Left.MaxIndex() and Right.MaxIndex() )
            If ( Left[1] <= Right[1] )
                Result.Insert( Left.Remove( 1 ) )
            Else
                Result.Insert( Right.Remove( 1 ) )

        While ( Left.MaxIndex() )
            Result.Insert( Left.Remove( 1 ) )

        While ( Right.MaxIndex() )
            Result.Insert( Right.Remove( 1 ) )
           
        Return Result
    }

Hopefully this will inspire others to update Rosetta Code using AHK_L syntax.
_________________________
Composed/Posted with WYSIWYG BBCode Editor

#2 Uberi

Uberi
  • Fellows
  • 1046 posts

Posted 17 October 2011 - 08:11 PM

Here's a single function version that's a bit faster:

SortObject(InputObject)
{
 MaxIndex := ObjMaxIndex(InputObject), (MaxIndex = "") ? (MaxIndex := 0) : ""
 If (MaxIndex < 2)
  Return, InputObject
 Middle := MaxIndex >> 1, SortLeft := Object(), SortRight := Object()
 Loop, %Middle%
  ObjInsert(SortLeft,InputObject[A_Index]), ObjInsert(SortRight,InputObject[Middle + A_Index])
 If (MaxIndex & 1)
  ObjInsert(SortRight,InputObject[MaxIndex])
 SortLeft := SortObject(SortLeft), SortRight := SortObject(SortRight), MaxRight := MaxIndex - Middle, LeftIndex := 1, RightIndex := 1, Result := Object()
 Loop, %MaxIndex%
 {
  If (LeftIndex > Middle)
   ObjInsert(Result,SortRight[RightIndex]), RightIndex ++
  Else If ((RightIndex > MaxRight) || (SortLeft[LeftIndex] < SortRight[RightIndex]))
   ObjInsert(Result,SortLeft[LeftIndex]), LeftIndex ++
  Else
   ObjInsert(Result,SortRight[RightIndex]), RightIndex ++
 }
 Return, Result
}


#3 rbrtryn

rbrtryn
  • Members
  • 790 posts

Posted 17 October 2011 - 09:36 PM

How much faster is it? I don't have any reliable means to measure this at my disposal.

The extra speed may come at a cost. If I am reading your algorithim correctly, your function requires 3n locations to implement. This is conciderably worse than the 2n locations normally required to implement this sort.

Merge sort also has some demerits. One is its use of 2n locations; the additional n locations were needed because one couldn't reasonably merge two sorted sets in place

MergeSort uses AHK's feature of releasing array space at run time to only use n locations.

#4 Uberi

Uberi
  • Fellows
  • 1046 posts

Posted 17 October 2011 - 09:53 PM

Benchmark results:

Benchmark:       Merge Sort implementations
Iterations:      10000
Control Run:     4.566790 ms.  (0.000457 ms. per run)
Original:        41.607291 ms. (0.004161 ms. per run)
Uberi's version: 27.878923 ms. (0.002788 ms. per run)

Summary: the one I posted is about 1.4924677188 times faster.

Edit: Forgot to mention I used an array with 100 items as in the included example.

#5 rbrtryn

rbrtryn
  • Members
  • 790 posts

Posted 18 October 2011 - 03:07 PM

Summary: the one I posted is about 1.4924677188 times faster.

Well that is significant. Could you try this version to see if it is any faster?
I made these improvements:

[*:24f8fcak]Borrowed your idea of only looping to the middle when splitting.
[*:24f8fcak]Changed most of the While statements to Loop.
[*:24f8fcak]Combined the two functions into oneCan you see any other improvements I can make, while still maintaining the {1n} space requirement?
#NoEnv

Test := []
Loop 100 {
    Random n, 0, 999
    Test.Insert(n)
}
Result := MergeSort(Test)
Loop % Result.MaxIndex() {
    MsgBox, 1, , % Result[A_Index]
    IfMsgBox Cancel
        Break
}
Return


/*
    Function MergeSort
        Sorts an array by first recursively splitting it down to its
        individual elements and then merging those elements in their
        correct order.
       
    Parameters
        Array   The array to be sorted
       
    Returns
        The sorted array
*/
MergeSort(Array)
    {
        ; Return single element arrays
        If (! Array.HasKey(2))
            Return Array

        ; Split array into Left and Right halfs
        Left := [], Right := [], Middle := Array.MaxIndex() // 2
        Loop % Middle
            Right.Insert(Array.Remove(Middle-- + 1)), Left.Insert(Array.Remove(1))
        If (Array.MaxIndex())
            Right.Insert(Array.Remove(1))
       
        Left := MergeSort(Left), Right := MergeSort(Right)

        ; If all the Right values are greater than all the
        ; Left values, just append Right at the end of Left.
        If (Left[Left.MaxIndex()] <= Right[1]) {
            Loop % Right.MaxIndex()
                Left.Insert(Right.Remove(1))
            Return Left
        }
        ; Loop until one of the arrays is empty
        While(Left.MaxIndex() and Right.MaxIndex())
            Left[1] <= Right[1] ? Array.Insert(Left.Remove(1))
                                : Array.Insert(Right.Remove(1))

        Loop % Left.MaxIndex()
            Array.Insert(Left.Remove(1))

        Loop % Right.MaxIndex()
            Array.Insert(Right.Remove(1))
           
        Return Array
    }


#6 Uberi

Uberi
  • Fellows
  • 1046 posts

Posted 19 October 2011 - 02:08 AM

Benchmarked again:

Benchmark:       Merge sort 100 item array
Iterations:      1000
Control Run:     4.100883 ms.    (0.004101 ms. per run)
Original:        3456.203593 ms. (3.456204 ms. per run)
Uberi's version: 2741.191417 ms. (2.741191 ms. per run)

Seems like the last test was faulty sure to MergeSort() modifying the source object. I've added ObjClone(Test) to ensure that does not affect results. It is actually only 1.2608402698 times faster. (Edit: I got it up as high as 1.3547413737 times faster, but it was so terse as to be unreadable. I don't think that would be very suitable for RosettaCode, so I won't post it here)

I realize that memory usage is very important, but all those array manipulations have a significant cost proportional to O(log(n)) since arrays are binary trees.

My version would gain a significant speed advantage given "real" arrays, contiguous memory that can be accessed by an offset. However, your code is much more space efficient. It's a tradeoff.

#7 rbrtryn

rbrtryn
  • Members
  • 790 posts

Posted 20 October 2011 - 07:20 PM

Sorry for not getting back to you sooner. I want to thank for your time and help with this :D

What benchmarking software do you use?
Is it freeware that can be downloaded?

I realize that memory usage is very important, but all those array manipulations have a significant cost proportional to O(log(n)) since arrays are binary trees.

:oops: I should have realised that was how they were implemented. In that case adding/deleting nodes is going to be a real time suck. This version avoids that as much as possible and it reuses the input array as the output. No sense in building a whole new array just to contain the result. It uses 2n locations, which is typical for this type of sort.
/*
    Function MergeSort
        Sorts an array by first recursively splitting it down to its
        individual elements and then merging those elements in their
        correct order.
       
    Parameters
        Array   The array to be sorted
       
    Returns
        The sorted array
        
    Reference:
        http://www.sorting-algorithms.com/merge-sort
*/
MergeSort(Array)
{
    ; Base case
    If ( !Array.MaxIndex() or Array.MaxIndex() = 1)
        Return Array

    ; Split array into Left and Right halfs
    Left := [], Right := [], Middle := Array.MaxIndex() >> 1
    Loop % Middle
        Left[A_Index] := Array[A_Index], Right[A_Index] := Array[Middle + A_Index]
    If (Array.MaxIndex() & 1)
        Right.Insert(Array[Array.MaxIndex()])
    
    Left := MergeSort(Left), Right := MergeSort(Right)
    
    Index := LIndex := RIndex := 1
    While (LIndex <= Left.MaxIndex() and RIndex <= Right.MaxIndex())
        Array[Index++] := Left[LIndex] <= Right[RIndex] ? Left[LIndex++] : Right[RIndex++]
    
    While (LIndex <= Left.MaxIndex())
        Array[Index++] := Left[LIndex++]
    
    While (RIndex <= Right.MaxIndex())
        Array[Index++] := Right[RIndex++]
    Return Array
}


#8 Uberi

Uberi
  • Fellows
  • 1046 posts

Posted 21 October 2011 - 01:58 AM

What benchmarking software do you use?

It's a custom script that I haven't released since it's quite similar to Leo Xiong/Nameless-exe's Script Benchmarker. I can post it here if you like, but the script I linked to works quite nicely as well.

If you're going for maximum time and memory efficiency, I think this may be it:

SortArray(InputArray)
{
 MaxIndex := ObjMaxIndex(InputArray), (MaxIndex = "") ? (MaxIndex := 0) : ""
 If (MaxIndex < 2)
  Return, InputArray
 Middle := MaxIndex >> 1
 SortLeft := ObjClone(InputArray), SortRight := ObjClone(InputArray)
 ObjRemove(SortLeft,Middle + 1,MaxIndex), ObjRemove(SortRight,1,Middle)
 SortArray(SortLeft), SortArray(SortRight), MaxRight := MaxIndex - Middle, LeftIndex := 1, RightIndex := 1
 Loop, %MaxIndex%
  InputArray[A_Index] := (LeftIndex <= Middle && (RightIndex > MaxRight || SortLeft[LeftIndex] < SortRight[RightIndex])) ? SortLeft[LeftIndex ++] : SortRight[RightIndex ++]
}

It's 1.883936 times faster than the version you have posted immediately above:

Uberi's version: 867.160864
Original:        1528.096053

Also, it is an in-place sort now:

Array := ["A","B","C"]
SortArray(Array)
;array is now sorted

However, since it is somewhat less readable than the previous examples, I'd understand if you want to use your version, since RosettaCode entries are more for educational purposes anyways.

#9 Guests

  • Guests

Posted 03 November 2011 - 07:36 AM

Uberi, excellent. I tried your function with huge amount of data and it runs fine. I wonder why your function does not cause stack overflow in spite of the recursive function calls although AHK has a limitation with it.

#10 Uberi

Uberi
  • Fellows
  • 1046 posts

Posted 14 January 2012 - 01:30 AM

I've come across this code again recently, and having learned more about sorting in the meantime, I've made it faster once again:

SortArray(InputArray)
{
 MaxIndex := ObjMaxIndex(InputArray), (MaxIndex = "") ? (MaxIndex := 0) : ""
 If MaxIndex <= 8
 {
  If MaxIndex <= 1
   Return, InputArray
  Loop, % MaxIndex - 1
  {
   Index := A_Index
   While, Index > 0 && InputArray[Index] > InputArray[Index + 1]
    Value := InputArray[Index + 1], InputArray[Index + 1] := InputArray[Index], InputArray[Index] := Value, Index --
  }
 }
 Else
 {
  Middle := MaxIndex >> 1
  SortLeft := ObjClone(InputArray), ObjRemove(SortLeft,Middle + 1,MaxIndex)
  SortRight := ObjClone(InputArray), ObjRemove(SortRight,1,Middle)
  SortArray(SortLeft), SortArray(SortRight), MaxRight := MaxIndex - Middle, LeftIndex := 1, RightIndex := 1
  Loop, %MaxIndex%
   InputArray[A_Index] := (LeftIndex <= Middle && (RightIndex > MaxRight || SortLeft[LeftIndex] < SortRight[RightIndex])) ? SortLeft[LeftIndex ++] : SortRight[RightIndex ++]
 }
}

Now the function switches to the insertion sort algorithm if the number of elements is less than or equal to 8, which is the threshold after which insertion sort is faster than merge sort.

While insertion sort is known to be quadratic in time complexity, for small arrays it is actually more efficient than many other algorithms. Switching to the right algorithm for the given number of elements combines the advantages of both.

#11 Uberi

Uberi
  • Fellows
  • 1046 posts

Posted 14 January 2012 - 01:36 AM

Oh and here are the benchmarks:

Benchmark:        Sorting functions
Iterations:       1000
Control Run:      0.863407 ms.    (0.000863 ms. per run)
Original version: 3004.968792 ms. (3.004969 ms. per run)
Uberi's version:  1383.388197 ms. (1.383388 ms. per run)

It is now 2.172180 times faster than the original version.

#12 Deo

Deo
  • Members
  • 199 posts

Posted 16 January 2012 - 09:10 AM

cool, thanks for your great work