Sort CSV Date Column

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
newpie
Posts: 37
Joined: 01 Jun 2014, 07:50

Sort CSV Date Column

03 Jun 2016, 15:52

Hello, is there a way to use the sort command to sort this format of date like below, doesn't matter if asc or dsc, just as long the dates are grouped.

Code: Select all

var = 
(
foo,bar,foo,bar,foo,2016/05/30,bar
foo,bar,foo,bar,foo,2016/05/30,bar
foo,bar,foo,bar,foo,2016/05/27,bar
foo,bar,foo,bar,foo,2016/05/30,bar
foo,bar,foo,bar,foo,2016/05/30,bar
foo,bar,foo,bar,foo,2016/05/27,bar
foo,bar,foo,bar,foo,2016/05/30,bar
foo,bar,foo,bar,foo,2016/05/23,bar
)
Thanks for any help.

I imagine it can be done with something similar to the code below by linearspoon (https://autohotkey.com/boards/viewtopic.php?t=1679), but I am unsure how to mod.

Code: Select all

var = 
(
0000Size	0000MD5
3243008 1
3243008 2
3243008 3
3243008 4
3243008 5
3243008 6
)
 
Sort, var, F MySort
Msgbox % var
 
MySort(item1, item2, offset)
{
  RegexMatch(item1, "S)^\d+", num1)
  RegexMatch(item2, "S)^\d+", num2)
  if (num1 = num2)
    return -offset
  return num1 - num2
}
newpie
Posts: 37
Joined: 01 Jun 2014, 07:50

Re: Sort CSV Date Column

03 Jun 2016, 16:04

I managed to get it working by changing the placement of the date to 1st column, but I am still curious on how to sort another column if not in first. Thanks

Code: Select all

var = 
(
2016/05/30,foo,bar,foo,bar,foo,bar
2016/05/30,foo,bar,foo,bar,foo,bar
2016/05/27,foo,bar,foo,bar,foo,bar
2016/05/30,foo,bar,foo,bar,foo,bar
2016/05/30,foo,bar,foo,bar,foo,bar
2016/05/27,foo,bar,foo,bar,foo,bar
2016/05/30,foo,bar,foo,bar,foo,bar
2016/05/23,foo,bar,foo,bar,foo,bar
)

Sort, var
Msgbox % var
User avatar
atnbueno
Posts: 89
Joined: 12 Oct 2013, 04:45
Contact:

Re: Sort CSV Date Column

04 Jun 2016, 06:45

Hello newpie.

Indeed, you need to define a callback function that tells Sort how to compare your data.

Something like this:

Code: Select all

var = 
(
foo,bar,foo,bar,foo,2016/05/30,bar
foo,bar,foo,bar,foo,2016/05/30,bar
foo,bar,foo,bar,foo,2016/05/27,bar
foo,bar,foo,bar,foo,2016/05/30,bar
foo,bar,foo,bar,foo,2016/05/30,bar
foo,bar,foo,bar,foo,2016/05/27,bar
foo,bar,foo,bar,foo,2016/05/30,bar
foo,bar,foo,bar,foo,2016/05/23,bar
)
 
Sort, var, F newpieSort
Msgbox % var
 
newpieSort(item1, item2)
{
	date1 := StrSplit(item1, ",")[6]
	date2 := StrSplit(item2, ",")[6]
	return date1 > date2 ? 1 : date1 < date2 ? -1 : 0  
}
Regards,
Antonio B.
just me
Posts: 9466
Joined: 02 Oct 2013, 08:51
Location: Germany

Re: Sort CSV Date Column

05 Jun 2016, 00:38

User-defined sort functions are usually rather slow. In many cases it's faster to reorder the input variable and use the built-in sort algorithm for something like a 'column sort':

Code: Select all

#NoEnv
SetBatchLines, -1

Var =
(
foo,bar,foo,bar,foo,2016/05/30,bar
foo,bar,foo,bar,foo,2016/05/30,bar
foo,bar,foo,bar,foo,2016/05/27,bar
foo,bar,foo,bar,foo,2016/05/30,bar
foo,bar,foo,bar,foo,2016/05/30,bar
foo,bar,foo,bar,foo,2016/05/27,bar
foo,bar,foo,bar,foo,2016/05/30,bar
foo,bar,foo,bar,foo,2016/05/23,bar
)

I := 1000

S := A_TickCount
Loop, %I% {
   V := Var
   Sorted := ColumnSort(V, 6)
}
T := A_TickCount - S
MsgBox, 0, ColumnSort, %T% ms for %I% iterations:`n`n%Sorted%

S := A_TickCount
Loop, %I% {
   Sorted := Var
   Sort, Sorted, F newpieSort
}
T := A_TickCount - S
MsgBox, 0, newpieSort, %T% ms for %I% iterations:`n`n%Sorted%


ColumnSort(Var, Column, Separator := ",", SortOptions := "") {
   Static BS := Chr(8)
   VarSetCapacity(SortIn, StrLen(Var) * 2)
   Loop, Parse, Var, `n
      SortIn .= StrSplit(A_LoopField, Separator)[Column] . BS . A_LoopField . "`n"
   Sort, SortIn, %SortOptions%
   VarSetCapacity(SortOut, StrLen(SortIn))
   Loop, Parse, SortIn, `n
      SortOut .= SubStr(A_LoopField, InStr(A_LoopField, BS) + 1) . "`n"
   Return RTrim(SortOut, "`r`n")
}

newpieSort(item1, item2)
{
	date1 := StrSplit(item1, ",")[6]
	date2 := StrSplit(item2, ",")[6]
	return date1 > date2 ? 1 : date1 < date2 ? -1 : 0
}
art
Posts: 22
Joined: 01 Jan 2014, 05:56

Re: Sort CSV Date Column

05 Jun 2016, 03:36

This is a little faster than just me's and with more options.
Note1: my function SortByColumn() sorts by SortColumn and all-after SortColumn and i think this is better sorting.
Note2: The iterations are 100.000 so be patient to finish.

Code: Select all

#NoEnv
SetBatchLines, -1
 
Var =
(
foo6,bar8,foo2,bar7,foo3,2016/05/30,bar2
foo3,bar7,foo5,bar2,foo6,2016/05/30,bar1

foo1,bar2,foo4,bar5,foo2,2016/05/27,bar2
foo8,bar5,foo6,bar4,foo4,2016/05/30,bar4
foo7,bar4,foo3,bar6,foo3,2016/05/30,bar3

foo2,bar6,foo1,bar8,foo2,2016/05/27,bar1
foo5,bar3,foo8,bar1,foo6,2016/05/30,bar5
foo4,bar1,foo7,bar3,foo8,2016/05/23, bar
)


I := 100000

S := A_TickCount
Loop, %I% {
   V := Var
   Sorted := SortByColumn(V, 6)
}
T1 := A_TickCount - S
MsgBox, 4096, SortByColumn, %T1% ms for %I% iterations:`n`n%Sorted%

S := A_TickCount
Loop, %I% {
   V := Var
   Sorted := ColumnSort(V, 6)
}
T2 := A_TickCount - S
MsgBox, 4096, ColumnSort, %T2% ms for %I% iterations:`n`n%Sorted%

S := A_TickCount
Loop, %I% {
   Sorted := Var
   Sort, Sorted, F newpieSort
}
T3 := A_TickCount - S
MsgBox, 4096, newpieSort, %T3% ms for %I% iterations:`n`n%Sorted%

MsgBox, 4096, FINAL for %I% iterations, SortByColumn:%A_Tab%%T1% ms`nColumnSort:%A_Tab%%T2% ms`nnewpieSort:%A_Tab%%T3% ms

ExitApp

SortByColumn(Str, SortCol, SortOptions="", ColSep="`,", RowSep="`n", RowOmitChars="") {
    IfEqual,SortCol,1,Sort,Str,D%RowSep% %SortOptions%
    IfEqual,SortCol,1,Return Str
    BS:=Chr(8), AA:=BB:=""
    Loop, parse, Str, %RowSep%, %RowOmitChars%
        p:=InStr(A_LoopField,ColSep,false,1,SortCol-1)
        ,AA.=SubStr(A_LoopField,(p=0?1:p+1)) BS A_LoopField RowSep
    Sort, AA, D%RowSep% %SortOptions%
    Loop, parse, AA, %RowSep%
        BB.=SubStr(A_LoopField,InStr(A_LoopField,BS)+1) RowSep
    Return RTrim(BB, RowSep)
}

ColumnSort(Var, Column, Separator := ",", SortOptions := "") {
   Static BS := Chr(8)
   VarSetCapacity(SortIn, StrLen(Var) * 2)
   Loop, Parse, Var, `n
      SortIn .= StrSplit(A_LoopField, Separator)[Column] . BS . A_LoopField . "`n"
   Sort, SortIn, %SortOptions%
   VarSetCapacity(SortOut, StrLen(SortIn))
   Loop, Parse, SortIn, `n
      SortOut .= SubStr(A_LoopField, InStr(A_LoopField, BS) + 1) . "`n"
   Return RTrim(SortOut, "`r`n")
}

newpieSort(item1, item2)
{
	date1 := StrSplit(item1, ",")[6]
	date2 := StrSplit(item2, ",")[6]
	return date1 > date2 ? 1 : date1 < date2 ? -1 : 0
}
20160609 - My function SortByColumn() Edited: RTrim(BB, "`r`n") Fixed to RTrim(BB, RowSep)
Here is the new Function:

Code: Select all

SortByColumn(Str, SortCol, SortOptions="", ColSep="`,", RowSep="`n", RowOmitChars="") {
    IfEqual,SortCol,1,Sort,Str,D%RowSep% %SortOptions%
    IfEqual,SortCol,1,Return Str
    BS:=Chr(8), AA:=BB:=""
    Loop, parse, Str, %RowSep%, %RowOmitChars%
        p:=InStr(A_LoopField,ColSep,false,1,SortCol-1)
        ,AA.=SubStr(A_LoopField,(p=0?1:p+1)) BS A_LoopField RowSep
    Sort, AA, D%RowSep% %SortOptions%
    Loop, parse, AA, %RowSep%
        BB.=SubStr(A_LoopField,InStr(A_LoopField,BS)+1) RowSep
    Return RTrim(BB, RowSep)
} ; sorts by SortCol and all-after SortCol https://autohotkey.com/boards/viewtopic.php?f=5&t=18643&p=90570#p90570
Last edited by art on 09 Jun 2016, 04:13, edited 2 times in total.
User avatar
atnbueno
Posts: 89
Joined: 12 Oct 2013, 04:45
Contact:

Re: Sort CSV Date Column

05 Jun 2016, 05:37

Really nice improvement, art :thumbup:

I've tried the three versions with a longer list (79K lines, 4MB string) and art's (1x) is significantly better than just me's (2.8x), which still is better than mine (4.2x).

Antonio B.

Return to “Ask for Help (v1)”

Who is online

Users browsing this forum: Descolada, Google [Bot] and 99 guests