parse a text file to csv and then sort by column Topic is solved

Get help with using AutoHotkey and its commands and hotkeys
Geramont
Posts: 13
Joined: 27 Oct 2017, 01:17

parse a text file to csv and then sort by column

03 Nov 2018, 11:24

Hi everyone,

I am trying to parse an xml file to csv. I then want to sort this csv file by one of the columns and display that in notepad. The xml file looks like this:

Code: Select all

<?xml version="1.0" encoding="utf-16"?>
<Searches>
	<Search Name="Billy, Jean (#1234)" Id="15" Date="31.07.2018 11:17:00" SearchString="1234"/>
	...
</Searches>
(I have no idea why one chose to use utf-16 and not to set seperate tags for Name, Date etc. and put everything in just one tag...)
What I want to get as a file is this:

Code: Select all

Billy, Jean (#1234)	15	31.07.2018 11:17:00	1234
Doe, John (#1244)	23	22.06.2018 09:12:33	1244
...
I managed to delete the first two lines and the last line of the xml file and to strip the <Search> tag. Right now my output file looks like this:

Code: Select all

Name="Billy, Jean (#1234)" Id="15" Date="31.07.2018 11:17:00" SearchString="1234"
How do I get rid of
Name="
Id="
Date="
SearchString="
and the ending double quotes?
And how can I sort my file by colums, e. g. Name or Date?

Actually my AHK code looks like this:

Code: Select all

#NoEnv  ; Recommended for performance and compatibility with future AutoHotkey releases.
; #Warn  ; Enable warnings to assist with detecting common errors.
SendMode Input  ; Recommended for new scripts due to its superior speed and reliability.
SetWorkingDir %A_ScriptDir%  ; Ensures a consistent starting directory.

in_file = %A_Desktop%\search.xml
out_file = %A_Desktop%\search.txt

ifexist,%out_file%
	FileDelete, %out_file%

FileRead, file,%in_file%
if ErrorLevel
{
	MsgBox,4144,Warning,Failed to load`n`n%in_file%`n`nExit Application.
	ExitApp
}

;Loop, Read, %in_file%
;{
	;total_lines = %A_Index%
;}
;MsgBox, Total number of lines in xml file: %total_lines%


loop parse, file, `n

{
	while position := RegExMatch(file, "(?|(<\w*\s\w*=")|("\s\w*=")|("\/>))", match_pattern, A_Index=1?1:pos+StrLen(match_pattern))
	{
		result .= match_pattern "`n"
		MsgBox % result		
	}
}

FileAppend, %result%, out_file
;MsgBox % result
Run notepad %out_file%

ExitApp
^x::
ExitApp
Here the last

Code: Select all

loop parse, file, `n
is not working. Tried to get rid of Name=" etc. there.

I then tried to get rid of these by testing this code using regex:

Code: Select all

#NoEnv  ; Recommended for performance and compatibility with future AutoHotkey releases.
; #Warn  ; Enable warnings to assist with detecting common errors.
SendMode Input  ; Recommended for new scripts due to its superior speed and reliability.
SetWorkingDir %A_ScriptDir%  ; Ensures a consistent starting directory.

; ------------------------------------------------------------------------------
; parse xml file and convert to csv, then sort by date and display sorted file
; ------------------------------------------------------------------------------

file_xml = %A_Desktop%\search.xml
out_file = %A_Desktop%\search.txt

ifexist,%out_file%
{
	FileDelete, %out_file%	
	Sleep, 500
}

FileRead, in_file,%file_xml%
if ErrorLevel
{
	MsgBox,4144,Warning,Failed to load`n`n%file%`n`nExit Application.
	ExitApp
}


; ------------------------------------------------------------------------------
; manipulate xml file
; ------------------------------------------------------------------------------

Loop, Read, %file_xml%
{
	total_lines = %A_Index%
}


loop parse, in_file, `n
{
	if (A_Index = 1 or A_Index = 2 or A_Index = total_lines)
	{
		line = ""
	}
	
	else if (A_Index = total_lines -1) ;last line without cr
	{
		line := Trim(A_LoopField)
		FileAppend, %line%, %out_file%
	}
	
	else
	{
		;line .= A_LoopField . "`n"
		;MsgBox,4144,,%line%
		line := Trim(A_LoopField) . "`n"
		FileAppend, %line%, %out_file%
	}
}


; ------------------------------------------------------------------------------
; manipulate out_file
; ------------------------------------------------------------------------------

total_lines = ""
FileRead, in_file, %out_file%

if ErrorLevel
{
	MsgBox,4144,Warning,Failed to load`n`n%file%`n`nExit Application.
	ExitApp
}


Loop, Read, %out_file%
{
	total_lines = %A_Index%
}


ifexist,%out_file%
{
	FileDelete, %out_file%
	Sleep, 500
}


loop parse, in_file, `n
{
	if (A_Index < total_lines)
	{
		StringTrimLeft, line, A_LoopField, 21
		StringTrimRight, line, line, 3
		line .= "`n"
		FileAppend, %line%, %out_file%
	}
	
	else ;(A_Index = total_lines)
	{
		StringTrimLeft, line, A_LoopField, 21
		StringTrimRight, line, line, 3
		FileAppend, %line%, %out_file%
	}
	
	line := RegExReplace(line, "(\w*=")")
	line := RegExReplace(line, "("\s+)", "\t")
}

; ------------------------------------------------------------------------------
; To do
; ------------------------------------------------------------------------------
; _extract every string between double quotes, delet rest of line
; _convert every 2nd double quote to a comma, delet all remaing double quotes. 
; Thus generate a csv file
;_sort csv file by time stamp or name etc.
; ------------------------------------------------------------------------------

Run notepad %out_file%
ExitApp

^x::
ExitApp
I tried with regex but it won't work. Tested the regex on regex101.com so the regex itself seems to work. Not my impementation with AHK...May be someone wants to give a rookie a hand? That would be very nice.

Thanks a lot!
swagfag
Posts: 3003
Joined: 11 Jan 2017, 17:59

Re: parse a text file to csv and then sort by column  Topic is solved

03 Nov 2018, 20:26

Code: Select all

xml = 
(Join`r`n
<?xml version="1.0" encoding="utf-16"?>
<Searches>
<Search Name="Two" Id="2" Date="02.07.2018 11:17:00" SearchString="2"/>
<Search Name="Five" Id="5" Date="05.07.2018 11:17:00" SearchString="5"/>
<Search Name="One" Id="1" Date="01.07.2018 11:17:00" SearchString="1"/>
<Search Name="Four" Id="4" Date="04.07.2018 11:17:00" SearchString="4"/>
<Search Name="Three" Id="3" Date="03.07.2018 11:17:00" SearchString="3"/>
</Searches>
)

xml := RegExReplace(xml, "<\?xml version=""1.0"" encoding=""utf-16""\?>\s*") ;delete this
xml := RegExReplace(xml, "<Searches>\s*") ;also this
xml := RegExReplace(xml, "</Searches>\s*") ;and this

Sort xml, F DateSort
DateSort(a, b) {
	static NEEDLE := ".*Date=""(\d{1,2})\.(\d{1,2})\.(\d{4}) (\d{2}):(\d{2}):(\d{2}).*"
		, YYYYMMDDHHMISS := "$3$2$1$4$5$6"

	a := RegExReplace(a, NEEDLE, YYYYMMDDHHMISS)
	b := RegExReplace(b, NEEDLE, YYYYMMDDHHMISS)

	if (a > b)
		return 1
	else if (a < b)
		return -1
	else
		return 0 
}

MsgBox % xml

Loop Parse, xml, `n, `r
{
	RegExMatch(A_LoopField, "Name=""\K[^""]*", name)
	RegExMatch(A_LoopField, "Id=""\K[^""]*", id)
	RegExMatch(A_LoopField, "Date=""\K[^""]*", date)
	RegExMatch(A_LoopField, "SearchString=""\K[^""]*", string)

	TSV .= name "`t" id "`t" date "`t" string  "`n"
}

MsgBox % TSV
i wouldnt run this on mega huge files
garry
Posts: 1892
Joined: 22 Dec 2013, 12:50

Re: parse a text file to csv and then sort by column

04 Nov 2018, 11:08

thank you swagfag , datesort is not easy
just a short basic example without sort and format

Code: Select all

e=
(ltrim,join`r`n
Name="Billy, Jean1 (#1234)" Id="15" Date="01.07.2018 11:17:00" SearchString="1234"
Name="Cilly, Jean2 (#1234)" Id="16" Date="02.07.2018 11:17:00" SearchString="1233"
Name="Dilly, Jean3 (#1234)" Id="17" Date="03.07.2018 11:17:00" SearchString="1232"
Name="Ailly, Jean4 (#1234)" Id="18" Date="04.07.2018 11:17:00" SearchString="1231"
)

;- if these always same :
 a= ="                     ;- begin
 b= "                      ;- end

loop,parse,e,`n,`r
 {
 c:= a_loopfield
 if c=
   continue
 loop,
  {
  if c=
     break
  new=
  StringGetPos,P1,c,%a%             ;- position a
  P1:=P1+2
  if (p1=1)
      break
  stringtrimleft,c2,c,p1            ;- after found cut rest from fulltext
  c=%c2%
  StringGetPos,P2,c2,%b%            ;- until position b
  StringMid,new,c2,1,P2
  new=%new%
  e2 .= new . " `|"                 ;- this delimiter ( | ) should not exist in column 
  }
 stringtrimright,e2,e2,1   
 e3 .= e2 . "`r`n"
 e2=
 }
f1=%a_scriptdir%\test55.csv
ifexist,%f1%
  filedelete,%f1% 
fileappend,%e3%,%f1%
run,%f1%
e=
exitapp
esc::exitapp
;=====================================================
Geramont
Posts: 13
Joined: 27 Oct 2017, 01:17

Re: parse a text file to csv and then sort by column

07 Nov 2018, 11:17

Hi guis,

thank you soooo much for your help!
@swagfag: especially to you. Your example using sort was incredible. Struggled a little bit to understand your special use but learned a lot(!) on how I can use it and how I can even create my own sorting function + regex! Thanks!

@garry: Thank you for your "conventional" example! I have learned a lot on using parsing loops and string manipulation.

Thank you both for your time and effort to help me. The script now works exactly as I wanted it to, even with descending sort.

Thanks so much!
User avatar
bitx0r
Posts: 21
Joined: 05 Oct 2014, 12:30
GitHub: errorseven
Location: NorCal
Contact:

Re: parse a text file to csv and then sort by column

08 Nov 2018, 03:05

Late to the party.

While the RegEx Solution works, I'd like to show you that there's a much simpler method to achieve this by using a COM Object XML parser, I had to google a little about XML DOM, took only few minutes to figure out how to grab Attributes, no RegEx required in parseing the XML. I did borrow @SwagFag's DateSort() function, modified slightly to handle clean Dates.

Code: Select all

xml =
(
<?xml version="1.0" encoding="utf-16"?>
<Searches>
<Search Name="Two" Id="2" Date="02.07.2018 11:17:00" SearchString="2"/>
<Search Name="Five" Id="5" Date="05.07.2018 11:17:00" SearchString="5"/>
<Search Name="One" Id="1" Date="01.07.2018 11:17:00" SearchString="1"/>
<Search Name="Four" Id="4" Date="04.07.2018 11:17:00" SearchString="4"/>
<Search Name="Three" Id="3" Date="03.07.2018 11:17:00" SearchString="3"/>
</Searches>
)

doc :=	loadXML(xml)

msgbox % DisplayNode(doc.childNodes)

DisplayNode(nodes) {
    for node in nodes {
        if node.nodeName == "Search" {
            for attribs in node.attributes 
                text .= attribs.nodeValue "`t"          
            text := trim(text, "`t") "`n"
        }      
        if node.hasChildNodes
            text .=	DisplayNode(node.childNodes)
    }
    Sort text, F DateSort
    return text
}

loadXML(ByRef data) {
    o :=	ComObjCreate("MSXML2.DOMDocument.6.0")
    o.async :=	false
    o.loadXML(data)
    return	o
}


DateSort(a, b) {
    static NEEDLE := ".*\s(\d{1,2})\.(\d{1,2})\.(\d{4}) (\d{2}):(\d{2}):(\d{2}).*", YYYYMMDDHHMISS := "$3$2$1$4$5$6"
    a := RegExReplace(a, NEEDLE, YYYYMMDDHHMISS)
    b := RegExReplace(b, NEEDLE, YYYYMMDDHHMISS)
    return a > b ? 1 : a < b ? -1 : 0
}
Last edited by bitx0r on 08 Nov 2018, 17:38, edited 1 time in total.
garry
Posts: 1892
Joined: 22 Dec 2013, 12:50

Re: parse a text file to csv and then sort by column

08 Nov 2018, 14:19

@bitx0r , the datesort seems not correct
User avatar
bitx0r
Posts: 21
Joined: 05 Oct 2014, 12:30
GitHub: errorseven
Location: NorCal
Contact:

Re: parse a text file to csv and then sort by column

08 Nov 2018, 16:20

garry wrote:
08 Nov 2018, 14:19
@bitx0r , the datesort seems not correct
I'm getting the exact same results.

Code: Select all

xml =
(Join`r`n
<?xml version="1.0" encoding="utf-16"?>
<Searches>
<Search Name="Two" Id="2" Date="02.07.2018 11:17:00" SearchString="2"/>
<Search Name="Five" Id="5" Date="05.07.2018 11:17:00" SearchString="5"/>
<Search Name="One" Id="1" Date="01.07.2018 11:17:00" SearchString="1"/>
<Search Name="Four" Id="4" Date="04.07.2018 11:17:00" SearchString="4"/>
<Search Name="Three" Id="3" Date="03.07.2018 11:17:00" SearchString="3"/>
</Searches>
)
xml := RegExReplace(xml, "<\?xml version=""1.0"" encoding=""utf-16""\?>\s*") ;delete this
xml := RegExReplace(xml, "<Searches>\s*") ;also this
xml := RegExReplace(xml, "</Searches>\s*") ;and this
Sort xml, F DateSort
DateSort(a, b) {
static NEEDLE := ".*Date=""(\d{1,2})\.(\d{1,2})\.(\d{4}) (\d{2}):(\d{2}):(\d{2}).*"
, YYYYMMDDHHMISS := "$3$2$1$4$5$6"
a := RegExReplace(a, NEEDLE, YYYYMMDDHHMISS)
b := RegExReplace(b, NEEDLE, YYYYMMDDHHMISS)
if (a > b)
return 1
else if (a < b)
return -1
else
return 0
}

Loop Parse, xml, `n, `r
{
RegExMatch(A_LoopField, "Name=""\K[^""]*", name)
RegExMatch(A_LoopField, "Id=""\K[^""]*", id)
RegExMatch(A_LoopField, "Date=""\K[^""]*", date)
RegExMatch(A_LoopField, "SearchString=""\K[^""]*", string)
TSV .= name "`t" id "`t" date "`t" string "`n"
}

xml =
(
<?xml version="1.0" encoding="utf-16"?>
<Searches>
<Search Name="Two" Id="2" Date="02.07.2018 11:17:00" SearchString="2"/>
<Search Name="Five" Id="5" Date="05.07.2018 11:17:00" SearchString="5"/>
<Search Name="One" Id="1" Date="01.07.2018 11:17:00" SearchString="1"/>
<Search Name="Four" Id="4" Date="04.07.2018 11:17:00" SearchString="4"/>
<Search Name="Three" Id="3" Date="03.07.2018 11:17:00" SearchString="3"/>
</Searches>
)

doc :=	loadXML(xml)

csv := DisplayNode(doc.childNodes)

DisplayNode(nodes) {
    for node in nodes {
        if node.nodeName == "Search" {
            for attribs in node.attributes 
                text .= attribs.nodeValue "`t"          
            text := trim(text, "`t") "`n"
        }      
        if node.hasChildNodes
            text .=	DisplayNode(node.childNodes)
    }
    Sort text, F DateSort2
    return text
}

loadXML(ByRef data) {
    o :=	ComObjCreate("MSXML2.DOMDocument.6.0")
    o.async :=	false
    o.loadXML(data)
    return	o
}


DateSort2(a, b) {
    static NEEDLE := ".*(\d{1,2})\.(\d{1,2})\.(\d{4}) (\d{2}):(\d{2}):(\d{2}).*", YYYYMMDDHHMISS := "$3$2$1$4$5$6"
    a := RegExReplace(a, NEEDLE, YYYYMMDDHHMISS)
    b := RegExReplace(b, NEEDLE, YYYYMMDDHHMISS)
    return a > b ? 1 : a < b ? -1 : 0
}

results := "`t`tSwagFag`n`n" TSV "`t`tbitx0r`n`n" csv

size := A_ScreenWidth // 2
Gui, Add, Edit, r20 vMyEdit w%size%, %results%
Gui, Show
Results:

Code: Select all

		SwagFag

One	1	01.07.2018 11:17:00	1
Two	2	02.07.2018 11:17:00	2
Three	3	03.07.2018 11:17:00	3
Four	4	04.07.2018 11:17:00	4
Five	5	05.07.2018 11:17:00	5
			
		bitx0r

One	1	01.07.2018 11:17:00	1
Two	2	02.07.2018 11:17:00	2
Three	3	03.07.2018 11:17:00	3
Four	4	04.07.2018 11:17:00	4
Five	5	05.07.2018 11:17:00	5

garry
Posts: 1892
Joined: 22 Dec 2013, 12:50

Re: parse a text file to csv and then sort by column

08 Nov 2018, 16:47

maybe problem with days ( tried with other date )

Code: Select all

bitx0r
Five	5	11.07.2018 11:17:00	5
Four	4	11.07.2018 12:17:00	4
Three	3	03.07.2018 11:17:00	3
Two 	2	22.11.2018 11:17:00	2
One 	1	12.12.2018 11:17:00	1
;----------------------------------------------
swagfag
Three	3	03.07.2018 11:17:00	3
Five	5	11.07.2018 11:17:00	5
Four	4	11.07.2018 12:17:00	4
Two  	2	22.11.2018 11:17:00	2
One 	1	12.12.2018 11:17:00	1
User avatar
bitx0r
Posts: 21
Joined: 05 Oct 2014, 12:30
GitHub: errorseven
Location: NorCal
Contact:

Re: parse a text file to csv and then sort by column

08 Nov 2018, 17:30

garry wrote:
08 Nov 2018, 16:47
maybe problem with days ( tried with other date )

Code: Select all

bitx0r
Five	5	11.07.2018 11:17:00	5
Four	4	11.07.2018 12:17:00	4
Three	3	03.07.2018 11:17:00	3
Two 	2	22.11.2018 11:17:00	2
One 	1	12.12.2018 11:17:00	1
;----------------------------------------------
swagfag
Three	3	03.07.2018 11:17:00	3
Five	5	11.07.2018 11:17:00	5
Four	4	11.07.2018 12:17:00	4
Two  	2	22.11.2018 11:17:00	2
One 	1	12.12.2018 11:17:00	1
The main point of my Post was to show you a less brute force manner to Parse XML, I just threw in the SwagFags function to make it a complete solution, but since I broke it...

Replace this line:

Code: Select all

static NEEDLE := ".*\s(\d{1,2})\.(\d{1,2})\.(\d{4}) (\d{2}):(\d{2}):(\d{2}).*"
Results:

Code: Select all

		SwagFag

Three	3	03.07.2018 11:17:00	3
Five	5	11.07.2018 11:17:00	5
Four	4	11.07.2018 11:17:00	4
Two	2	22.11.2018 11:17:00	2
One	1	12.12.2018 11:17:00	1
			
		bitx0r

Three	3	03.07.2018 11:17:00	3
Four	4	11.07.2018 11:17:00	4
Five	5	11.07.2018 11:17:00	5
Two	2	22.11.2018 11:17:00	2
One	1	12.12.2018 11:17:00	1
garry
Posts: 1892
Joined: 22 Dec 2013, 12:50

Re: parse a text file to csv and then sort by column

09 Nov 2018, 05:02

@bitx0r , thank you for your script /solution

Return to “Ask For Help”

Who is online

Users browsing this forum: Bing [Bot], Google [Bot], pythonker, Rohwedder, takayo97 and 230 guests