XML to a flat TXT File ? Topic is solved

Get help with using AutoHotkey and its commands and hotkeys
User avatar
jmone
Posts: 25
Joined: 30 Nov 2015, 20:43

XML to a flat TXT File ?

Post by jmone » 20 Jan 2021, 23:03

I'd like to modernise my code but need some help :) in converting an XML file into a TXT File that is Tab Delimited (and UTF)

Sample XML

Code: Select all

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<MPL Version="2.0" Title="MCWS - Files - 33876" PathSeparator="\">
<Item>
<Field Name="FName1">123456</Field>
<Field Name="FName2">abcdef</Field>
<Field Name="FName3">qwerty</Field>
</Item>
<Item>
<Field Name="FName1">654321</Field>
<Field Name="FName2">fedcba</Field>
<Field Name="FName4">[email protected]#$%^</Field>
</Item>
</MPL>
Desired TXT (note the empty fields)

Code: Select all

FName1	FName2	FName3	FName4
123456	abcdef	qwerty
654321	fedcba			[email protected]#$%^
At present I run a couple of nested Loops with regex to find all the Unique "Field Names" in the entire file, and then for each "Item" the data it holds in the corresponding delimited field. Finally I write the whole thing out to a file. It works but it is too ugly for me to post :)

So.... what the better way?

I can load the XML doc using

Code: Select all

xmlPath := "test.mpl"
xmlDoc := ComObjCreate("MSXML2.DOMDocument.6.0")
xmlDoc.async := false
xmlDoc.load(xmlPath)
...and so far I've worked out I can get a long list of all the data using:

Code: Select all

for item in xmlDoc.getElementsByTagName("Field") {
    Text := item.text
    MsgBox % Text
}
...but now I'm unsure how to parse/loop though the data to get the data I need in the correct order to flatten the XML to a TXT file.

Any hints would be a great help!
Thanks
Nathan
User avatar
mikeyww
Posts: 5237
Joined: 09 Sep 2014, 18:38

Re: XML to a flat TXT File ?

Post by mikeyww » 21 Jan 2021, 06:20

Here is a script that gets the names as well as the values.

https://www.autohotkey.com/boards/viewtopic.php?style=17&p=282978#p282978

You could create an array with all of the field names. First, you loop through that array and display each field name. Second, if you then have an array of the items in your XML, you can loop through each item. For each item, you loop through each field name. For each field name, you display something like item[n][fieldName], which would be the field's value for the nth item.
User avatar
jmone
Posts: 25
Joined: 30 Nov 2015, 20:43

Re: XML to a flat TXT File ?

Post by jmone » 21 Jan 2021, 17:22

Thanks, that got me a bit further but it is the XML Syntax I'm unfamiliar with.

This works in getting the Value of every Item .... but it only returns the Field Name for the first 3 Fields (eg only those in first Item), the following 3 Field Names (from the second Item) are Blank but the correct Values are displayed.

Code: Select all

Loop, % xmldoc.selectNodes("//MPL/Item/Field").length {
	MsgBox % xmldoc.selectSingleNode("//MPL/Item/Field[" A_Index "]").getAttribute("Name") ": " xmldoc.selectSingleNode("//MPL/Item/Field[" A_Index "]").text
}
This works perfectly to get the Values of every Item, but I need to also add the Field Name

Code: Select all

for Item in xmlDoc.getElementsByTagName("Field") {
    Text := item.text
    MsgBox % Text
}
I guess if item.text pulls out the values what pulls out the Field Name (it's not Name.text !)
User avatar
mikeyww
Posts: 5237
Joined: 09 Sep 2014, 18:38

Re: XML to a flat TXT File ?

Post by mikeyww » 21 Jan 2021, 20:44

Code: Select all

file = %TEMP%\test.txt
FileRead, str, %file%
xmlObj := new xml(str)
For k1, v1 in xmlObj.getChildren("//MPL", "element")
 For k2, v2 in xmlObj.getChildren(v1, "element")
  MsgBox, % v2.getAttribute("Name") ": " v2.text
#Include xml.ahk ; https://github.com/denolfe/AutoHotkey/blob/master/lib/xml.ahk
User avatar
jmone
Posts: 25
Joined: 30 Nov 2015, 20:43

Re: XML to a flat TXT File ?

Post by jmone » 21 Jan 2021, 23:57

Thanks very much for you time and effort! I really appreciate it. The xml library (looks very powerful) and example you posted works and works well, though it feels like I'm using a sledge hammer to crack a nut.

My existing parsing code just uses a two lines of regex (in loops), one of each to read the Values and Field Names. I was hoping that the XML tools would be better (faster, simpler, etc)..... but unless I can find something just as simple I may as well just use regex :( .

To Get Field Value = RegExMatch(A_LoopField,"(?<="">)(.*)(?=</Field)", MC_Value) <==or==> item.Text :thumbup:
To Get Field Name = RegExMatch(A_LoopField,"(?<=Name="")(.*)(?="">)", MC_Field_Name) <==or==> ????? :problem:
just me
Posts: 7741
Joined: 02 Oct 2013, 08:51
Location: Germany

Re: XML to a flat TXT File ?  Topic is solved

Post by just me » 22 Jan 2021, 05:54

Code: Select all

; ======================================================================================================================
; XPATH syntax -> msdn.microsoft.com/en-us/library/ms256471(v=vs.110).aspx
; XML node class -> msdn.microsoft.com/en-us/library/system.xml.xmlnode(v=vs.110).aspx
; DOM reference -> msdn.microsoft.com/en-us/library/ms764730(v=vs.85).aspx
; ======================================================================================================================
#NoEnv
XML_Data =
(Join`r`n %
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<MPL Version="2.0" Title="MCWS - Files - 33876" PathSeparator="\">
<Item>
<Field Name="FName1">123456</Field>
<Field Name="FName2">abcdef</Field>
<Field Name="FName3">qwerty</Field>
</Item>
<Item>
<Field Name="FName1">654321</Field>
<Field Name="FName2">fedcba</Field>
<Field Name="FName4">[email protected]#$%^</Field>
</Item>
</MPL>
)
; ----------------------------------------------------------------------------------------------------------------------
XML_Obj := ComObjCreate("Msxml2.DOMDocument.6.0")
XML_Obj.setProperty("SelectionLanguage", "XPath")
XML_Obj.async := False
; XML_Path := "test.mpl"
; XML_Obj.load(XML_Path)
XML_Obj.loadXML(XML_Data)
XML_Doc := XML_Obj.documentElement
; ----------------------------------------------------------------------------------------------------------------------
NamesArr := {}
For Node In XML_Doc.selectNodes("Item/Field/@Name")
   NamesArr[Node.Value] := ""
NamesCount := NamesArr.Count()
; ----------------------------------------------------------------------------------------------------------------------
ResultStr := ""
For Name In NamesArr
   ResultStr .= Name . (A_Index < NamesCount ? "`t" : "")
For Item In XML_Doc.selectNodes("Item") {
   TextStr := ""
   For Name In NamesArr {
      Txt := Item.selectSingleNode("Field[@Name='" . Name . "']").text
      TextStr .= Txt . (A_Index < NamesCount ? "`t" : "")
   }
   ResultStr .= "`n" . TextStr
}
MsgBox, %ResultStr%
User avatar
jmone
Posts: 25
Joined: 30 Nov 2015, 20:43

Re: XML to a flat TXT File ?

Post by jmone » 22 Jan 2021, 17:36

@just me !! Thank you, this is exactly what I was hope to achieve, it is Concise, Smart, Elegant & Fast - It just works :)

I ran it over an XML with 30K+ items and the only issue I've found is that as some of the Field Data has Carriage Retunes in them it destroys the formatting as it creates a new line in the output table at that point.

Sample of the XML (see FName 3 data)

Code: Select all

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<MPL Version="2.0" Title="MCWS - Files - 46968" PathSeparator="\">
<Item>
<Field Name="FName1">123456</Field>
<Field Name="FName2">abcdef</Field>
<Field Name="FName3">Line1, 
Line2, 
Line 3
</Field>
<Field Name="FName4">xyz123</Field>

</Item>
</MPL>
I can "fix" this by replacing the embeded "`n" with text such as "{LF}" and the output is now correct, but is there a more elegant way?

Code: Select all

#NoEnv

; ----------------------------------------------------------------------------------------------------------------------
XML_Obj := ComObjCreate("Msxml2.DOMDocument.6.0")
XML_Obj.setProperty("SelectionLanguage", "XPath")
XML_Obj.async := False
XML_Path := "test.mpl"
XML_Obj.load(XML_Path)
XML_Doc := XML_Obj.documentElement
; ----------------------------------------------------------------------------------------------------------------------
NamesArr := {}
For Node In XML_Doc.selectNodes("Item/Field/@Name")
   NamesArr[Node.Value] := ""
NamesCount := NamesArr.Count()
; ----------------------------------------------------------------------------------------------------------------------
ResultStr := ""
For Name In NamesArr
   ResultStr .= Name . (A_Index < NamesCount ? "`t" : "")
For Item In XML_Doc.selectNodes("Item") {
   TextStr := ""
   For Name In NamesArr {
      Txt := Item.selectSingleNode("Field[@Name='" . Name . "']").text
      TextStr .= Txt . (A_Index < NamesCount ? "`t" : "")
   }
   ; --------- Swap out Special Chars that break the TXT File ----------
   StringReplace, TextStr, TextStr, `n, {LF}, All
   ResultStr .= "`n" . TextStr
}

FileAppend, %ResultStr%, Test.txt, UTF-16
User avatar
jmone
Posts: 25
Joined: 30 Nov 2015, 20:43

Re: XML to a flat TXT File ?

Post by jmone » 22 Jan 2021, 20:22

I've tied a few more ideas:
- Different File Encodings: UFT-8, UTF-8RAW, UTF-16, UTF-16RAW
- Wrapping the "Txt" variable in "" (which I think is the "correct way" when there is internal LF etc)

...and then trying different ways of opening it in Excel.

I think this is more of an issue with how Excel treats the TXT File when I either :
1) Importing the File (using the File Open --> Text Import Wizard --> Step 3 "Column data format = Text"): The data is correctly imported (well some rounding on long integers), but the embedded LF beak up the table. On Step 2, the Text qualifier is set to " but that does not seem to help
2) Drag and Drop the TXT File into Excel: The internal LF are correctly handled with all of the data it in one cell, but unfortunately, Excel has now auto imported some of the cells as "General" instead of "Text" so #'s are changed to stuff like date formed display.
Post Reply

Return to “Ask For Help”