AutoHotkey Community Let's help each other out
Predated
Joined: 06 Nov 2007 Posts: 42
|
Posted: Sun Feb 03, 2008 9:32 pm Post subject: |
|
|
I'm confused here...
I'm looping through (severely malformed) html files and using tidy to create valid xml files. Once that's done, I need to add 1 style definition. The first loop below does the tidy bit, the second loop uses xpath to add the style declaration.
The log files are put there by another script, it just copies them from another directory.
If I run the script below with a MsgBox or Sleep in the second loop, right before I load the xml file, the script completes without error, and I get the correct output in my xml file. Below the script is an example of an 'xml' file generated by tidy from an html file.
| Code: | logDir := A_ScriptDir . "\logs\jabber\john2@itcsc-home" ;log file directory for Tracks
/*
* Loop through the log directory and create a comma-separated list of
* file paths.
*/
fl =
Loop %logDir%\*,0,1
{
fl = %fl%%A_LoopFileFullPath%`,
Run C:\MyData\ahk\PidginTracks\1.0.1\Tools\tidy.exe -clean --indent auto --output-xml yes --add-xml-decl yes --wrap 72 --quote-marks yes --quote-nbsp yes --doctype auto -m %A_LoopFileFullPath%, %A_LoopFileDir%, Hide
}
tf = 0
StringSplit fa, fl, `,
Loop %fa0%
{
tf := fa%A_Index%
xpath_load(xml, tf)
x := xpath(xml, "/html/head/style/text()")
xpath(xml, "/html/head/style/text()", x . "body {font-family: arial; font-size: x-small}")
xpath_save(xml, tf)
}
Return |
I'm adding the body font declaration. The span styles are automatically added by tidy.
| Code: | <?xml version="1.0" encoding="iso-8859-1"?>
<html>
<head>
<meta name="generator"
content="HTML Tidy for Windows (vers 14 February 2006), see www.w3.org" />
<meta http-equiv="content-type"
content="text/html; charset=us-ascii" />
<title>Conversation with john2@itcsc-home at 2/1/2008 6:37:40 AM on
admin@itcsc-home/Work (jabber)</title>
<style type="text/css">
span.c2 {color: #A82F2F}
span.c1 {font-size: 80%}
body {font-family: arial; font-size: x-small}</style>
</head>
<body>
<h3>Conversation with john2@itcsc-home at 2/1/2008 6:37:40 AM on
admin@itcsc-home/Work (jabber)</h3>
<span class="c2">
<span class="c1">(06:37:40)</span>
<b>John:</b>
</span> test
<br />
<span class="c2">
<span class="c1">(06:37:44)</span>
<b>John:</b>
</span> test
<br />
<span class="c2">
<span class="c1">(06:37:45)</span>
<b>John:</b>
</span> test
<br />
</body>
</html> |
|
Titan
Joined: 11 Aug 2004 Posts: 5107 Location: eth0
|
Posted: Sun Feb 03, 2008 10:16 pm Post subject: |
|
|
| Predated wrote: | | If I run the script below with a MsgBox or Sleep in the second loop, right before I load the xml file, the script completes without error, and I get the correct output in my xml file. |
The output you posted looks correct; if I overlooked something, please highlight it. Using a Sleep/MsgBox should make no difference as AutoHotkey is single-threaded. Maybe there is something else that causes conflict under certain conditions, i.e. running tidy twice. Could you post the output when it doesn't work?
| BadBoyBill wrote: | | I noticed if I remove an attribute it leaves a lot of whitespace where that attribute was |
Could be a regression bug - I remember fixing this before, but a more recent update to attribute parsing could have re-introduced it. It'll be fixed in the next update with the move/copy functions (which have proven to be more difficult than anticipated). _________________
RegExReplace("irc.freenode.net/ahk", "^(?=(.(?=[\0-r\[]*((?<=\.).))))(?:[c-\x73]{2,8}(\S))+((2)|\b[^\2-]){2}\D++$", "$u3$1$3$4$2") |
|
Predated
Joined: 06 Nov 2007 Posts: 42
|
Posted: Sun Feb 03, 2008 10:28 pm Post subject: |
|
|
| Titan wrote: | | Maybe there is something else that causes conflict under certain conditions, i.e. running tidy twice. |
If I change the Run command to RunWait, it seems to work every time - I hadn't thought about that. The sleep/msgbox "workaround" didn't make sense to me either - it was perplexing.
As for the output when it failed: sometimes the body style declaration wasn't there, and sometimes the entire contents of the file would appear twice. It was weird.
Anyway, thanks for the fresh eyes, Titan. I guess I had tunnel vision on that one. |
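For anyone following along, the RunWait change described above might look like the sketch below. It assumes logDir is already set as in the original script; the tidyExe and tidyOpts variables and the quoting of the paths are additions for readability, not part of the original post.

```autohotkey
; Sketch of the first loop with RunWait instead of Run.
; Assumes logDir is already set as in the original script.
tidyExe  := "C:\MyData\ahk\PidginTracks\1.0.1\Tools\tidy.exe"
tidyOpts := "-clean --indent auto --output-xml yes --add-xml-decl yes --wrap 72 --quote-marks yes --quote-nbsp yes --doctype auto -m"

fl =
Loop, %logDir%\*, 0, 1
{
    fl = %fl%%A_LoopFileFullPath%`,
    ; RunWait blocks until tidy.exe exits, so each file is fully rewritten
    ; before the second loop opens it. Quoting the paths is an added precaution.
    RunWait, "%tidyExe%" %tidyOpts% "%A_LoopFileFullPath%", %A_LoopFileDir%, Hide
}
```

With plain Run, the script moves on while tidy.exe is still writing, which would explain why a Sleep or MsgBox appeared to help: it gave the external processes time to finish.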
|
Titan
Joined: 11 Aug 2004 Posts: 5107 Location: eth0
|
Posted: Mon Feb 04, 2008 3:54 pm Post subject: |
|
|
Version 3.13b released; check it out from the svn repository or download it.
It includes two new functions, append() and prepend(), which supplement the set parameter. Additionally, the node: prefix can be used in the last parameter to use the value of the specified path. A combination of these can be used to copy nodes - to move instead, you can use the remove() function afterwards.
Test:
| Code: | x =
(
<recipe name="bread" prep_time="5 mins" cook_time="3 hours">
<title>Basic bread</title>
<ingredient amount="3" unit="cups">Flour</ingredient>
<ingredient amount="0.25" unit="ounce">Yeast</ingredient>
<ingredient amount="1.5" unit="cups" state="warm">Water</ingredient>
<ingredient amount="1" unit="teaspoon">Salt</ingredient>
<instructions>
<step>Mix all ingredients together.</step>
<step>Knead thoroughly.</step>
<step>Cover with a cloth, and leave for one hour in warm room.</step>
<step>Knead again.</step>
<step>Place in a bread baking tin.</step>
<step>Cover with a cloth, and leave for one hour in warm room.</step>
<step>Bake in the oven at 350°F for 30 minutes.</step>
</instructions>
</recipe>
)
xpath_load(x)
a = /recipe/prepend()
b = node:/recipe/instructions/step[3]
v := xpath(x, a, b)
MsgBox, % xpath_save(x) |
|
BadBoyBill
Joined: 24 Jan 2008 Posts: 11
|
Posted: Tue Feb 05, 2008 2:59 am Post subject: |
|
|
Hey this is great Titan,
I was trying to figure this out. Could you toss up an example of moving ingredient 3 up to ingredient number 2's spot, or vice versa? I've been able to move things to the beginning or end, but not in between. Thanks |
|
Titan
Joined: 11 Aug 2004 Posts: 5107 Location: eth0
|
Posted: Tue Feb 05, 2008 11:34 am Post subject: |
|
|
To reduce code size I never wrote anything to reorder or rename nodes. This is also because I haven't thought of a good syntax. If you (or others) think it's necessary please let me know. |
|
BadBoyBill
Joined: 24 Jan 2008 Posts: 11
|
Posted: Tue Feb 05, 2008 5:01 pm Post subject: |
|
|
I'm not sure how much others would need it, but this script combined with TVX could complement each other well. I've read several threads where people are putting XML into listviews or treeviews like myself. TVX just extends the treeview to allow you to move items up or down the list. Changing the order of items in the tree is useful for me because the output XML determines the order of menu items for some Flash software I'm writing. This lets the user change the menu order from the backend.
I don't want you to waste your time if I'm the only guy who needs this, so I hope some others might chime in here to show interest in this feature. |
|
schmick
Joined: 08 Jan 2008 Posts: 20 Location: Santiago - Chile -SouthAmerica - Planet Earth - Milky Way - Local Cumulus - Universe V1.0
|
Posted: Tue Feb 12, 2008 9:04 am Post subject: Using xpath to populate and save a treeview |
|
|
Hail Titan!
Hi to all the peasants also.
Well, I've been fighting with treeviews and xpath for a while, and since some have expressed that it would be nice to have a way back and forth from XML to TreeView, I've made something that is quite easy to follow and modify to your own needs.
Ingredients:
- Xpath V3.12 or above
- A custom treeview-formatted XML file such as | Code: | <root>
<Element>
<Text>One</Text>
<Parent>0</Parent>
<ID>545</ID>
</Element>
<Element>
<Text>Two</Text>
<Parent>0</Parent>
<ID>789</ID>
</Element>
<Element>
<Text>One.One</Text>
<Parent>545</Parent>
<ID>51</ID>
</Element>
</root> |
| Code: | #include xpath.ahk
#NoEnv
Gui, Add, TreeView, x16 y4 w220 h360 gTheTreeView
Gui, Add, Button, x256 y4 w100 h30 , Load
Gui, Add, Button, x256 y35 w100 h30 , Save
Gui, Add, StatusBar
; Generated using SmartGUI Creator 4.0
Gui, Show, x131 y91 h377 w376, New GUI Window
xpath_load(xmlin, "tree.xml")
Return
ButtonLoad:
AddToTree()
return
ButtonSave:
outputdebug parsing and saving
SaveXML(A_WorkingDir . "\xmlout.xml")
return
GuiClose:
ExitApp
return
AddToTree(XMLID=0,ParentItemID = 0)
{
global xmlin
theparents=% xpath(xmlin, "/root/Element[Parent=" . XMLID . "]/ID/text()")
Loop, parse, theparents, `,
{
thetext=% xpath(xmlin, "/root/Element[ID=" . A_LoopField . "]/Text/text()")
AddToTree(A_LoopField, TV_Add(thetext, ParentItemID, "Sort")) ; "Sort" must be a quoted string; a bare sort is an empty variable
}
}
SaveXML(Filedest)
{
xml=<root />
Xpath_load(xmlout,xml)
ItemID = 0
ItemText=
ItemParent=
Loop
{
ItemID := TV_GetNext(ItemID, "Full")
if not ItemID ; No more items in tree.
break
TV_GetText(ItemText, ItemID)
ItemParent:=TV_GetParent(ItemID)
XPath(xmlout, "/root/Element[+1]/Text[+1]/text()", ItemText)
XPath(xmlout, "/root/Element[last()]/Parent[+1]/text()", ItemParent)
XPath(xmlout, "/root/Element[last()]/ID[+1]/text()", ItemID)
outputdebug %ItemText%, %ItemParent%, %ItemID%
}
xpath_save(xmlout,Filedest)
}
TheTreeView:
if A_GuiEvent = S
{
TV_GetText(SelectedItemText, A_EventInfo)
SB_SetText(A_EventInfo . "=" . SelectedItemText . " | Parent=" . TV_GetParent(A_EventInfo))
}
return
|
It reads from tree.xml and saves to xmlout.xml.
The xmlout.xml can be renamed to tree.xml and it'll load in the TV.
Notice that the TreeView assigns an ID of its own, but this causes no problem with IDs, as the parsing doesn't depend on any order or sorting as long as there is a link between parent and child.
Master (a.k.a. Titan), I'd appreciate it if you could check the code for any improvements/optimizations. I don't like calling xpath in the LOOP in the AddToTree function, but can't figure out how to get a subset outside the loop.
So ppl, I expect some feedback.
Regards from Chile and happy coding! _________________ Carlos Troncoso |
|
Titan
Joined: 11 Aug 2004 Posts: 5107 Location: eth0
|
Posted: Tue Feb 12, 2008 5:10 pm Post subject: |
|
|
That's great work schmick! BadBoyBill has been experimenting with xpath and TVX as well, and more recently Jero3n has also found a way to combine it with ListViews/LVX. XML is well suited for datasets like this and I feel honoured that my library is used to make this happen, so thank you all.
In response to your question...
| schmick wrote: | | I don't like calling xpath in the LOOP in the AddToTree function, but can't figure out how to get a subset outside the loop. |
After you find theparents try the following:
| Code: | Loop, Parse, theparents, `,
pred = %pred%[ID='%A_LoopField%']
msgbox % xpath(xmlin, "/root/Element" . pred . "/Text/text()") |
This will give you a list of all the Text nodes that belong to any of the specified IDs. A problem with this is that you do not know which text content belongs to which ID. There are other ways of emulating this behaviour in fewer steps, but they require the use of rawsrc()...
A side note: update v3.13c was snuck into the main release a few days ago. The only bug fix was a missing EOT char delimiting multiple values with rawsrc(). Since this function is not officially documented yet, the html file corresponds to the API for 3.12 - but this is nothing to be concerned about. |
|
schmick
Joined: 08 Jan 2008 Posts: 20 Location: Santiago - Chile -SouthAmerica - Planet Earth - Milky Way - Local Cumulus - Universe V1.0
|
Posted: Tue Feb 12, 2008 6:40 pm Post subject: |
|
|
Thanks Titan,
Gee, I didn't even know you could stack up preds like that.. guess I'll have to rev-engineer xpath to find out what other neat things it can do.. lol..
Man I love undocumented features, the searching for a "how to do it" is part of the adventure.
OK, I'll check out 3.13c. I usually only upgrade if it is necessary. If it ain't broken, don't fix it..
Now we have the tools... let's show ppl what sort of furniture can be built with them. Wohoo!
I'll see if I can get something done today, as I'm going jetskiing for a few weeks. My boss forced me to take a vacation. He even bought the tickets.. nice guy ehh?
Cya'll! |
|
schmick
Joined: 08 Jan 2008 Posts: 20 Location: Santiago - Chile -SouthAmerica - Planet Earth - Milky Way - Local Cumulus - Universe V1.0
|
Posted: Tue Feb 12, 2008 9:52 pm Post subject: |
|
|
Titan,
The idea was good, the result.. not so good.
This is the expression that the loop from your previous post produces, and its output.
| Code: | theparents=% xpath(xmlin, "/root/Element[ID=463336][ID=477992]/Text/text()")
msgbox %theparents%
------------------
[EMPTY SET] |
while this | Code: | theparents=% xpath(xmlin, "/root/Element[ID=463336]/Text/text()")
msgbox %theparents%
------------------
Cirugia |
I guess the syntax output is not correct.
What was the expected output for the idea you intended?
BTW, I experimented with rawsrc(), would be nice to make it work with my code, but I noticed that the output had no root node. Is it still XML compliant?
PS: I noticed that when there are multiple hits for an xpath statement, you get a comma-separated list of items. But if you use multiple statements separated by |, the output uses a comma to separate both outputs.
| Code: | | a1,a2,a3,a4,b1,b2,b3,b4 |
That makes it difficult to split.
You might want to consider a new function, such as xpath_config(options).
This function would set xpath options such as custom separators.
Consider xpath_config("s, g;") (, for items and ; for groups)
| Code: | | a1,a2,a3,a4;b1,b2,b3,b4 |
This add-on keeps existing code working, as it builds a layer on top of the current one.
Consider the option. |
|
Titan
Joined: 11 Aug 2004 Posts: 5107 Location: eth0
|
Posted: Tue Feb 12, 2008 10:33 pm Post subject: |
|
|
| schmick wrote: | | /root/Element[ID=463336][ID=477992]/Text/text() |
If you tried my example with the XML document in your previous post you can see that it worked. Perhaps there was an error somewhere else; if you notice it again, could you post the XML?
| schmick wrote: | | I guess the syntax output is not correct. |
I can't tell because I don't know what your XML file looked like at the time.
It seems to be correct though, assuming 'Cirugia' was the text node of the element whose ID node was '463336'.
| schmick wrote: | | I experimented with rawsrc(), would be nice to make it work with my code, but I noticed that the output had no root node. Is it still XML compliant? |
The root node is changed to the child of the selected node when rawsrc() is used. As you may have realized, this is a non-standard xpath function. However, PHP and a few other parsers have something similar, as it allows you to narrow down the DOM scope and return the XML source of the selection without mapping every fragment. This provides exponentially quicker read operations, which I'll explain in the docs when they're ready.
| schmick wrote: | | if you use multiple statements separated by |, the output uses a comma to separate both outputs. |
This is standard behaviour. In many cases it can be useful if you do not wish to differentiate between multiple node sets.
| schmick wrote: | | You might want to consider a new function, such as xpath_config(options). |
Seems like overkill for the current implementation, especially when you can achieve it by using value := xpath(x, "/node/to/first/element") . ":" . xpath(x, "/node/to/second/text()"). I'm still open to suggestions though. |
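The concatenation workaround can be sketched against the Element/Parent layout posted earlier in the thread. This is only an illustration: the xmlin variable is assumed to be a document already loaded with xpath_load(), the two Parent values are assumed examples, and the ; group separator is an arbitrary choice.

```autohotkey
; Assumes xmlin was loaded earlier with xpath_load(xmlin, "tree.xml").
groupA := xpath(xmlin, "/root/Element[Parent=0]/Text/text()")
groupB := xpath(xmlin, "/root/Element[Parent=462800]/Text/text()")

; Join the two comma-separated results with a separator of our own choosing.
combined := groupA . ";" . groupB

; Split on the group separator first, then on the comma inside each group.
; The semicolon is escaped so it is not read as the start of a comment.
Loop, Parse, combined, `;
{
    groupNumber := A_Index
    thisGroup := A_LoopField
    Loop, Parse, thisGroup, `,
        MsgBox % "Group " . groupNumber . " item " . A_Index . ": " . A_LoopField
}
```

Running one query per group keeps the grouping in the script itself, so the library's comma-separated output does not need to change.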
|
schmick
Joined: 08 Jan 2008 Posts: 20 Location: Santiago - Chile -SouthAmerica - Planet Earth - Milky Way - Local Cumulus - Universe V1.0
|
Posted: Tue Feb 12, 2008 10:55 pm Post subject: |
|
|
Hi Titan!
Well this is the XML
| Code: | <?xml version="1.0" encoding="iso-8859-1"?>
<root >
<Element>
<Text>Cirugia</Text>
<Parent>0</Parent>
<ID>463336</ID>
</Element>
<Element>
<Text>CPQ</Text>
<Parent>0</Parent>
<ID>462800</ID>
</Element>
<Element>
<Text>_Documentacion_</Text>
<Parent>462800</Parent>
<ID>477992</ID>
</Element>
<Element>
<Text>Aseo Quirurgico</Text>
<Parent>462800</Parent>
<ID>462904</ID>
</Element>
</root> |
This is the code:
| Code: | theparents=% xpath(xmlin, "/root/Element[ID=463336][ID=477992]/Text/text()")
msgbox %theparents% |
The msgbox is empty.
but with
| Code: | theparents=% xpath(xmlin, "/root/Element[ID=463336]/Text/text()")
msgbox %theparents% |
The msgbox outputs Cirugia as expected.
Only using one [id=value] works, but [id=value1][id=value2] doesn't.
Regarding the xpath_config... yep.. you are right, there are workarounds that don't require changing the code.
Your turn. |
|
Titan
Joined: 11 Aug 2004 Posts: 5107 Location: eth0
|
Posted: Tue Feb 12, 2008 11:06 pm Post subject: |
|
|
| schmick wrote: | | Only using one [id=value] works, but [id=value1][id=value2] doesn't. |
Argh, what was I thinking earlier! In XPath, predicates are stacked, and every one must be satisfied for a node to be selected. Therefore [ID=463336][ID=477992] will never be true, because ID cannot possibly equate to two different things at the same time. You'll need to use /root/Element[ID='463336']/Text/text() | /root/Element[ID='477992']/Text/text() instead. |
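The union form can be built in a loop instead of stacking predicates. The sketch below assumes theparents holds the comma-separated ID list from AddToTree() and that xmlin is the loaded document; the expr variable name is made up for this example.

```autohotkey
; Build one path per ID and join the paths with the union operator "|".
expr := ""
Loop, Parse, theparents, `,
{
    if (expr != "")
        expr .= " | "
    expr .= "/root/Element[ID='" . A_LoopField . "']/Text/text()"
}
; A single xpath() call now covers every ID in the list.
MsgBox % xpath(xmlin, expr)
```

As noted earlier in the thread, the combined result still does not say which text belongs to which ID, so this only helps when the pairing is not needed.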
|
supergrass
Joined: 21 Feb 2007 Posts: 29 Location: Australia
|
Posted: Fri Feb 22, 2008 2:05 pm Post subject: back to school! |
|
|
This is very elementary, and I am a bit embarrassed that I can't work it out.
I have a script I am working on to submit a set search format to a Bible verse website. I have tried to use xpath to provide shortcuts to the book names, e.g.:
XML File:
| Code: | <?xml version="1.0" encoding="utf-8"?>
<book>
<Gen>Genesis%20</Gen>
<Genesis>Genesis%20</Genesis>
<Ex>Exodus%20</Ex>
</book> |
AHK:
| Code: | xpath_load(BibleBooks,"bibles.xml")
#1::
InputBox, Book, Book, Enter the Bible Book
Book := xpath(BibleBooks, "/Book/".Book."/text()")
InputBox, Verses, Verses, Enter the verses
InputBox, Version, Version, Enter the version
url = http://www.biblegateway.com/passage/?search=%Book%%Verses%;&version=%Version%;
Run, iexplore.exe %url%
return |
My problems are twofold:
1) For some reason I get nothing returned even if I hard-code a Book, e.g.:
| Code: | Book := xpath(BibleBooks, "/Book/Gen/text()")
Msgbox %Book% |
2) I get an error with .Book. part "The variable name contains an illegal character ".Book."" |
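Two things in the posted code may explain both symptoms; this is a hedged sketch, not a confirmed answer from the thread. First, XML element names are case-sensitive and the file's root element is book, so /Book/... would match nothing if the library compares names case-sensitively (an assumption here). Second, AutoHotkey's concatenation dot needs a space on each side; without the spaces, ".Book." is read as part of a variable name.

```autohotkey
#1::
InputBox, Book, Book, Enter the Bible Book
; Lowercase "book" matches the root element in bibles.xml.
; The dots are surrounded by spaces so they act as the concatenation operator.
Book := xpath(BibleBooks, "/book/" . Book . "/text()")
MsgBox %Book%
return
```

Typing Gen at the prompt builds the path /book/Gen/text(), which matches the Gen element in the posted XML.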