xpath v3 - read and write XML documents with XPath syntax


  • This topic is locked
14 replies to this topic
polyethene
  • Members
  • 5519 posts
  • Last active: May 17 2015 06:39 AM
  • Joined: 26 Oct 2012
A simple and easy set of functions for parsing XML content with xpath including save and load routines. Extremely fast and lightweight for AutoHotkey; nodes and attributes can be created and removed directly within your expressions without DOM traversal.

The included manual covers some of the common uses of xpath and demonstrates how to use this library. Unlike my other scripts, this one is commented in the source for anyone who wants to know how it works.

Download

I'd like to thank everyone who posted suggestions, bug reports and advice.

autohotkey.com/net Site Manager

 

Contact me by email (polyethene at autohotkey.net) or message tidbit


n-l-i-d
  • Guests
  • Last active:
  • Joined: --
Wow! :) I'm baffled. Great work!

Ace_NoOne
  • Members
  • 299 posts
  • Last active: May 02 2008 08:19 AM
  • Joined: 10 Oct 2005
That's just awesome, Titan!
Seriously impressive!
Improving my world, one script at a time.
Join the AutoHotkey IRC channel: irc.freenode.net #autohotkey

Tuncay
  • Members
  • 1945 posts
  • Last active: Feb 08 2015 03:49 PM
  • Joined: 07 Nov 2006
Wow, thanks! I've been waiting for this ever since I learned you were working on it. 8)

No signature.


majkinetor
  • Moderators
  • 4512 posts
  • Last active: May 20 2019 07:41 AM
  • Joined: 24 May 2006
/applaud

Ace_NoOne
Hmm ... I can't seem to get this working - either I'm stupid, or it's a bug:

I'm trying to retrieve the latest link from the xkcd feed (see below for a snapshot of its current state).
For that I use the following code:
xkcdFeed = http://xkcd.com/rss.xml
rss := XmlDoc(xkcdFeed)
latest := XPath(rss, "/rss/channel/item[1]/link")
(not sure whether the nodes array is zero-based or one-based, but that doesn't matter here)

However, this returns the full ITEM node of the last (third) item:
<item>

 <title>Keyboards are Disgusting
  </title>

 <link>http://xkcd.com/c237.html
  </link>

 <description><img src="http://imgs.xkcd.com/comics/keyboards_are_disgusting.png" title="Alternate method: convince them to pretend it's an Etch-a-Sketch and try to erase it." alt="Alternate method: convince them to pretend it's an Etch-a-Sketch and try to erase it." />
 </description>

<guid isPermaLink="true">http://xkcd.com/c237.html
 </guid>

<pubDate>2007-03-19
 </pubDate>

</item>
I can't explain this - can anyone else?



Current contents of the xkcd feed (for reference):
<?xml version="1.0" encoding="UTF-8"?>

<rss version="2.0" xmlns:blogChannel="http://backend.userland.com/blogChannelModule">

<channel>
<title>xkcd.com</title>
<link>http://www.xkcd.com</link>
<description>xkcd.com: A webcomic of romance and math humor.</description>
<language>en</language>
<copyright>Copyright 2005-2006 Randall Munroe</copyright>
<pubDate>Fri, 23 Mar 2007 07:47:44 -0400</pubDate>
<lastBuildDate>Fri, 23 Mar 2007 07:47:44 -0400</lastBuildDate>
<managingEditor>[email protected]</managingEditor>
<webMaster>[email protected]</webMaster>

<item>
<title>Blagofaire</title>
<link>http://xkcd.com/c239.html</link>
<description><img src="http://imgs.xkcd.com/comics/blagofaire.png" title="Things were better before the Structuring and the Levels." alt="Things were better before the Structuring and the Levels." /></description>
<guid isPermaLink="true">http://xkcd.com/c239.html</guid>
<pubDate>2007-03-23</pubDate>
</item>

<item>
<title>Pet Peeve #114</title>
<link>http://xkcd.com/c238.html</link>
<description><img src="http://imgs.xkcd.com/comics/pet_peeve_114.png" title="I'm reading a book, thank you very much." alt="I'm reading a book, thank you very much." /></description>
<guid isPermaLink="true">http://xkcd.com/c238.html</guid>
<pubDate>2007-03-21</pubDate>
</item>

<item>
<title>Keyboards are Disgusting</title>
<link>http://xkcd.com/c237.html</link>
<description><img src="http://imgs.xkcd.com/comics/keyboards_are_disgusting.png" title="Alternate method: convince them to pretend it's an Etch-a-Sketch and try to erase it." alt="Alternate method: convince them to pretend it's an Etch-a-Sketch and try to erase it." /></description>
<guid isPermaLink="true">http://xkcd.com/c237.html</guid>
<pubDate>2007-03-19</pubDate>
</item>

</channel>
</rss>


polyethene
The following gives me the correct result:

xkcdFeed = http://xkcd.com/rss.xml
rss := XmlDoc(xkcdFeed)
latest := XPath(rss, "/rss/channel/item[1]/link/text()")
MsgBox, %latest%
Try re-downloading XPath.ahk if it doesn't work for you. Note that the function preserves whitespace regardless of any processing instruction (PI), so you can use the ^\s*|\s*$ regex to strip the leading and trailing whitespace and turn the string into a valid URI.
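
As a rough illustration of the trimming step mentioned above, here is a minimal sketch using only the built-in RegExReplace function. The sample value is hypothetical and simply stands in for whatever XPath() returns; it is not output from the library.

```autohotkey
; Hypothetical value standing in for an XPath() result with surrounding whitespace
latest := "`n  http://xkcd.com/c239.html`n "

; Remove leading and trailing whitespace with the regex from the post;
; an omitted replacement argument deletes each match
latest := RegExReplace(latest, "^\s*|\s*$")

; Brackets make any leftover whitespace visible
MsgBox, [%latest%]
```

Whether \s also covers non-breaking spaces (0xA0) may depend on the AutoHotkey build, so the sketch only claims to handle ordinary spaces, tabs and line breaks.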



Ace_NoOne
Thanks Titan; I found the problem:

I had opened XPath.ahk in the browser (Firefox) and just copied the code into my file. For whatever reason, that seems to create problems; it works just fine if I actually download the file and take the code from there.

Guest+
  • Guests
  • Last active:
  • Joined: --
Correct the link on this page - https://ahknet.autoh...itan/xpath.html

polyethene
Thanks, sorry about that.



Guest+
No problem 8)

btw, very nice script. It will help me a lot.

chris_lee
  • Members
  • 6 posts
  • Last active: Apr 07 2008 01:05 AM
  • Joined: 04 Apr 2007
I tried the code
xkcdFeed = http://xkcd.com/rss.xml
rss := XmlDoc(xkcdFeed)
latest := XPath(rss, "/rss/channel/item[1]/link/text()")
MsgBox, %latest%
but I got some extra trailing bytes of 0xA0.
Any idea?

polyethene
http://www.xkcd.com/rss.xml is not very well-formed; it has unescaped closing tags within /rss/channel/item/description. For the next version I'll improve the parser to ignore closing tags that have no opening tag and to strip the generated whitespace (non-breaking spaces, 0xA0). In the meantime you could use:

xkcdFeed = http://www.xkcd.com/rss.xml ; URI
xkcdLocFeed = %A_Temp%\xkcd.xml ; local path
UrlDownloadToFile, %xkcdFeed%, %xkcdLocFeed%
FileRead, rss, %xkcdLocFeed%
StringReplace, rss, rss, /></description>, /&gt;</description>, All ; escape closing tags ("/&gt;" is an assumed reconstruction)
rss := XmlDoc(rss) ; load as var ...
latest := XPath(rss, "/rss/channel/item[1]/link/text()")
MsgBox, %latest%
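
If stray 0xA0 characters still end up in the result, they could also be removed from the returned string directly. This is only a sketch: it assumes the extra bytes are non-breaking spaces (character code 160), and the sample value and the nbsp variable are hypothetical.

```autohotkey
; Hypothetical result with two stray trailing non-breaking spaces
nbsp := Chr(160)
latest := "http://xkcd.com/c239.html" . nbsp . nbsp

; Delete every occurrence of character 160 from the string
StringReplace, latest, latest, %nbsp%, , All

; Brackets make any leftover characters visible
MsgBox, [%latest%]
```

This removes the character everywhere in the string, not just at the end, which should be harmless for a URL but may not suit other node values.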

Thanks for the feedback.

Edit: now fixed in version 1.01, thanks.



chris_lee
I tried version 1.01 and it works just fine.
Thanks a lot! :lol:

Venia Legendi
  • Members
  • 35 posts
  • Last active: Apr 04 2011 08:36 PM
  • Joined: 27 May 2005
Hello, first of all - THIS IS JUST WHAT I NEEDED, thanks.

Nevertheless, I don't understand the following: why is Book2, with a price of 2.0, found by [price>2.0] but not by [price=2.0]?
#Include XPath.ahk
xPath(t, "/bookstore[+1]")
loop, 5 {
    xPath(t, "/bookstore/book[+1]/title[+1]", "title" A_index)
    xPath(t, "/bookstore/book[" A_index "]/price[+1]", A_index ".0")
}
E := "[price>=2.0] " XPath(t, "/bookstore/book[price>=2.0]/title/text()")
E := E "`n[price>2.0] " XPath(t, "/bookstore/book[price>2.0]/title/text()") " -> also Book2 found?"
E := E "`n[price=2.0] " XPath(t, "/bookstore/book[price=2.0]/title/text()") " -> not found?"
E := E "`n[price=2] " XPath(t, "/bookstore/book[price=2]/title/text()") " -> not found?"
E := E "`n---" t
Gui, Add, Edit, w400 h400 ReadOnly -Wrap, %E%
Gui, Show, , xPath