It is pretty common to come across a webpage that uses the same class name for many elements. Usually we can rely on the index number and it works fine, something like:
pwb.document.getElementsByClassName("control-imgx")[5].innertext
The problem is that sometimes sites change often and what is index 5 today is index 7 tomorrow.
Other than using urldownloadtovar and doing regex (my last resort for this specific project, because I must use IE COM anyway), is there some trick to make sure I get the correct value? The value itself always changes too, so I cannot search for it either.
Any tips on this?
Web scraping using IE - problem when the index number of a element changes Topic is solved
-
- Posts: 1472
- Joined: 05 May 2018, 12:23
Re: Web scraping using IE - problem when the index number of a element changes
What is the reason you need the correct index value? Are you trying to grab only one DOM element's text from a webpage? Are you looking for a kind of DOM element and collecting its text for every instance? Can you give a URL to the webpage you're scraping, or a similar URL, along with the desired contents to be scraped? Do you have any code that works or does not work, and may we see it?
-
- Posts: 1472
- Joined: 05 May 2018, 12:23
Re: Web scraping using IE - problem when the index number of a element changes
I put together a very short example so you can quickly test. In this example I want to get the likes on a YouTube video (without using the YT API or urldownloadtovar and regex). I am wondering: if [21] changes to [22], is there a better way using IE COM to get the likes? Thank you.
Code: Select all
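wb := ComObjCreate("InternetExplorer.Application")
wb.Visible := False
wb.Navigate("https://www.youtube.com/watch?v=CYpcK0tDU58")
; Wait until the page has finished loading (ReadyState 4 = complete)
while wb.ReadyState != 4
    Sleep, 500
; Index 21 is the fragile part: it shifts whenever the page layout changes
likes := wb.document.getElementsByClassName("yt-uix-button-content")[21].innerText
MsgBox % likes
wb.Quit()
ExitApp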
- Blackholyman
- Posts: 1293
- Joined: 29 Sep 2013, 22:57
- Location: Denmark
- Contact:
Re: Web scraping using IE - problem when the index number of a element changes
One way is to find an element that has something unique, like an id or a class name unique to that element, and work outward from it.
Example:
Code: Select all
wb := ComObjCreate("InternetExplorer.Application")
wb.Visible := False
wb.Navigate("https://www.youtube.com/watch?v=CYpcK0tDU58")
; Wait until the page has finished loading (ReadyState 4 = complete)
while wb.ReadyState != 4
    Sleep, 500
; Anchor on a class name that is unique to the like button
likeButtonIconElement := wb.document.getElementsByClassName("like-button-renderer-like-button")[0]
; Step up to the parent, then look for the count element below it
likes := likeButtonIconElement.parentNode.getElementsByClassName("yt-uix-button-content")[0].innerText
MsgBox % likes
wb.Quit()
ExitApp
Courses on AutoHotkey
My Autohotkey Blog
-
- Posts: 1472
- Joined: 05 May 2018, 12:23
Re: Web scraping using IE - problem when the index number of a element changes
Blackholyman wrote: one way is to find an element that has something unique like an id or some class text unique to that element and work out from that...
Very cool info.
My question is: I can find "like-button-renderer-like-button" and also "like-button-renderer-dislike-button" in the page source; however, they are not next to a span class, they are next to button title =
Am I missing something?
- Blackholyman
- Posts: 1293
- Joined: 29 Sep 2013, 22:57
- Location: Denmark
- Contact:
Re: Web scraping using IE - problem when the index number of a element changes Topic is solved
If you look at my code example, I find the element (a button) with the className "like-button-renderer-like-button" and store a reference to it in the variable likeButtonIconElement. In the next line I go out from that element: parentNode gets its parent element, in this case a span with a common className ("yt-uix-clickcard"). Then getElementsByClassName gets any elements with the className "yt-uix-button-content" under that parent. In this case there is only one, hence [0], and its innerText is stored in the variable likes.
So find a unique reference element and work up/down/out from that...
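The same anchor-then-walk idea can be wrapped in a small function so that a missing element returns an empty string instead of raising a COM error. This is only a rough sketch for AutoHotkey v1, not a tested solution: GetTextNearAnchor is a made-up name (it is not built into AutoHotkey or the IE DOM), and the class names are simply the ones from the example in this thread, which YouTube may change at any time.

```autohotkey
wb := ComObjCreate("InternetExplorer.Application")
wb.Visible := False
wb.Navigate("https://www.youtube.com/watch?v=CYpcK0tDU58")
; Wait until the page has finished loading (ReadyState 4 = complete)
while wb.ReadyState != 4
    Sleep, 500
likes := GetTextNearAnchor(wb.document, "like-button-renderer-like-button", "yt-uix-button-content")
MsgBox % (likes = "" ? "Like count not found" : likes)
wb.Quit()
ExitApp

; Hypothetical helper (assumed name, not a built-in).
; Finds the first element with class anchorClass, steps up to its parent,
; and returns the innerText of the first element with class targetClass
; below that parent. Returns "" when either lookup finds nothing.
GetTextNearAnchor(doc, anchorClass, targetClass) {
    anchors := doc.getElementsByClassName(anchorClass)
    if (anchors.length = 0)
        return ""
    targets := anchors[0].parentNode.getElementsByClassName(targetClass)
    if (targets.length = 0)
        return ""
    return targets[0].innerText
}
```

Checking length before indexing means a layout change shows up as "Like count not found" rather than a script error, which makes it easier to notice when the anchor class needs updating.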
-
- Posts: 1472
- Joined: 05 May 2018, 12:23
Re: Web scraping using IE - problem when the index number of a element changes
Thank you Blackholyman