How to extract url from a webpage?
Re: How to extract url from a webpage?
On any Google Play app detail page, it directly matches the developer URL.
How is it not right?
-
- Posts: 4347
- Joined: 29 Mar 2015, 09:41
- Contact:
Re: How to extract url from a webpage?
Why not link to your example on regex101?
Re: How to extract url from a webpage?
AHK doesn't use the ECMAScript variant of RegEx; it uses PCRE.
Re: How to extract url from a webpage?
Then why does the regex work when I load the webpage, press Ctrl+U, and copy the source to the clipboard?
Re: How to extract url from a webpage?
The HTML code may differ in this case.
Re: How to extract url from a webpage?
Well, an alternative PCRE version would be appreciated.
Re: How to extract url from a webpage?
Until you do that, I can't help.
Re: How to extract url from a webpage?
Anything wrong with this?
<a href=["]\Khttps?([^""]+)(?="\sclass="hrTbp ">Visit website)
Re: How to extract url from a webpage?
(?<=<a href=")https?([^""]+)(?="\sclass="hrTbp ">Visit website)
(?<=<a href="")https?([^""]+)(?=""\sclass=""hrTbp "">Visit website)
Hah, I got it working finally!!!!
Worth it though, way faster.
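For anyone reading along, here is a minimal sketch of how the working pattern above could be used with RegExMatch in AutoHotkey v1. The sample markup and the example.com URL are made up for illustration and only stand in for the real page source; the class name "hrTbp " is the one from this thread and may not match the live page.

```autohotkey
; Hypothetical sample markup standing in for the downloaded page source.
html := "<a href=""https://example.com/dev"" class=""hrTbp "">Visit website</a>"

; Inside a v1 expression string, every literal quote must be written as "".
needle := "(?<=<a href="")https?([^""]+)(?=""\sclass=""hrTbp "">Visit website)"

; In the default output mode, the third parameter receives the whole match.
; The lookbehind and lookahead are not part of it, so only the URL is stored.
if (RegExMatch(html, needle, match))
    MsgBox, % "URL: " . match
else
    MsgBox, % "No developer URL found."
```

The doubled quotes are only needed because the pattern lives inside a quoted AHK string; the pattern that the regex engine actually sees is the single-quote version.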
Re: How to extract url from a webpage?
One question: does this work in the background, as SendInput would?
It's weird. If I change window focus while the script is running and then return, it's not updated, but then it suddenly seems to fill in / update.
The same thing happens with screen scrolling. The script seems to continue as it should; then, if you scroll down while it's initially blank, it suddenly seems to update.
Re: How to extract url from a webpage?
Hi again,
Question for @teadrinker.
Would html.getElementsByTagName("H2") work? I'd like to place the result in a cell as well. I've tried adding it in various ways without much success.
What would be the best way to add it, considering the script in this thread?
Re: How to extract url from a webpage?
The HTML code can contain several h2 tags. This example shows how you can get all of them:
Code: Select all
; Request headers passed to GetHtml()
headers := {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36"
                        . " (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36 Edg/89.0.774.63"}
html := GetHtml("https://play.google.com/store/apps/details?id=com.mydiabetes", headers)

; Parse the downloaded markup with the HTMLFILE COM object
Doc := ComObjCreate("HTMLFILE")
Doc.write("<meta http-equiv=""X-UA-Compatible"" content=""IE=edge"">")
Doc.write(html)

; Show the outer HTML and inner text of every h2 element (the collection is zero-based)
coll := Doc.getElementsByTagName("h2")
Loop % coll.length {
    MsgBox, % "outer html: " . coll[A_Index - 1].outerHTML . "`n"
            . "inner text: " . coll[A_Index - 1].innerText
}

; Downloads the page at url and returns its body decoded as UTF-8
GetHtml(url, HeadersArray := "") {
    Whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
    Whr.Open("GET", url, true)
    for name, value in HeadersArray
        Whr.SetRequestHeader(name, value)
    Whr.Send()
    Whr.WaitForResponse()
    status := Whr.status
    if (status != 200)
        throw "HttpRequest error, status: " . status
    ; responseBody is a SAFEARRAY of bytes: read its data pointer and its length
    arr := Whr.responseBody
    pData := NumGet(ComObjValue(arr) + 8 + A_PtrSize)
    length := arr.MaxIndex() + 1
    Return html := StrGet(pData, length, "UTF-8")
}
Re: How to extract url from a webpage?
I get this as a response:
Was this the intent? I'd like to get just the title and not the class info. In fact, what I'd like to do is get the inner HTML of the H2s into a single variable, to then put into a single cell (double-space-separated).
So how can I get more than just one H2?
MH2 := MetH2[A_Index - 1].innerText
Re: How to extract url from a webpage?
For me too, 6 messages.
That image above was just the first. My question is how to output it into a variable rather than a MsgBox.
Re: How to extract url from a webpage?
Code: Select all
; Request headers passed to GetHtml()
headers := {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36"
                        . " (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36 Edg/89.0.774.63"}
html := GetHtml("https://play.google.com/store/apps/details?id=com.mydiabetes", headers)

; Parse the downloaded markup with the HTMLFILE COM object
Doc := ComObjCreate("HTMLFILE")
Doc.write("<meta http-equiv=""X-UA-Compatible"" content=""IE=edge"">")
Doc.write(html)

; Collect the non-empty inner text of every h2 element into one variable
coll := Doc.getElementsByTagName("h2")
text := ""
Loop % coll.length {
    caption := coll[A_Index - 1].innerText
    if (caption != "")
        text .= (text = "" ? "" : " ") . caption
}
MsgBox, % text

; Downloads the page at url and returns its body decoded as UTF-8
GetHtml(url, HeadersArray := "") {
    Whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
    Whr.Open("GET", url, true)
    for name, value in HeadersArray
        Whr.SetRequestHeader(name, value)
    Whr.Send()
    Whr.WaitForResponse()
    status := Whr.status
    if (status != 200)
        throw "HttpRequest error, status: " . status
    ; responseBody is a SAFEARRAY of bytes: read its data pointer and its length
    arr := Whr.responseBody
    pData := NumGet(ComObjValue(arr) + 8 + A_PtrSize)
    length := arr.MaxIndex() + 1
    Return html := StrGet(pData, length, "UTF-8")
}
Re: How to extract url from a webpage?
It works, but man, it's really slow!
Re: How to extract url from a webpage?
Hi again
I'm struggling to get the Alt tag.
Your help would be appreciated @teadrinker
Code: Select all
MetAlt := Doc.getAttribute("ALT")
text := ""
Loop % MetAlt.length {
    caption := MetAlt[A_Index - 1].innerText
    if (caption != "")
        text .= (text = "" ? "" : " ") . caption
    continue
}
XlSht.Cells(row, col + 15).value := metalt
Re: How to extract url from a webpage?
There is no such tag as alt. Also, the document object has no alt attribute. Specify exactly what you want to get.
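One possible reading of the request, offered only as a sketch: alt is an attribute of individual elements such as img, so it has to be read from each element of a collection rather than from the document. The example below assumes the goal is the alt text of every img element, joined with two spaces. The sample markup and the names images and altText are hypothetical; in the script from this thread, html would be the string returned by GetHtml().

```autohotkey
; Hypothetical sample markup standing in for the downloaded page.
html := "<img src=""a.png"" alt=""First icon"">"
      . "<img src=""b.png"">"
      . "<img src=""c.png"" alt=""Second icon"">"

Doc := ComObjCreate("HTMLFILE")
Doc.write("<meta http-equiv=""X-UA-Compatible"" content=""IE=edge"">")
Doc.write(html)

; Get the img elements first, then read the alt attribute of each one.
images := Doc.getElementsByTagName("img")
text := ""
Loop % images.length {
    altText := images[A_Index - 1].getAttribute("alt")
    ; Skip images whose alt attribute is missing or empty.
    if (altText != "")
        text .= (text = "" ? "" : "  ") . altText    ; two spaces between entries
}
MsgBox, % text
```

The resulting text variable could then be assigned to a cell in the same way the earlier posts assign values, for example through XlSht.Cells(row, col + 15).value.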