How to extract url from a webpage?
I have a list of webpages in Excel.
I need to extract URLs from these webpages.
Right now I find each one by opening "Inspect", pressing Ctrl+F and searching for "websiteurl".
For example, extract the following URL:
https://nationallaserinstitute.com/pdo-threads/
from the following webpage:
https://www.emedevents.com/c/medical-conferences-2021/2-day-pdo-thread-lift-course-dallas-jul-03-04-2021
Last edited by Muhammadi on 30 Jun 2021, 05:41, edited 3 times in total.
Re: How to extract url from a webpage?
I downloaded G33kDude's Chrome library from the following links:
https://github.com/G33kDude/Chrome.ahk
https://github.com/G33kDude/Chrome.ahk/releases/tag/1.2
I wrote the sample script shown in the following video, just to make sure that the library works:
https://www.youtube.com/watch?v=btn1pSRE8es
This script doesn't do anything.
What did I do wrong?
Code:
#SingleInstance, force
#Include C:\1Macro Recorder\1PMC\Chrome\Chrome.ahk
If (!FileExist("profile"))
{
FileCreateDir, % "profile"
}
chrome := new chrome("profile", "https://www.the-automator.com/autohotkey/chrome-with-autohotkey/")
pg := chrome.getpage()
pg.WaitForLoad("complete")
pg.Evaluate("document.querySelectorAll('#post-4101 > div > p:nth-child(4) > a:nth-child(4)')[0].click()")
-
- Posts: 4309
- Joined: 29 Mar 2015, 09:41
- Contact:
Re: How to extract url from a webpage?
The person in the video wrote exactly the same string and it worked for him.
I copied his code, but in my case nothing happens.
Am I supposed to do something with the library (Chrome.ahk)?
Re: How to extract url from a webpage?
Can someone write a script that extracts a URL from any website?
Hopefully I'll be able to adjust your script to my needs.
Re: How to extract url from a webpage?
Perhaps that person was not very advanced in AHK; just don't do it like this. In chrome := new chrome(...) the instance overwrites the variable that holds the class, so use another variable name.
Does the Chrome browser not open?
Your question does not make sense. A website can contain a lot of URLs. Which one do you need?
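Following that advice, here is a minimal sketch of the question's script with the instance stored under a name that differs from the class. It keeps the #Include path from the question (adjust it to your own copy of Chrome.ahk) and, instead of clicking a link, reads the target address from the page. The id websiteurlfecth is an assumption taken from the emedevents page source and will stop working if the site changes its markup.

```autohotkey
#SingleInstance, Force
; Library path taken from the question; point it at your own copy of Chrome.ahk.
#Include C:\1Macro Recorder\1PMC\Chrome\Chrome.ahk

profileDir := A_ScriptDir "\profile"
if !FileExist(profileDir)
    FileCreateDir, % profileDir

; The instance is stored in chromeInst, so the class variable Chrome stays untouched.
chromeInst := new Chrome(profileDir, "https://www.emedevents.com/online-cme-courses/live-webinar/manchester-pathology-2021")
page := chromeInst.GetPage()
page.WaitForLoad("complete")

; Assumption: an element with id="websiteurlfecth" holds the address in its value attribute.
; Optional chaining makes the expression return "" instead of failing when the element is missing.
result := page.Evaluate("document.getElementById('websiteurlfecth')?.getAttribute('value') ?? ''")
MsgBox, % result.value

page.Disconnect()
chromeInst.Kill()
ExitApp
```

Chrome.ahk drives a real browser through Chrome's remote-debugging protocol, so this route also covers pages that build their content with JavaScript, at the cost of starting a browser for every run.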
Re: How to extract url from a webpage?
I have a list of webpages from the same website.
I need to extract URLs from these pages.
The URLs can be found by opening "Inspect" and searching for "websiteurl".
All URLs are in the same place.
Examples:
1. Webpage:
https://www.emedevents.com/c/medical-conferences-2021/2-day-pdo-thread-lift-course-dallas-jul-03-04-2021
URL that needs to be extracted from this webpage:
https://nationallaserinstitute.com/pdo-threads/
2. Webpage:
https://www.emedevents.com/online-cme-courses/live-webinar/manchester-pathology-2021
URL that needs to be extracted from this webpage:
http://www.path.org.uk/
Re: How to extract url from a webpage?
To get info from most sites, you don't need to open them in a web browser; a WinHttp request and RegEx are enough:
Code:
for k, url in [ "https://www.emedevents.com/c/medical-conferences-2021/2-day-pdo-thread-lift-course-dallas-jul-03-04-2021"
, "https://www.emedevents.com/online-cme-courses/live-webinar/manchester-pathology-2021" ]
{
    html := GetHtml(url)
    ; \K discards the text matched before it, so url1, url2 receive only the address after value="
    RegExMatch(html, "id=""websiteurlfecth""[^>]+?value=""\Khttp[^""]+", url%k%)
}
MsgBox, % url1 "`n" url2

GetHtml(url) {
    Whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
    Whr.Open("GET", url, true)    ; true = asynchronous; WaitForResponse() below waits for it
    Whr.Send()
    Whr.WaitForResponse()
    status := Whr.status
    if (status != 200)
        throw "HttpRequest error, status: " . status
    Arr := Whr.responseBody       ; raw response bytes (a SAFEARRAY of bytes)
    pData := NumGet(ComObjValue(arr) + 8 + A_PtrSize)   ; pvData field: the address of the bytes
    length := arr.MaxIndex() + 1
    Return html := StrGet(pData, length, "UTF-8")       ; decode the bytes as UTF-8 text
}
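Later in the thread the URL is needed for one page at a time, so the RegEx step can be pulled out into a helper that returns an empty string when nothing matches (RegExMatch itself returns 0 in that case). A minimal sketch: the name ExtractWebsiteUrl is hypothetical, and it relies on the same assumption as the code above, namely that inside the tag id="websiteurlfecth" comes before value=.

```autohotkey
; Returns the address stored in the value attribute of the element with
; id="websiteurlfecth", or "" when the HTML contains no such element.
ExtractWebsiteUrl(html) {
    if RegExMatch(html, "id=""websiteurlfecth""[^>]+?value=""\Khttp[^""]+", found)
        return found
    return ""
}
```

For example, ExtractWebsiteUrl(GetHtml(url)) gives the address for one page, or an empty string if the page has no such element.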
Re: How to extract url from a webpage?
Your code gives the following error message:
Last edited by Muhammadi on 15 Jul 2021, 06:52, edited 3 times in total.
Re: How to extract url from a webpage?
Can't reproduce.
If you want to display an image, use the direct link to it: https://i.imgur.com/Uo9M5yD.jpeg
Last edited by teadrinker on 08 Jul 2021, 11:42, edited 1 time in total.
Re: How to extract url from a webpage?
@teadrinker What's the reason for the error?
Have I done something wrong?
Re: How to extract url from a webpage?
See this topic.
Re: How to extract url from a webpage?
Thank you so much, now it works.
Can you change the script to do the following?
Copy the active cell with a webpage link in the Excel file and paste the extracted URL into another cell of the same file.
Re: How to extract url from a webpage?
There are a lot of code samples demonstrating how to work with Excel on this forum. Try to find them. For me it is not so interesting.
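For completeness, a minimal sketch of that request that talks to Excel through its COM object instead of copying and pasting. It assumes Excel is already running with the sheet open. The hotkey F3 and the helper name ExtractWebsiteUrl are hypothetical choices, and GetHtml is copied from the answer above so the script runs on its own.

```autohotkey
#SingleInstance, Force

; F3: fetch the page whose link is in the active Excel cell
; and write the extracted address into the cell to its right.
F3::
    xl := ComObjActive("Excel.Application")    ; the Excel instance that is already running
    cell := xl.ActiveCell
    link := Trim(cell.Value)
    if (link = "")
        return
    cell.Offset(0, 1).Value := ExtractWebsiteUrl(GetHtml(link))   ; Offset(0, 1): same row, next column
return

; Hypothetical helper: the RegExMatch call from the answer above, returning "" on no match.
ExtractWebsiteUrl(html) {
    if RegExMatch(html, "id=""websiteurlfecth""[^>]+?value=""\Khttp[^""]+", found)
        return found
    return ""
}

; Copied from the answer above.
GetHtml(url) {
    Whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
    Whr.Open("GET", url, true)
    Whr.Send()
    Whr.WaitForResponse()
    status := Whr.status
    if (status != 200)
        throw "HttpRequest error, status: " . status
    Arr := Whr.responseBody
    pData := NumGet(ComObjValue(Arr) + 8 + A_PtrSize)
    length := Arr.MaxIndex() + 1
    return StrGet(pData, length, "UTF-8")
}
```

Writing through cell.Offset(0, 1) avoids Send and the clipboard, so the result lands in the right cell even if the Excel window loses focus.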
Re: How to extract url from a webpage?
@teadrinker No, there's no need to write a complex script with Excel. Just a little change to your script.
I've tried to do it myself but so far to no avail.
There's one webpage link in the clipboard (I've already copied it), and I need the one extracted URL in a message box.
Can you tell me how to put the copied webpage link into your script?
Something like this:
Code:
for k, url in clipboard ; One webpage link is in clipboard
{
html := GetHtml(url)
RegExMatch(html, "id=""websiteurlfecth""[^>]+?value=""\Khttp[^""]+", url%k%)
}
MsgBox, % url1 ; I need one extracted url from one webpage link.
GetHtml(url) {
Whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
Whr.Open("GET", url, true)
Whr.Send()
Whr.WaitForResponse()
status := Whr.status
if (status != 200)
throw "HttpRequest error, status: " . status
Arr := Whr.responseBody
pData := NumGet(ComObjValue(arr) + 8 + A_PtrSize)
length := arr.MaxIndex() + 1
Return html := StrGet(pData, length, "UTF-8")
}
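In the attempt above, a for-loop expects an object such as an array, while the clipboard holds plain text, so there is nothing to iterate over; a single link can be passed straight to the function. A minimal sketch, assuming the clipboard already holds one page link; ExtractWebsiteUrl is a hypothetical helper wrapping the RegExMatch call from the answer above, and GetHtml is copied from that answer.

```autohotkey
; A cell copied from Excel may end with a line break, so the link is trimmed first.
link := Trim(Clipboard, " `t`r`n")
found := ExtractWebsiteUrl(GetHtml(link))
MsgBox, % found != "" ? found : "No websiteurlfecth element found on this page."
ExitApp

; Hypothetical helper: returns the matched address, or "" when there is no match.
ExtractWebsiteUrl(html) {
    if RegExMatch(html, "id=""websiteurlfecth""[^>]+?value=""\Khttp[^""]+", found)
        return found
    return ""
}

; Copied from the answer above.
GetHtml(url) {
    Whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
    Whr.Open("GET", url, true)
    Whr.Send()
    Whr.WaitForResponse()
    status := Whr.status
    if (status != 200)
        throw "HttpRequest error, status: " . status
    Arr := Whr.responseBody
    pData := NumGet(ComObjValue(Arr) + 8 + A_PtrSize)
    length := Arr.MaxIndex() + 1
    return StrGet(pData, length, "UTF-8")
}
```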
Re: How to extract url from a webpage?
teadrinker wrote (11 Jul 2021, 05:18):
If you have the link in the clipboard, you just need to pass the clipboard to the function: GetHtml(Clipboard)
I did it. Thank you, it works.
Right now, my code looks like this:
Code:
F3::
Loop
{
Winactivate, List of Event URLs - 21.07 - Excel ahk_class XLMAIN
Xl := ComObjActive("Excel.Application")
Clipboard := Xl.ActiveCell.Value
for k, url in [clipboard]
{
html := GetHtml(clipboard)
RegExMatch(html, "id=""websiteurlfecth""[^>]+?value=""\Khttp[^""]+", url%k%)
}
clipboard := url1
Send, {Right}
Xl.Selection.Value := Clipboard
Send, {Down}
Sleep, 111
Send, {Left}
GetHtml(url) {
Whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
Whr.Open("GET", url, true)
Whr.Send()
Whr.WaitForResponse()
status := Whr.status
if (status != 200)
throw "HttpRequest error, status: " . status
Arr := Whr.responseBody
pData := NumGet(ComObjValue(arr) + 8 + A_PtrSize)
length := arr.MaxIndex() + 1
Return html := StrGet(pData, length, "UTF-8")
}
}
Return
If any error occurs, I want the script to skip that webpage automatically and go on to the next one.
Is there a way to do that?
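A minimal sketch of one way to do that with try/catch, reading column A and writing column B through Excel's COM object instead of Send and the clipboard. It assumes the layout the user describes in this thread (links in column A starting at row 2, results in column B) and, when a page fails, writes a short note instead of leaving the cell empty. F3 and ExtractWebsiteUrl are hypothetical names, and GetHtml is copied from the answer above.

```autohotkey
#SingleInstance, Force

; F3: go down column A from row 2 until the first empty cell, write each
; extracted address into column B and keep going when a page fails.
F3::
    xl := ComObjActive("Excel.Application")   ; assumes the workbook is open in a running Excel
    sheet := xl.ActiveSheet
    row := 2                                  ; row 1 is assumed to hold headers
    while ((link := Trim(sheet.Cells(row, 1).Value)) != "")
    {
        try {
            found := ExtractWebsiteUrl(GetHtml(link))
            sheet.Cells(row, 2).Value := found != "" ? found : "not found"
        }
        catch e {
            ; GetHtml throws a plain string for a non-200 status; other failures
            ; (for example COM errors from WinHttp) arrive as Exception objects.
            sheet.Cells(row, 2).Value := "error: " . (IsObject(e) ? e.Message : e)
        }
        row++
    }
    MsgBox, Done.
return

; Hypothetical helper: returns the matched address, or "" when there is no match.
ExtractWebsiteUrl(html) {
    if RegExMatch(html, "id=""websiteurlfecth""[^>]+?value=""\Khttp[^""]+", found)
        return found
    return ""
}

; Copied from the answer above.
GetHtml(url) {
    Whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
    Whr.Open("GET", url, true)
    Whr.Send()
    Whr.WaitForResponse()
    status := Whr.status
    if (status != 200)
        throw "HttpRequest error, status: " . status
    Arr := Whr.responseBody
    pData := NumGet(ComObjValue(Arr) + 8 + A_PtrSize)
    length := Arr.MaxIndex() + 1
    return StrGet(pData, length, "UTF-8")
}
```

The loop stops at the first empty cell in column A, so it also ends on its own instead of running indefinitely. Replace the "error: ..." line with an empty string if you prefer failed rows to stay blank.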
Re: How to extract url from a webpage?
What exactly does the clipboard contain?
Re: How to extract url from a webpage?
First the clipboard contains a webpage, then it contains the extracted url.
The code copies a webpage in A2 and pastes the extracted url to B2.
Then it copies a webpage in A3 and pastes the extracted url to B3, and so on.
I have a list of webpages in column A.
The task is to paste the extracted URLs into column B.
Please find attached the Excel file with a list of webpages in column A and extracted URLs in column B.
Re: How to extract url from a webpage?
In this case it contains the following url:
https://www.emedevents.com/Conferences/searchConference#
Sometimes it gives other types of errors, even on perfectly legitimate webpages.
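The searchConference# link looks like a search page rather than an event page, so there is presumably no websiteurlfecth element to find there; checking whether RegExMatch found anything handles that case. For errors on pages that do work when opened by hand, one option, assuming the failures are transient (for example a timeout or a temporary server error, which the thread does not confirm), is to retry the request a few times before giving up. GetHtmlWithRetry is a hypothetical wrapper around the GetHtml function from the answer above, which is copied here so the block runs on its own.

```autohotkey
; Calls GetHtml up to `attempts` times, waiting `delayMs` milliseconds between tries,
; and rethrows the last error if every attempt fails.
GetHtmlWithRetry(url, attempts := 3, delayMs := 2000) {
    Loop, % attempts
    {
        try
            return GetHtml(url)
        catch e
            lastError := e
        if (A_Index < attempts)
            Sleep, % delayMs
    }
    throw lastError
}

; Copied from the answer above.
GetHtml(url) {
    Whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
    Whr.Open("GET", url, true)
    Whr.Send()
    Whr.WaitForResponse()
    status := Whr.status
    if (status != 200)
        throw "HttpRequest error, status: " . status
    Arr := Whr.responseBody
    pData := NumGet(ComObjValue(Arr) + 8 + A_PtrSize)
    length := Arr.MaxIndex() + 1
    return StrGet(pData, length, "UTF-8")
}
```

In the script, GetHtmlWithRetry(link) would simply take the place of the call to GetHtml(...). Retrying a page that genuinely returns 404 only adds delay, so keep the number of attempts small.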