How to extract url from a webpage?

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
Muhammadi
Posts: 36
Joined: 27 May 2021, 20:00

How to extract url from a webpage?

Post by Muhammadi » 29 Jun 2021, 15:57

How to extract url from a webpage?

I have a list of webpages in Excel.
I need to extract one URL from each of these webpages.
Right now I find them manually: open "Inspect", then search (Ctrl+F) for "websiteurl".

For example, extract the following url:
https://nationallaserinstitute.com/pdo-threads/
from the following webpage:
https://www.emedevents.com/c/medical-conferences-2021/2-day-pdo-thread-lift-course-dallas-jul-03-04-2021
Last edited by Muhammadi on 30 Jun 2021, 05:41, edited 3 times in total.

Muhammadi
Posts: 36
Joined: 27 May 2021, 20:00

Re: How to extract url from a webpage?

Post by Muhammadi » 30 Jun 2021, 05:37

I downloaded G33kDude's Chrome library from the following links:
https://github.com/G33kDude/Chrome.ahk
https://github.com/G33kDude/Chrome.ahk/releases/tag/1.2

I copied the sample script shown in the following video, just to make sure that the library works:
https://www.youtube.com/watch?v=btn1pSRE8es

Code: Select all

#SingleInstance, force
#Include C:\1Macro Recorder\1PMC\Chrome\Chrome.ahk
If (!FileExist("profile"))
{
    FileCreateDir, % "profile"
}
chrome := new chrome("profile", "https://www.the-automator.com/autohotkey/chrome-with-autohotkey/")
pg := chrome.getpage()
pg.WaitForLoad("complete")
pg.Evaluate("document.querySelectorAll('#post-4101 > div > p:nth-child(4) > a:nth-child(4)')[0].click()")
This script doesn't do anything.
What did I do wrong?

teadrinker
Posts: 4309
Joined: 29 Mar 2015, 09:41

Re: How to extract url from a webpage?

Post by teadrinker » 30 Jun 2021, 07:17

Muhammadi wrote: chrome := new chrome
You cannot give a variable the same name as the class name.

Muhammadi
Posts: 36
Joined: 27 May 2021, 20:00

Re: How to extract url from a webpage?

Post by Muhammadi » 30 Jun 2021, 08:02

teadrinker wrote:
30 Jun 2021, 07:17
You cannot give a variable the same name as the class name.
The person in the video wrote exactly the same string and it worked for him.

I copied his code but in my case nothing happens)

Am I supposed to do something with the library (Chrome.ahk)?

Muhammadi
Posts: 36
Joined: 27 May 2021, 20:00

Re: How to extract url from a webpage?

Post by Muhammadi » 30 Jun 2021, 09:22

Can someone write a script that extracts url from any website?

Hopefully I'll be able to adjust your script to my needs.

teadrinker
Posts: 4309
Joined: 29 Mar 2015, 09:41

Re: How to extract url from a webpage?

Post by teadrinker » 30 Jun 2021, 09:58

Muhammadi wrote: The person in the video wrote exactly the same string and it worked for him.
Perhaps that person was not very advanced in AHK. Just don't do it like this; use another variable name.
Muhammadi wrote: I copied his code but in my case nothing happens)
Chrome browser won't open?
Muhammadi wrote: Can someone write a script that extracts url from any website?
Your question does not make sense. A website can contain a lot of URLs. Which one do you need?

Muhammadi
Posts: 36
Joined: 27 May 2021, 20:00

Re: How to extract url from a webpage?

Post by Muhammadi » 30 Jun 2021, 10:30

teadrinker wrote:
30 Jun 2021, 09:58
A website can contain a lot of URLs. Which one do you need?
I have a list of webpages from the same website.
I need to extract one URL from each of these pages.
The URL can be found by opening "Inspect" and searching for "websiteurl".
On every page it sits in the same place.

Examples:
1. Webpage:
https://www.emedevents.com/c/medical-conferences-2021/2-day-pdo-thread-lift-course-dallas-jul-03-04-2021
Url that needs to be extracted from this webpage:
https://nationallaserinstitute.com/pdo-threads/

2. Webpage:
https://www.emedevents.com/online-cme-courses/live-webinar/manchester-pathology-2021
Url that needs to be extracted from this webpage:
http://www.path.org.uk/

teadrinker
Posts: 4309
Joined: 29 Mar 2015, 09:41

Re: How to extract url from a webpage?

Post by teadrinker » 30 Jun 2021, 11:03

To get info from most sites you don't need to open them in a web browser; a WinHttp request and RegEx are enough:

Code: Select all

for k, url in [ "https://www.emedevents.com/c/medical-conferences-2021/2-day-pdo-thread-lift-course-dallas-jul-03-04-2021"
              , "https://www.emedevents.com/online-cme-courses/live-webinar/manchester-pathology-2021" ]
{
   html := GetHtml(url)
   RegExMatch(html, "id=""websiteurlfecth""[^>]+?value=""\Khttp[^""]+", url%k%)
}
MsgBox, % url1 "`n" url2
   
GetHtml(url) {
   Whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
   Whr.Open("GET", url, true)
   Whr.Send()
   Whr.WaitForResponse()
   status := Whr.status
   if (status != 200)
      throw "HttpRequest error, status: " . status
   Arr := Whr.responseBody
   pData := NumGet(ComObjValue(arr) + 8 + A_PtrSize)
   length := arr.MaxIndex() + 1
   Return html := StrGet(pData, length, "UTF-8")
}
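
The pattern used in the script above can be tried offline on a small HTML fragment. This is only a sketch: the sample markup is invented for illustration and assumes the page keeps the link in a tag whose id is websiteurlfecth followed by a value attribute. Only the pattern itself comes from the post.

```autohotkey
; Hypothetical fragment; the real page's markup may differ.
sampleHtml := "<input type=""hidden"" id=""websiteurlfecth"" value=""http://www.path.org.uk/"">"

; How the pattern reads, piece by piece:
;   id=""websiteurlfecth""   finds the tag by its id attribute
;   [^>]+?                   skips ahead lazily without leaving the tag
;   value=""\K               \K drops everything matched so far from the result
;   http[^""]+               keeps the URL up to the closing quote
RegExMatch(sampleHtml, "id=""websiteurlfecth""[^>]+?value=""\Khttp[^""]+", found)
MsgBox, % found   ; shows http://www.path.org.uk/
```

If the site ever renames that id or puts value before id, the match comes back empty, so an empty result is worth checking for.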

Muhammadi
Posts: 36
Joined: 27 May 2021, 20:00

Re: How to extract url from a webpage?

Post by Muhammadi » 08 Jul 2021, 11:06

Your code gives the following error message:
Image
Last edited by Muhammadi on 15 Jul 2021, 06:52, edited 3 times in total.

teadrinker
Posts: 4309
Joined: 29 Mar 2015, 09:41
Contact:

Re: How to extract url from a webpage?

Post by teadrinker » 08 Jul 2021, 11:29

Can't reproduce.
If you want to display an image, use the direct link to it: https://i.imgur.com/Uo9M5yD.jpeg
Last edited by teadrinker on 08 Jul 2021, 11:42, edited 1 time in total.

Muhammadi
Posts: 36
Joined: 27 May 2021, 20:00

Re: How to extract url from a webpage?

Post by Muhammadi » 08 Jul 2021, 11:33

@teadrinker What's the reason for the error?
Have I done something wrong?


Muhammadi
Posts: 36
Joined: 27 May 2021, 20:00

Re: How to extract url from a webpage?

Post by Muhammadi » 08 Jul 2021, 16:45

teadrinker wrote:
08 Jul 2021, 11:45
See this topic.
Thank you so much, now it works.

Can you change the script to do the following?
Take the webpage link from the active cell in an Excel file and write the extracted URL to another cell in the same file.

teadrinker
Posts: 4309
Joined: 29 Mar 2015, 09:41

Re: How to extract url from a webpage?

Post by teadrinker » 09 Jul 2021, 09:17

There are a lot of code samples demonstrating how to work with Excel on this forum. Try to find them. For me it is not so interesting. :)

Muhammadi
Posts: 36
Joined: 27 May 2021, 20:00

Re: How to extract url from a webpage?

Post by Muhammadi » 10 Jul 2021, 15:04

@teadrinker No, there's no need to write a complex script with Excel. Just a little change to your script.
I've tried to do it myself but so far to no avail.

There's one webpage link in the clipboard (I've already copied it) and I need the one extracted URL in a message box.
Can you tell me how to pass the copied webpage link into your script?

Something like this:

Code: Select all

for k, url in clipboard ; One webpage link is in clipboard
{
   html := GetHtml(url)
   RegExMatch(html, "id=""websiteurlfecth""[^>]+?value=""\Khttp[^""]+", url%k%)
}
MsgBox, % url1 ; I need one extracted url from one webpage link.
   
GetHtml(url) {
   Whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
   Whr.Open("GET", url, true)
   Whr.Send()
   Whr.WaitForResponse()
   status := Whr.status
   if (status != 200)
      throw "HttpRequest error, status: " . status
   Arr := Whr.responseBody
   pData := NumGet(ComObjValue(arr) + 8 + A_PtrSize)
   length := arr.MaxIndex() + 1
   Return html := StrGet(pData, length, "UTF-8")
}

teadrinker
Posts: 4309
Joined: 29 Mar 2015, 09:41

Re: How to extract url from a webpage?

Post by teadrinker » 11 Jul 2021, 05:18

Muhammadi wrote: There's one webpage link in clipboard
If you have the link in the clipboard, you just need to pass the clipboard to the function: GetHtml(Clipboard)
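
Applied to the earlier script, that suggestion might look like the sketch below. The helper name ExtractWebsiteUrl is made up for this example, and GetHtml is a copy of the function posted above so that the snippet runs on its own. It assumes the clipboard already holds exactly one page link.

```autohotkey
; A cell copied from Excel usually carries a trailing line break, so trim it first.
pageLink := Trim(Clipboard, " `t`r`n")
MsgBox, % ExtractWebsiteUrl(GetHtml(pageLink))

; Hypothetical helper: returns the matched URL, or an empty string if nothing matches.
ExtractWebsiteUrl(html) {
   RegExMatch(html, "id=""websiteurlfecth""[^>]+?value=""\Khttp[^""]+", found)
   Return found
}

; Same function as in the earlier post: downloads the page and decodes it as UTF-8.
GetHtml(url) {
   Whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
   Whr.Open("GET", url, true)
   Whr.Send()
   Whr.WaitForResponse()
   status := Whr.status
   if (status != 200)
      throw "HttpRequest error, status: " . status
   Arr := Whr.responseBody                              ; SAFEARRAY of bytes
   pData := NumGet(ComObjValue(Arr) + 8 + A_PtrSize)    ; pointer to the raw data
   length := Arr.MaxIndex() + 1
   Return StrGet(pData, length, "UTF-8")
}
```

No for-loop is needed for a single link; the loop in the original script only existed to walk through a list of pages.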

Muhammadi
Posts: 36
Joined: 27 May 2021, 20:00

Re: How to extract url from a webpage?

Post by Muhammadi » 15 Jul 2021, 06:55

teadrinker wrote:
11 Jul 2021, 05:18
If you have the link in the clipboard, you just need to pass the clipboard to the function: GetHtml(Clipboard)
I did it. Thank you, it works.
Right now, my code looks like this:

Code: Select all

F3::
Loop
{
   WinActivate, List of Event URLs - 21.07 - Excel ahk_class XLMAIN
   Xl := ComObjActive("Excel.Application")
   Clipboard := Xl.ActiveCell.Value
   for k, url in [clipboard]
   {
      html := GetHtml(clipboard)
      RegExMatch(html, "id=""websiteurlfecth""[^>]+?value=""\Khttp[^""]+", url%k%)
   }
   clipboard := url1
   Send, {Right}
   Xl.Selection.Value := Clipboard
   Send, {Down}
   Sleep, 111
   Send, {Left}
}
Return

GetHtml(url) {
   Whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
   Whr.Open("GET", url, true)
   Whr.Send()
   Whr.WaitForResponse()
   status := Whr.status
   if (status != 200)
      throw "HttpRequest error, status: " . status
   Arr := Whr.responseBody
   pData := NumGet(ComObjValue(arr) + 8 + A_PtrSize)
   length := arr.MaxIndex() + 1
   Return html := StrGet(pData, length, "UTF-8")
}
Sometimes it gives different types of errors, like this one:
Image

If there's any type of error, I want it to automatically skip the webpage with error and go to the next webpage.
Is there a way to do it?
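
One possible way to skip failing pages is to wrap the download in try/catch, which AutoHotkey v1.1 supports. The sketch below is not from the thread: the wrapper name SafeExtract and the "error:" text written to column B are assumptions, and GetHtml is copied from the posts above so that the snippet is complete. It works on the cells directly instead of going through the clipboard and Send.

```autohotkey
F3::
   Xl := ComObjActive("Excel.Application")   ; attach to the running Excel instance
   cell := Xl.ActiveCell                     ; start at the selected cell in column A
   while (cell.Text != "")                   ; stop at the first empty cell
   {
      cell.Offset(0, 1).Value := SafeExtract(cell.Text)   ; result goes one column to the right
      cell := cell.Offset(1, 0)              ; move down one row
   }
Return

; Hypothetical wrapper: returns the extracted URL, or a note starting with "error:".
SafeExtract(link) {
   try
   {
      html := GetHtml(link)
      RegExMatch(html, "id=""websiteurlfecth""[^>]+?value=""\Khttp[^""]+", found)
      Return found
   }
   catch e
   {
      ; e is a plain string for the script's own throw and an exception object for COM failures.
      Return "error: " . (IsObject(e) ? e.Message : e)
   }
}

GetHtml(url) {
   Whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
   Whr.Open("GET", url, true)
   Whr.Send()
   Whr.WaitForResponse()
   status := Whr.status
   if (status != 200)
      throw "HttpRequest error, status: " . status
   Arr := Whr.responseBody
   pData := NumGet(ComObjValue(Arr) + 8 + A_PtrSize)
   length := Arr.MaxIndex() + 1
   Return StrGet(pData, length, "UTF-8")
}
```

With this layout a bad link or a non-200 response leaves a note in column B and the loop carries on with the next row. A page that loads but has no matching field simply yields an empty cell.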

teadrinker
Posts: 4309
Joined: 29 Mar 2015, 09:41

Re: How to extract url from a webpage?

Post by teadrinker » 15 Jul 2021, 07:41

What exactly does the clipboard contain?

Muhammadi
Posts: 36
Joined: 27 May 2021, 20:00

Re: How to extract url from a webpage?

Post by Muhammadi » 15 Jul 2021, 08:09

First the clipboard contains a webpage, then it contains the extracted url.

The code copies a webpage in A2 and pastes the extracted url to B2.
Then it copies a webpage in A3 and pastes the extracted url to B3, and so on.

I have a list of webpages in column A.
The task is to write the extracted URLs to column B.

Please find attached the excel file with a list of webpages in column A and extracted urls in column B.

Muhammadi
Posts: 36
Joined: 27 May 2021, 20:00

Re: How to extract url from a webpage?

Post by Muhammadi » 15 Jul 2021, 11:38

teadrinker wrote:
15 Jul 2021, 07:41
What exactly does the clipboard contain?
In this case it contains the following url:
https://www.emedevents.com/Conferences/searchConference#

Sometimes it gives other types of errors, even for perfectly legitimate webpages.
