Retrieve string from HTML

Vicream
Posts: 6
Joined: 14 Aug 2019, 13:32

Retrieve string from HTML

14 Aug 2019, 13:47

Hello, I'm new to coding, but I have managed to almost fully develop an Instagram auto-like AHK script. I built it from basic knowledge and pieces of scripts found on Google, so the automation (it logs in if needed, searches for the profiles, and likes all the pictures) relies on mouse movement and basic commands such as Run.

However, I ran into a problem: I successfully split the contents of an Edit textbox into pieces, and the script launches tens (depending on the user's choice) of Instagram profile pages in Chrome, such as "https://www.instagram.com/example/".
I don't want the user to specify how many posts each of these profiles has; I want to find it automatically! So I did some research, dug into Inspect Element, and found:
<span class="g47SY ">22,139</span>
All the profile pages share this code; the only difference is the number (22,139). How can I put it into a variable?


My script so far: https://github.com/Vicreams/instagramautolikedraft/blob/master/test.ahk
Note: the texts are in French, but the variable names are not.
Sir Teddy the First
Posts: 94
Joined: 05 Aug 2019, 12:31

Re: Retrieve string from HTML

14 Aug 2019, 14:52

Hi,
you have to retrieve the source code of the website and then use RegExMatch to find that specific part.
Your needle would be something like:

Code:

needle = <span class="g47SY ">.*</span>
What RegExMatch does:
It searches for an occurrence of this pattern, where ".*" stands for any run of unknown characters.
It essentially gives you the whole matched string (through its output variable), regardless of the number inside, and you then have to extract the number via StrReplace and SubStr.
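
A minimal sketch of that idea, assuming AutoHotkey v1 syntax. The sample string and the variable names (html, match, postCount) are made up for illustration. Instead of trimming the match with SubStr, it swaps in a capture group so the number lands directly in match1, and it uses [^<]* rather than .* so the match cannot run past the closing tag:

```autohotkey
; Hypothetical sample input; in practice this would be the page source.
html := "<span class=""g47SY "">22,139</span>"

; Inside an expression string, "" produces one literal quote.
; [^<]* stops at the next tag, so the match cannot run past </span>.
needle := "<span class=""g47SY "">([^<]*)</span>"

if RegExMatch(html, needle, match)
{
    ; match1 holds the first capture group; strip the thousands separator.
    postCount := StrReplace(match1, ",")
    MsgBox % "Posts: " . postCount
}
else
    MsgBox Needle not found.
```

Whether the real page source contains this exact span is a separate question; the sketch only shows the matching step.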

I hope this was helpful!
Vicream
Posts: 6
Joined: 14 Aug 2019, 13:32

Re: Retrieve string from HTML

14 Aug 2019, 17:33

Well, the main problem is actually retrieving the source code, but thanks, you've helped future me!
Sir Teddy the First
Posts: 94
Joined: 05 Aug 2019, 12:31

Re: Retrieve string from HTML

15 Aug 2019, 02:04

Hi,
there are a number of ways to retrieve the source code:
On Firefox (it should work in any other browser, possibly with a different shortcut) you could send Ctrl+U to the window, then Ctrl+A, Ctrl+C and finally Ctrl+W (open the source code, select it, copy it, close the tab).
This would fit in with the rest of your program.

Or you could use something like this:

Code:

PageRequest := ComObjCreate("WinHttp.WinHttpRequest.5.1")
; Third parameter true = asynchronous request
PageRequest.Open("GET", YourURLThatYouWantTheSourceCodeFrom, true)
PageRequest.Send()
; Block until the response has arrived
PageRequest.WaitForResponse()
PageText := PageRequest.ResponseText
The source code will then be stored in "PageText".

Remember that inside "PageRequest.Open" you have to put your URL in quotation marks if it's plain text and not a variable holding the URL.
If you have not yet retrieved the URL, you can do so by sending Ctrl+L, Ctrl+C (and then sending something like {Tab} so that the focus is no longer on the address bar and the window does not become unresponsive).
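
The Ctrl+L, Ctrl+C step could be sketched roughly like this, assuming AutoHotkey v1 and that the browser window is already active. GetActiveUrl is a hypothetical helper name, and the 100 ms pause and 1 second timeout are guesses that may need tuning:

```autohotkey
; Hypothetical helper: copies the URL of the active browser window.
GetActiveUrl()
{
    saved := ClipboardAll      ; back up the current clipboard contents
    Clipboard := ""            ; empty it so ClipWait can detect new data
    Send ^l                    ; focus the address bar
    Sleep 100                  ; give the browser a moment to select the URL
    Send ^c                    ; copy the URL
    ClipWait 1                 ; wait up to 1 second; sets ErrorLevel on timeout
    url := ErrorLevel ? "" : Clipboard
    Send {Tab}                 ; move the focus away from the address bar
    Clipboard := saved         ; restore the previous clipboard contents
    return url
}
```

The returned value could then be passed to PageRequest.Open as a variable (without quotation marks). An empty result means nothing was copied within the timeout.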
Vicream
Posts: 6
Joined: 14 Aug 2019, 13:32

Re: Retrieve string from HTML

15 Aug 2019, 04:27

All right, so in the source code, the part I was trying to copy turned out to look like this:

Code:

"{count":.*,"page_info":
I tried everything to get rid of the "illegal characters" error box, but nothing worked because of the tricky situation (it's not even a classic string).

This is what I have come up with so far:

Code:

PageRequest := ComObjCreate("WinHttp.WinHttpRequest.5.1")
PageRequest.Open("GET", "https://www.instagram.com/humorempire/", true)
PageRequest.Send()
PageRequest.WaitForResponse()
PageText := PageRequest.ResponseText
msgbox %PageText%
FoundPos := RegExMatch("PageText", ""{count":.*,"page_info":")  ; Returns 4, which is the position where the match was found.
msgbox %FoundPos%

; MsgBox % SubStr("123abc789", 2, 5) ; Returns abc
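
One possible way around the "illegal characters" error, as a sketch in AutoHotkey v1 syntax: inside an expression, a literal quote within a quoted string is written as two quotes, and the haystack is passed as the variable PageText rather than the quoted text "PageText". The sample text below is a stand-in, and the assumption that the count is a plain run of digits between "count": and ,"page_info": has not been checked against the real page source:

```autohotkey
; Hypothetical sample standing in for PageText; the real page source may differ.
PageText := "{""count"":22139,""page_info"":{}}"

; In an expression, "" inside a quoted string yields one literal quote.
; (\d+) captures the digits between "count": and ,"page_info":
needle := """count"":(\d+),""page_info"":"

; Pass the variable PageText itself, not the quoted text "PageText".
FoundPos := RegExMatch(PageText, needle, Match)

if (FoundPos)
    MsgBox % "Position: " . FoundPos . "`nCount: " . Match1
else
    MsgBox Pattern not found.
```

Match1 holds the captured digits, so no further SubStr trimming is needed in this variant.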
