Re: [Library] Chrome.ahk - Automate Google Chrome using native AutoHotkey. No Selenium!

Posted: 05 Jun 2020, 06:50
by AndrewHunnibal
dominicx wrote:
03 Apr 2020, 08:38
This library is great! :) I have one question that I cannot figure out how to solve.

Code: Select all

IfNotExist, %ChromeProfileDirectory%
	FileCreateDir, ChromeProfile

if (Chromes := Chrome.FindInstances())
	ChromeInst := {"base": Chrome, "DebugPort": Chromes.MinIndex()}	; or if you know the port:  ChromeInst := {"base": Chrome, "DebugPort": 9222}
Page.Call("Page.navigate", {"url": url})			; Navigate to url
Page.WaitForLoad()	
If Chrome in debug mode is already open, the above commands are performed in the background without activating Chrome.

How do I find the Chrome debug-mode window and use something like WinActivate to activate it?

Code: Select all

FindInstances()
{
	static Needle := "--remote-debugging-port=(\d+)"
	Out := {}
	for Item in ComObjGet("winmgmts:")
		.ExecQuery("SELECT CommandLine FROM Win32_Process"
		. " WHERE Name = 'chrome.exe'")
		if RegExMatch(Item.CommandLine, Needle, Match)
			Out[Match1] := Item.CommandLine
	return Out.MaxIndex() ? Out : False
}
FindInstances() from the library goes through winmgmts, but the information it returns is useless for WinActivate because it does not include anything that WinActivate can use.

I tried the following to see whether WinGet exposes any attribute that identifies a Chrome debug window, but none of them are useful either.

Code: Select all

WinGet, ID, list, ahk_class Chrome_WidgetWin_1
Loop, %id%
{
	this_id := id%A_Index%
	WinActivate, ahk_id %this_id%
	WinGetClass, this_class, ahk_id %this_id%
	WinGetTitle, this_title, ahk_id %this_id%
	MsgBox, 4, , Visiting All Windows`n%a_index% of %id%`nahk_id %this_id%`nahk_class %this_class%`n%this_title%`n`nContinue?
	IfMsgBox, NO, break
} 

Please help :)
Was having the same issue trying to activate a tab.

In your example, after
Page.Call("Page.navigate", {"url": url})
Add
Page.Call("Page.bringToFront")

Posted: 12 Jun 2020, 17:27
by JoeWinograd
Chrome.ahk v1.2 - latest version from GitHub (9-May-2018)
W10 64-bit V1909
AHK U64 v1.1.32.00

Hi GeekDude,

First, I want to thank you for this library...great stuff!

I'm hoping that you can help me with a problem. I used your ExportPDF.ahk example to create this function (so that I can use it easily and often in scripts going forward):

Code: Select all

ChromeSavePDF(ProfileFolder,Link,FileName,Open:=False)
{
  ; huge thanks to GeekDude! (https://www.autohotkey.com/boards/memberlist.php?mode=viewprofile&u=161)
  ; https://github.com/G33kDude/Chrome.ahk
  ; https://www.autohotkey.com/boards/viewtopic.php?f=6&t=42890
  ; function by JoeW based on GeekDude's example: ExportPDF.ahk
  RetCode:=0
  ChromeInst:=new Chrome(ProfileFolder,Link,"--headless")
  If !(PageInst:=ChromeInst.GetPage())
  {
    ChromeInst.Kill()
    Return 1
  }
  Else
  {
    PageInst.WaitForLoad()
    Base64PDF:=PageInst.Call("Page.printToPDF").data
    Size:=Base64_Decode(BinaryPDF, Base64PDF)
    FileOpen(FileName,"w").RawWrite(BinaryPDF,Size)
    If (Open)
    {
      Try Run,%FileName%
      Catch
        RetCode:=2
    }
    Try PageInst.Call("Browser.close") ; Fails when running headless
    Catch
      ChromeInst.Kill()
    PageInst.Disconnect()
  }
  Return RetCode
}
#Include Chrome.ahk
#Include Base64EncodeDecode.ahk
The Chrome.ahk #Include file is an unmodified copy of yours, and the Base64_Encode and Base64_Decode functions are also unmodified copies of yours, which I put in a separate #Include file.

Here's the problem. Using Chrome.ahk, the PDF contains links to images instead of embedded images. For example, here's a screenshot of a portion of the PDF that shows the issue:

[Screenshot: chromeahk links not images.png]

Loading the same HTML file into Chrome (not headless) and doing a Print>Save As PDF results in the images (not links). For example, here's a screenshot of the same area of the PDF shown above:

[Screenshot: chrome not headless Print-Save As PDF.png]

I'm trying to use Chrome.ahk to download a large number of articles that I've published at a website (perfectly legal, btw). It would be very time-consuming to do all of them manually via Print>Save As PDF in non-headless Chrome.

What's really strange is that Chrome.ahk works perfectly on many of the articles, some with a very large number of images, yet it fails on others.

If it matters (I don't think it should), the "Link" being fed to Chrome.ahk is the full file path of an HTML file on the local machine (not a web URL), which has been downloaded by my master AHK script via UrlDownloadToFile. I did it that way because I need to tweak the HTML before feeding it to Chrome.ahk, but that shouldn't matter, as the key point is that the Chrome.ahk output is different from the non-headless Chrome Print>Save As PDF output. Also, to make sure that the problem is NOT because of the local, tweaked HTML file, I ran a Chrome.ahk test on the web URL itself and had the same problem. If you want to try it yourself, here's the URL:
https://www.experts-exchange.com/articles/11952/How-to-Embed-Screenshots-in-Posts.html?printer=true

Thanks much for your help in resolving this. Regards, Joe

Posted: 12 Jun 2020, 19:50
by geek
JoeWinograd wrote: Hi GeekDude,
Hi Joe!
JoeWinograd wrote: First, I want to thank you for this library...great stuff!
You're welcome!

JoeWinograd wrote: I'm hoping that you can help me with a problem. I used your ExportPDF.ahk example to create this function (so that I can use it easily and often in scripts going forward):

...

Here's the problem. Using Chrome.ahk, the PDF contains links to images instead of embedded images.

...

What's really strange is that Chrome.ahk works perfectly on many of the articles, some with a very large number of images, yet it fails on others.

...

If you want to try it yourself, here's the URL:
https://www.experts-exchange.com/articles/11952/How-to-Embed-Screenshots-in-Posts.html?printer=true

Thanks much for your help in resolving this. Regards, Joe
It looks like that page employs a technique called "lazy loading", where it keeps the browser from fetching an image file until the image becomes visible (or some other criterion is met). I'm not sure what, if anything, you can do with Chrome.ahk to get around this because, unless something has changed, this API function is only available when Chrome is headless.

However, this API is not the only way to automatically generate a PDF using Chrome. There's a command line flag you can give Chrome to tell it to make a PDF in the background which may be more useful to you here: --print-to-pdf . Maybe give it a try (e.g. Run, "C:\Path\To\Chrome.exe" --headless --print-to-pdf="C:\Path\To\Pdf.pdf" https://path.to/webpage.html) and see if it works any better for your purposes.
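
A rough sketch of one possible workaround for lazy loading, written as JavaScript that could be run in the page before printing. It switches natively lazy images to eager loading and copies a deferred URL into src. The function name forceEagerLoad and the data-src attribute are assumptions: data-src is a common convention among lazy-loading scripts, but nothing here confirms that this particular page uses it, so treat this as a starting point rather than a known fix.

```javascript
// Sketch only: tries to undo two common lazy-loading patterns.
// `images` is any iterable of elements that support getAttribute/setAttribute,
// for example document.images when run inside the page.
function forceEagerLoad(images) {
  let changed = 0;
  for (const img of images) {
    let touched = false;

    // Native lazy loading: <img loading="lazy">
    if (img.getAttribute("loading") === "lazy") {
      img.setAttribute("loading", "eager");
      touched = true;
    }

    // Script-based lazy loading: real URL parked in data-src
    // (assumed attribute name; sites may use a different one).
    const deferred = img.getAttribute("data-src");
    if (deferred && img.getAttribute("src") !== deferred) {
      img.setAttribute("src", deferred);
      touched = true;
    }

    if (touched) changed++;
  }
  return changed; // number of images that were modified
}
```

Inside the page this would be called as forceEagerLoad(document.images). The images still need time to download afterwards, so a wait is required before printing.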

Posted: 12 Jun 2020, 21:37
by JoeWinograd
Hi GeekDude,

Thanks for the fast reply...much appreciated!
GeekDude wrote:It looks like that page employs a technique called "lazy loading"
I'm not familiar with that, but what you're saying sounds like a plausible explanation.
GeekDude wrote:--print-to-pdf
Very good thought! Unfortunately, no joy. In fact, what's really interesting is that it does worse than Chrome.ahk! With Chrome.ahk, two of the images in the article get links; with command line headless --print-to-pdf, three of the images in the article get links. Go figure.

I tried another approach...a script that sends keystrokes to a visible Chrome (Alt+fp, Enter, etc.), but, like many such scripts, it is highly unreliable, and I got tired of playing with WinWaitActive and Sleep commands. :)

I suppose at this point I'll simply have to do it manually in a visible Chrome. But thanks for trying...I appreciate it! Stay safe, Joe

Posted: 15 Jun 2020, 17:38
by teadrinker
JoeWinograd wrote:

Code: Select all

    PageInst.WaitForLoad()
    Base64PDF:=PageInst.Call("Page.printToPDF").data
Hi Joe
I think the images just do not have enough time to render. Try adding Sleep, 1000 after PageInst.WaitForLoad().

Posted: 15 Jun 2020, 18:34
by JoeWinograd
teadrinker wrote:Try adding Sleep, 1000 after PageInst.WaitForLoad().
Hi teadrinker,
Tried it...no joy. Then tried 10000...still no joy. But thanks for the idea...was worth a shot. Regards, Joe

Posted: 15 Jun 2020, 18:40
by teadrinker
Ok, try this:

Code: Select all

PageInst.WaitForLoad()
Sleep, 1000
PageInst.Evaluate("window.scrollTo(0,document.body.scrollHeight);")
Sleep, 1000
Base64PDF := PageInst.Call("Page.printToPDF").data

Posted: 15 Jun 2020, 19:22
by JoeWinograd
teadrinker wrote:

Code: Select all

Sleep, 1000
PageInst.Evaluate("window.scrollTo(0,document.body.scrollHeight);")
Sleep, 1000
Ah, that works! Brilliant! I don't understand why the PageInst.Evaluate call fixes it, but it certainly does! Btw, testing shows that it also works without the first Sleep, but does not work without the second one.

Thanks very much, teadrinker, for your persistence in finding a solution...I really appreciate it! And thanks, once again, to GeekDude for the library.

Cheers, Joe

Posted: 15 Jun 2020, 19:24
by teadrinker
Also, this may work:

Code: Select all

PageInst.Evaluate("const imgs = document.getElementsByTagName('img'); for (let i = 0; i < imgs.length; i++) imgs[i].scrollIntoView();")
Sleep, 1000
:wave:

Posted: 15 Jun 2020, 19:35
by JoeWinograd
teadrinker wrote:Also, this may work
Indeed, it does! Frankly, I don't understand what either PageInst.Evaluate call does...is there a reason why I should prefer one over the other? Which one do you suggest that I use? Thanks, Joe

Posted: 15 Jun 2020, 19:48
by teadrinker
PageInst.Evaluate runs JavaScript in the page context. The first call scrolls the page to the bottom, so all images should load regardless of "lazy loading". The second scrolls each image into view in turn. However, both of them may sometimes fail. I'll think about which method is better; if I find one, I'll post it here.
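
Since a fixed Sleep is a guess, one hedged alternative is to ask the page how many images are still loading and repeat the check until it reports zero. A minimal sketch, assuming the page uses ordinary img elements; countPendingImages is a hypothetical helper name, and the check will not detect script-based lazy loaders that display an already-loaded placeholder image:

```javascript
// Counts images that have not finished loading, or that finished without
// producing any pixels (a broken image reports naturalWidth === 0).
// `images` is any iterable of objects exposing the standard
// HTMLImageElement properties `complete` and `naturalWidth`.
function countPendingImages(images) {
  let pending = 0;
  for (const img of images) {
    if (!img.complete || img.naturalWidth === 0) {
      pending++;
    }
  }
  return pending;
}

// Inside the page, the expression to evaluate would be:
//   countPendingImages(document.images)
```

The number returned could be polled from the script, for example through the .value of the result that PageInst.Evaluate returns. That return shape is an assumption worth checking against Chrome.ahk itself. A permanently broken image would keep the count above zero, so any polling loop needs its own time limit.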

Posted: 15 Jun 2020, 20:32
by JoeWinograd
Hi teadrinker,
Thanks for the update...looking forward to anything that you discover. I've run only small tests so far. I'm going to do a full run soon...63 articles! I'll try all 63 with both PageInst.Evaluate calls and will post the results here...will be interesting to see if one works better than the other. Regards, Joe

Posted: 15 Jun 2020, 20:42
by teadrinker
Perhaps this one is preferable:

Code: Select all

js =
(
function scrollPage() {
  window.scrollBy(0, 200);
  const timeout = setTimeout('scrollPage()', 100);
  if ((window.innerHeight + window.pageYOffset) >= document.body.offsetHeight)
    clearTimeout(timeout);
}
scrollPage();
)
PageInst.Evaluate(js)

Posted: 15 Jun 2020, 23:01
by JoeWinograd
Why do you think that it may be preferable?

Posted: 16 Jun 2020, 08:45
by teadrinker
This code scrolls the page from top to bottom with slight delays. This gives all pictures time to load.
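
The same step-by-step scrolling idea can be sketched so that the page reports when it has reached the bottom, which lets the calling script check a flag instead of sleeping for a fixed time. This is only an illustration: scrollOffsets, scrollThrough, and the __scrollDone flag are hypothetical names, and the step and delay values are arbitrary.

```javascript
// Returns the vertical offsets to visit so that every part of a page of
// `pageHeight` pixels passes through a viewport of `viewportHeight` pixels.
function scrollOffsets(pageHeight, viewportHeight, step) {
  if (step <= 0) throw new RangeError("step must be positive");
  const maxOffset = Math.max(0, pageHeight - viewportHeight);
  const offsets = [];
  for (let y = 0; y < maxOffset; y += step) offsets.push(y);
  offsets.push(maxOffset); // always finish exactly at the bottom
  return offsets;
}

// Visits each offset with a pause in between, then sets a flag on `win`
// that the calling script can poll (hypothetical flag name).
function scrollThrough(win, offsets, delayMs) {
  let i = 0;
  win.__scrollDone = false;
  (function next() {
    if (i >= offsets.length) {
      win.__scrollDone = true;
      return;
    }
    win.scrollTo(0, offsets[i++]);
    win.setTimeout(next, delayMs);
  })();
}

// Inside the page this could be started with:
//   scrollThrough(window,
//     scrollOffsets(document.body.scrollHeight, window.innerHeight, 200), 100);
```

Reaching the bottom only means every image has been scrolled into view, not that every download has finished, so a further wait before printing may still be needed.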

Posted: 17 Jun 2020, 13:23
by JoeWinograd
teadrinker wrote:This code scrolls the page from top to bottom with slight delays. This gives all pictures time to load.
Hi teadrinker,
Thanks for explaining that.

Here are my results so far. I'll refer to the three different calls this way:

Eval1
PageInst.Evaluate("window.scrollTo(0,document.body.scrollHeight);")

Eval2 (I changed i to j so it doesn't generate the italics BBCode)
PageInst.Evaluate("const imgs = document.getElementsByTagName('img'); for (let j = 0; j < imgs.length; j++) imgs[j].scrollIntoView();")

Eval3
PageInst.Evaluate(js)

The elapsed times on all three runs for all 63 articles were virtually identical...Eval1 was 5:04, Eval2 was 5:02, and Eval3 was 5:04, but considering other activity on the computer, the two-second difference is not meaningful. There is a startup/initialization time such that doing only one article takes 28 seconds, meaning it takes about 4.4 seconds per article not including startup/initialization time. But this is not just for Chrome.ahk...it includes downloading the article via UrlDownloadToFile and tweaking the downloaded HTML...then Chrome.ahk operates on the tweaked HTML to create the PDF.

I haven't reviewed all 189 PDFs yet, but I did look at several of them that have been problematic in the past. The Eval1 and Eval2 runs did not fail on any of those, but the Eval3 run did have a failure...one of the articles resulted in four embedded images but three links instead of images.

I'll post full results after I've carefully inspected all 189 PDFs. Regards, Joe

Posted: 17 Jun 2020, 13:29
by teadrinker
JoeWinograd wrote: The Eval1 and Eval2 runs did not fail on any of those, but the Eval3 run did have a failure
This is weird. :)

Posted: 17 Jun 2020, 16:20
by JoeWinograd
teadrinker wrote:This is weird.
And now for more weirdness...just found one file where all three Eval calls fail, although Eval3 performs the best. Eval1 and Eval2 both generate 13 links instead of images, while Eval3 generates only 8. Go figure. Regards, Joe

Posted: 17 Jun 2020, 16:30
by JoeWinograd
The weirdness continues...I increased the Sleep time from 1000 to 5000. With that, Eval1 and Eval2 both generate 12 links on the article mentioned above, while...drum roll, please...Eval3 works perfectly! 19 embedded images...no links!

Posted: 18 Jun 2020, 06:12
by teadrinker
You can also try increasing the delay in this line: const timeout = setTimeout('scrollPage()', 100);