
Web Scraping with AutoHotkey & COM Tutorial- GUI syntax writer and demo videos

Posted: 24 May 2015, 20:12
by Joe Glines
I created an AutoHotkey script that helps you write AutoHotkey syntax for web scraping.

YouTube Demonstration videos:
1) Intro- Pointer, Get values and Page Navigation
1.5) Intro- Troubleshooting & Getting correct content from page
2) Intro- Set values & clicks / Buttons
3) Intermediate- Isolating area and leveraging DOM/HTML
4) Advanced- Dealing with Frames
5) Intro- Troubleshooting tips
6) Intermediate- Loop over pages & extract data
7) Intermediate- Web scraping using ClassName
8) Intermediate- Web scraping using QuerySelector and QuerySelectorAll
9) Intro- Webinar on Intro to Web Scraping
10) Intro- Update to Web Scraping syntax writer
11) Intermediate- EventListeners & Triggering Events
12) Intermediate- Saving files / Pictures from a URL / Hyperlink
13) Intro- Review of Web scraping tools
14) Intermediate- Passing Method or Property to COM in a function
15) Intermediate- Extracting data from a table by walking the DOM


Also check out my other tutorials on using Selenium with AutoHotkey (this allows for scraping with Chrome, Firefox, IE, etc.) and on using Chrome.ahk from GeekDude to get and set values.

"Manipulating the Document Object Model in JavaScript" is a good video from O'Reilly that talks through the DOM.

I highly recommend using Fiddler to monitor network traffic. Check out this page, where I have some videos showing how to use it.

Here is where you can get the source code as well as a compiled version of my AutoHotkey Web Scraping Syntax Writer.

Videos and scripts to Login to Websites:
  1. Login to Facebook. This first video is much longer and more in-depth! I cover many of the reasons why I pick one method over another. I also have HellBent sit in and ask questions, so it should be a great starting point for noobs to web scraping.
  2. Login to Amazon
  3. Login to LinkedIn
  4. Login to Gmail / Google / YouTube
  5. Login to Pinterest
  6. Login to Twitter
  7. Login to Reddit

Examples of work automated via Web Scraping with AutoHotkey
  1. Submit StumbleUpon submissions
  2. Transfer data from one website/system to another
  3. How I exported over 4 million contacts from Lexis Nexis
  4. Extract status from SharePoint and email colleagues
  5. Select “x” number of items on website form
  6. Obtain Behavioral Targeting Data from your own Web Site
  7. Determine House status on Real Estate site
  8. Extract meta data about videos from website
  9. Automating saving invoices on Amazon for taxes
  10. Waiting for an element to be visible before clicking (this is using FindText instead of COM)
Comparison of Web Scraping to API calls

Re: Intro to WebScraping and COM

Posted: 25 May 2015, 01:37
by jethrow
Nice - should help make IE COM stuff way easy for beginners. Plus the videos are good - & I felt way famous after watching ...

... a couple things...
  • The "M" in COM & DOM stands for Model
  • False=0 & 0!=-1 meaning -1=True
  • I prefix raw pointers with "p" to signify it's a pointer (pwb, pdoc, etc.) - outside of the raw pointers in the WBGet() function, you aren't using any raw pointers in your script - only wrapped COM objects. Not that you have to follow my naming conventions, just sayin ...
  • iWB2 Learner FRAME.# should be interpreted as FRAME.DEPTH
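
For reference on the True=-1 point: COM passes booleans as a 16-bit VARIANT_BOOL, where VARIANT_FALSE is 0 and VARIANT_TRUE is -1 (all 16 bits set). A small JavaScript check of that bit pattern; asVariantBool is just an illustrative helper, not part of any COM API:

```javascript
// COM booleans (VARIANT_BOOL) are signed 16-bit values:
// VARIANT_FALSE = 0, VARIANT_TRUE = -1 (bit pattern 0xFFFF).
const VARIANT_FALSE = 0;
const VARIANT_TRUE = -1;

// Illustrative helper: reinterpret a raw 16-bit pattern as a signed integer.
function asVariantBool(bits16) {
  return new Int16Array([bits16])[0];
}

console.log(asVariantBool(0xffff)); // -1
console.log(asVariantBool(0x0000)); // 0
console.log(~VARIANT_FALSE === VARIANT_TRUE); // true: flipping every bit of 0 gives -1
console.log(VARIANT_TRUE === 1); // false: True is not 1 here
```

Any non-zero value is truthy in both AHK and JavaScript, so a plain `if (value)` still works on -1; the difference only matters when code compares explicitly against 1.
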
I'm interested to see your next videos - if they're good, I'll likely link them in my tutorial.

Re: Intro to WebScraping and COM

Posted: 25 May 2015, 07:06
by Joe Glines
Thanks for pointing out my inaccuracies! :)

And you, Mickers, Tank, Blackholyman, Sinkafaze, Lexikos, Sean (and I'm sure many others) ARE famous in my eyes as you have all greatly helped me and countless others!

Re: WebScraping and COM- GUI syntax writer and demo videos

Posted: 07 Jun 2015, 12:49
by AmirOulad
Never mind,

Stupid computer with a .dll error.

Re: WebScraping and COM- GUI syntax writer and demo videos

Posted: 07 Jun 2015, 14:14
by jethrow
AHK has some syntax designs that don't translate well into other languages. A good example: in AHK, the following two calls are the same:

Code:

object.key
object["key"]

That being said, if you are focusing on web scraping and tutorials, I'd highly recommend making your code easily translatable to JScript/JavaScript.

In your video you use:

Code:

parentWindow.frames.2.0.document.location.href

This does not work in JavaScript:

Code:

;// incorrect:
javascript: alert(window.frames.2.0.document.location.href)
;// correct:
javascript: alert(window.frames[2][0].document.location.href)

Note that the correct JavaScript syntax also works in AHK:

Code:

parentWindow.frames[2][0].document.location.href
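
To see why the dot form fails, here is a minimal sketch that runs outside the browser (e.g. in Node.js). It assumes a hypothetical mockFrames array standing in for window.frames; real frames are window objects rather than plain arrays, but they are indexed the same way. `new Function` is used only to ask the JavaScript parser whether each expression is legal; the bodies are never executed:

```javascript
// Hypothetical stand-in for window.frames: frame 2 contains a nested frame 0.
const mockFrames = [
  { document: { location: { href: "about:blank#frame0" } } },
  { document: { location: { href: "about:blank#frame1" } } },
  [{ document: { location: { href: "about:blank#frame2-0" } } }],
];

// Bracket notation works for numeric indices in JavaScript (and in AHK).
console.log(mockFrames[2][0].document.location.href); // about:blank#frame2-0

// Returns true if the source compiles; a SyntaxError means the parser rejected it.
function parses(source) {
  try {
    new Function("frames", source); // compile only, the body is not run
    return true;
  } catch (err) {
    return false;
  }
}

console.log(parses("return frames[2][0].document.location.href")); // true
console.log(parses("return frames.2.0.document.location.href")); // false
```

The bracket form parses and resolves; the dot form is rejected before anything runs, which is exactly the "incorrect" bookmarklet case shown earlier.
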

Another situation that has been frustrating for me when going to other languages is that AHK will allow you to call COM methods without using parentheses:

Code:

shell := ComObjCreate("Shell.Application")
;// windows method w/ parentheses - arguably more proper
MsgBox % shell.windows().count
;// windows method w/o parentheses - still works
MsgBox % shell.windows.count

Note the difference in JScript:

Code:

var shell = new ActiveXObject("Shell.Application")
;// windows method w/ parentheses
WScript.echo( shell.windows().count )
;// windows method w/o parentheses - Error: Object doesn't support this property or method
WScript.echo( shell.windows.count )

Again, with your personal coding, you can of course do whatever works. But if you're creating a code-generation tool and tutorials, I'd highly recommend using object-member syntax that also works in other comparable languages.
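
A minimal plain-JavaScript sketch of the same trap. The shell object below is a hypothetical stand-in for Shell.Application with no COM involved, so it runs in Node.js or a browser console:

```javascript
// Hypothetical stand-in for Shell.Application: windows is a method,
// so it must be called before reading .count.
const shell = {
  windows() {
    return { count: 2 }; // pretend two windows are open
  },
};

// With parentheses: the method runs and returns a collection with a count.
console.log(shell.windows().count); // 2

// Without parentheses: this reads the function object itself, which has
// no count property, so plain JavaScript quietly yields undefined.
console.log(shell.windows.count); // undefined
```

Against a real COM object, JScript raises "Object doesn't support this property or method" instead of returning undefined, but either way the value you wanted never arrives, which is why writing the parentheses in AHK keeps the code portable.
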

Re: WebScraping and COM- GUI syntax writer and demo videos

Posted: 07 Jun 2015, 15:44
by Joe Glines
Thanks for the edification, jethrow! While I can fumble through things, I'm definitely not the right person to be giving advice on best practices!

Re: WebScraping and COM- GUI syntax writer and demo videos

Posted: 12 Sep 2015, 17:17
by Soft
Very useful for me XD

Re: WebScraping and COM- GUI syntax writer and demo videos

Posted: 12 Sep 2015, 18:20
by Joe Glines
Thanks! I'd stopped making more videos because this didn't get the traffic I was hoping for. :(

Re: WebScraping and COM- GUI syntax writer and demo videos

Posted: 17 Sep 2015, 14:08
by boris321
Joe_Glines_Joetazz wrote: I created an AutoHotkey script that helps writing AutoHotkey syntax for WebScraping.

I've also created a demo video talking through how to use it. Right now I'm thinking I'll have at least 3 videos but we'll see how bored I get...

I found them useful. I have been wanting to know how to do this for years. Thank you!

Just a quick question, the inclusion of:

#Persistent
#SingleInstance Force
#NoEnv

Do they need to go in a specific folder to make the .ahk script functional?

Thank you!
Boris

Re: WebScraping and COM- GUI syntax writer and demo videos

Posted: 26 Nov 2015, 08:53
by subodhjoshi
Joe,
I use Tab key presses to navigate page elements, but obviously that is severely limited. The methods you use will make it much easier and far more powerful. Quick question: what extension do you use in your SciTE editor to get the Ctrl+left-click menu that you use so extensively? (Actually, it looks like you have written a script for it, per your first line! Can you share it? Thx.)

Re: WebScraping and COM- GUI syntax writer and demo videos

Posted: 26 Nov 2015, 09:12
by Joe Glines
That isn't a SciTE thing; it's my AutoHotkey script that writes my AutoHotkey syntax (yes, that sounds confusing). If you run the script writer, you'll be able to Ctrl+left-click and the menus will appear. I noticed that Win10 removed a few of the icons, so the script will not run as-is there. On Win10 it will take some tweaks (or simply comment out the lines where it says it cannot find the icons).

Re: WebScraping and COM- GUI syntax writer and demo videos

Posted: 26 Nov 2015, 09:25
by Joe Glines
One more thing: while using COM has a learning curve, it is light-years ahead of sending Tabs! Once you get the hang of it, it is pretty easy and much, much more reliable! If you haven't done so already, I highly recommend working through Jethrow's tutorial.

Another good one, on logging into a website, is on BlackHolyman's site.

Re: WebScraping and COM- GUI syntax writer and demo videos

Posted: 26 Nov 2015, 10:23
by wolf_II
@Joe_Glines_Joetazz
Please, is there an MD5 checksum for iWB2Learner.exe available?
Or a known download location?

I might have a corrupted copy. :(

Re: WebScraping and COM- GUI syntax writer and demo videos

Posted: 26 Nov 2015, 11:52
by Joe Glines
I'm not sure what you mean by MD5, but you can download the files from here

Re: WebScraping and COM- GUI syntax writer and demo videos

Posted: 26 Nov 2015, 12:24
by wolf_II
Joe_Glines_Joetazz wrote: I'm not sure what you mean by md5 but you can download the files from here
@Joe_Glines_Joetazz:

Yes, that's where I got it from. (Links to http://www.autohotkey.net/~rbrtryn/Appl ... earner.zip)
But I get a virus warning from Avira, which is a first for me. Avira is usually very good with AHK EXEs.
I wonder if autohotkey.net could have been corrupted, or maybe just the zip file?

Anyway, MD5 is a commonly used checksum, and I got this:

Code:

iWB2Learner.zip     c68647261aaefbc264bf29ffcf8c26e2
iWB2 Learner.exe    609e65a6e56eb45e95c4f1930fd24704

Can anybody please confirm that this is a valid file to use?

Re: WebScraping and COM- GUI syntax writer and demo videos

Posted: 26 Nov 2015, 13:13
by Joe Glines
It has been reported several times before as a false positive.

Re: WebScraping and COM- GUI syntax writer and demo videos

Posted: 26 Nov 2015, 13:19
by wolf_II
@Joe_Glines_Joetazz: Thank you very much.

Re: WebScraping and COM- GUI syntax writer and demo videos

Posted: 26 Nov 2015, 14:21
by Joe Glines
I updated my source code above to remove icons that are not in Win10 and to incorporate getElementsByClassName, which BlackHolyman introduced and explained to me. Many pages have class names, and getElementsByClassName is my "go-to" method call now!

Re: WebScraping and COM- GUI syntax writer and demo videos

Posted: 28 Nov 2015, 08:12
by subodhjoshi
@Joe - thx for the link to Jethrow's tutorial. It seems like you are producing a video version of the tutorial. That's very helpful; thx again for your effort. So far I have managed to check out page elements, and so far so good. Next I need to see how I can manipulate a web page, feed values, and click buttons. That's what I am after, not just scraping data from a rendered page. Eager to try out the further videos above.

One problem - iWB2Learner does not work as it does in your video. It seems to 'skew' page elements when it outlines them, and it just misses some. I have IE 11, and I see the same problem with the iWB2Learner downloaded from the link below as well as with the one on Jethrow's page. But I can get page element names from the page source, so while it would have been very convenient, it's not a showstopper.

@Wolf_II - I downloaded iWB2Learner from SourceForge - http://sourceforge.net/projects/ahkcn/f ... 20Learner/
This seems to be a newer version than the one on Jethrow's page.

Re: WebScraping and COM- GUI syntax writer and demo videos

Posted: 28 Nov 2015, 08:52
by Joe Glines
Set IE's zoom level to 100%. It does this to me as well, but that should fix the issue.