COM Object: HTMLFile & querySelectorAll not working Topic is solved

p3trus
Posts: 137
Joined: 02 Aug 2016, 03:20

COM Object: HTMLFile & querySelectorAll not working

21 May 2017, 07:32

Hi there,
I just wanted to code a little script to get data from a website, using COM Objects. And I'm really confused:
The examples in https://autohotkey.com/boards/viewtopic.php?p=398#p398 work fine, but the linked documentation has methods & properties which seem not to be supported - and on the other hand, I can't find document.links and document.all() which are used in the COM examples (..but they do work..)
The basic structure of my script:

Code: Select all

WebRequest := ComObjCreate("WinHttp.WinHttpRequest.5.1")
WebRequest.Open("GET", req_str)
WebRequest.Send()

document := ComObjCreate("HTMLfile")
document.write(WebRequest.ResponseText)

dtds := document.querySelectorAll("td + td")

That last line fails with Error: 0x80020006 - Unknown name. Specifically: querySelectorAll

The provided example using document.links works, but it doesn't give me the additional information about those links that I'm trying to get.

So... is the linked doc the wrong one?
And what is the currently preferred method of fetching and analyzing a web page: still COM, or something I've missed?

Thanks in advance!
p3trus

Re: COM Object: HTMLFile & querySelectorAll not working

21 May 2017, 08:25

thanks - but those linked posts are from 2009/2010, the info I had used is from late 2013.

So is ComObjCreate("InternetExplorer.Application") really preferable to the COM objects I've used? Does it make life easier for simple data extraction?
tank
Posts: 2821
Joined: 28 Sep 2013, 22:15
Facebook: charlie.simmons.7334
Google: ttnnkkrr
GitHub: ttnnkkrr
Location: Irving TX
Contact:

Re: COM Object: HTMLFile & querySelectorAll not working

21 May 2017, 09:00

It is only easier/preferable if it works better
We are troubled on every side, yet not distressed; we are perplexed,
but not in despair; Persecuted, but not forsaken; cast down, but not destroyed;
https://www.facebook.com/ahkscript.org
If you have forum suggestions please submit a pull request
Check Out WebWriter
Thanks Tank :thumbup:
jeeswg
Posts: 6904
Joined: 19 Dec 2016, 01:58
Location: UK

Re: COM Object: HTMLFile & querySelectorAll not working  Topic is solved

21 May 2017, 09:08

It might be related to this:

Why getElementsByClassName doesn't work for html file? - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?f=5&t=31907
homepage | tutorials | wish list | fun threads | donate
WARNING: copy your posts/messages before hitting Submit as you may lose them due to CAPTCHA
p3trus

Re: COM Object: HTMLFile & querySelectorAll not working

21 May 2017, 10:03

That really did the trick, @jeeswg, thanks!
The fix was simply writing <meta http-equiv="X-UA-Compatible" content="IE=edge"> to the document first, before the page content.
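
A minimal sketch of that fix for AutoHotkey v1.1, assuming the page source is already available as a string. The sample HTML and the variable names (html, cells, out) are made up for illustration, and the result may depend on the Internet Explorer version installed. The compatibility tag is written before any other content so that the document leaves the legacy mode in which querySelectorAll is unknown:

```autohotkey
; Sketch only: the sample HTML stands in for WebRequest.ResponseText.
html := "<table><tr><td>a</td><td>b</td><td>c</td></tr></table>"

document := ComObjCreate("HTMLfile")
; Write the compatibility tag first so the document uses the newest IE mode.
document.write("<meta http-equiv=""X-UA-Compatible"" content=""IE=edge"">")
document.write(html)
document.close()

cells := document.querySelectorAll("td + td")
out := ""
Loop % cells.length
    out .= cells.item(A_Index - 1).innerText "`n"   ; the node list is zero-based
MsgBox % out
```

In the original script, the second write call would take WebRequest.ResponseText instead of the sample string.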
sancarn
Posts: 224
Joined: 01 Mar 2016, 14:52

Re: COM Object: HTMLFile & querySelectorAll not working

25 Aug 2017, 12:33

jeeswg wrote:It might be related to this:

Why getElementsByClassName doesn't work for html file? - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?f=5&t=31907
I can't find where but I remember seeing you asking where documentation was for htmlfile methods. It appears to me, from research today, that the best source of available methods etc. of the HTML COM object can be found in the MSDN docs for IHTMLDocument to IHTMLDocument8.

It also seems that querySelector and querySelectorAll are implemented in the IElementSelector interface. To solve this issue, you might have to somehow get the IElementSelector interface of the node you are trying to query and then call something like getElementSelector(node).querySelectorAll("my > selector", outvar)

Edit: One might also be able to write a custom querySelectorAll function based on the IHTMLElement2.msMatchesSelector function. Also see IHTMLElement2.
tank

Re: COM Object: HTMLFile & querySelectorAll not working

25 Aug 2017, 13:21

document.close
jeeswg

Re: COM Object: HTMLFile & querySelectorAll not working

25 Aug 2017, 13:24

I mention here that I was looking for info about HTMLFile objects:
Why getElementsByClassName doesn't work for html file? - AutoHotkey Community
https://autohotkey.com/boards/viewtopic ... 39#p148939
[Found by looking through txt file backups of my posts.]

I see it as 2 issues now:
- It appears that HTMLFile objects can do whatever COM in Internet Explorer can do, as long as you set the version number. Thus primarily I wouldn't need to find separate documentation, only documentation relevant to either.
- It would be interesting to find explicit mention of, and good documentation for, HTMLFile objects. It seems odd how difficult this has been.

In terms of finding explicit mentions of HTMLFile, using some of this info might help, i.e. as search terms:

Code: Select all

q:: ;HTMLFile object - get info (tested on Windows 7)
;based on:
;ComObjType()
;https://autohotkey.com/docs/commands/ComObjType.htm
;oHTML := ComObjCreate("HTMLfile")
ComObject := ComObjCreate("HTMLfile")
MsgBox, % VarType := ComObjType(ComObject)
MsgBox, % IName   := ComObjType(ComObject, "Name")
MsgBox, % IID     := ComObjType(ComObject, "IID")
MsgBox, % CName   := ComObjType(ComObject, "Class")  ; v1.1.26+
MsgBox, % CLSID   := ComObjType(ComObject, "CLSID")  ; v1.1.26+
;oHTML := ""
ComObject := ""

;VarType: 9
;IName: DispHTMLDocument
;IID: {3050F55F-98B5-11CF-BB82-00AA00BDCE0B}
;CName: HTMLDocument
;CLSID: {25336920-03F9-11CF-8FD0-00AA00686F13}
return
sancarn

Re: COM Object: HTMLFile & querySelectorAll not working

04 Sep 2017, 02:56

jeeswg wrote:

Code: Select all

;VarType: 9
;IName: DispHTMLDocument
;IID: {3050F55F-98B5-11CF-BB82-00AA00BDCE0B}
;CName: HTMLDocument
;CLSID: {25336920-03F9-11CF-8FD0-00AA00686F13}
Cool stuff. Also worth noting the registry information points directly to Internet Explorer:


Code: Select all

HKEY_CLASSES_ROOT\htmlfile : {
	Name			Type		Data
	(Default)		REG_SZ		HTML Document
	AppUserModelID		REG_SZ		Microsoft.InternetExplorer.Default
	EditFlags		REG_DWORD	0x00200000
	FriendlyTypeName	REG_SZ		@C:\Windows\System32\ieframe.dll,-912
}
So it appears to reference IE directly also...

I also found a Stack Overflow question indicating that the HTML interface installed on most machines is simply outdated. Supposedly you should import the newest version of the interface using Tlbimp c:\windows\system32\mshtml.tlb, and then I guess one could use TypeLib2AHK to use the type library within AHK...

Edit: Reading a little more, type libraries (*.tlb) may only be useful for .NET applications. Perhaps this is an occasion where raw COM is required...

Edit: Bingo! So I remembered seeing a script in the past, looked around and found this script:

http://www.autohotkey.com/board/topic/7 ... om-members

Running it on an HTMLFile object gives the following list:

Code: Select all

#	ID  	Name	Kind
1	-501	bgColor	[get]	
2	-501	bgColor	[put]	
3	1001	Script	[get]	
4	1003	all	[get]	
5	1004	body	[get]	
6	1005	activeElement	[get]	
7	1007	anchors	[get]	
8	1008	applets	[get]	
9	1009	links	[get]	
10	1010	forms	[get]	
11	1011	images	[get]	
12	1012	title	[put]	
13	1012	title	[get]	
14	1013	scripts	[get]	
15	1014	designMode	[put]	
16	1014	designMode	[get]	
17	1015	embeds	[get]	
18	1017	selection	[get]	
19	1018	readyState	[get]	
20	1019	frames	[get]	
21	1021	plugins	[get]	
22	1022	alinkColor	[put]	
23	1022	alinkColor	[get]	
24	1023	vlinkColor	[put]	
25	1023	vlinkColor	[get]	
26	1024	linkColor	[get]	
27	1024	linkColor	[put]	
28	1025	url	[get]	
29	1025	url	[put]	
30	1026	location	[get]	
31	1027	referrer	[get]	
32	1028	lastModified	[get]	
33	1029	domain	[put]	
34	1029	domain	[get]	
35	1030	cookie	[get]	
36	1030	cookie	[put]	
37	1031	expando	[get]	
38	1031	expando	[put]	
39	1032	charset	[put]	
40	1032	charset	[get]	
41	1033	defaultCharset	[get]	
42	1033	defaultCharset	[put]	
43	1034	parentWindow	[get]	
44	1041	mimeType	[get]	
45	1042	fileSize	[get]	
46	1043	fileCreatedDate	[get]	
47	1044	fileModifiedDate	[get]	
48	1045	fileUpdatedDate	[get]	
49	1046	security	[get]	
50	1046	security	[get]	
51	1047	protocol	[get]	
52	1047	protocol	[get]	
53	1048	nameProp	[method]	
54	1048	nameProp	[get]	
55	1054	write	[method]	
56	1055	writeln	[method]	
57	1056	open	[method]	
58	1057	close	[method]	
59	1058	clear	[method]	
60	1059	queryCommandSupported	[method]	
61	1060	queryCommandEnabled	[method]	
62	1061	queryCommandState	[method]	
63	1061	queryCommandState	[method]	
64	1062	queryCommandIndeterm	[method]	
65	1063	queryCommandText	[method]	
66	1064	queryCommandValue	[method]	
67	1065	execCommand	[method]	
68	1066	execCommandShowHelp	[method]	
69	1066	execCommandShowHelp	[method]	
70	1067	createElement	[method]	
71	1067	createElement	[method]	
72	1068	elementFromPoint	[method]	
73	1068	elementFromPoint	[method]	
74	1069	styleSheets	[get]	
75	1070	toString	[method]	
76	1071	createStyleSheet	[method]	
77	1072	releaseCapture	[method]	
78	1073	recalc	[method]	
79	1073	recalc	[method]	
80	1074	createTextNode	[method]	
81	1074	createTextNode	[get]	
82	1075	documentElement	[get]	
83	1075	documentElement	[put]	
84	1075	documentElement	[get]	
85	1076	createDocumentFragment	[method]	
86	1076	createDocumentFragment	[get]	
87	1077	uniqueID	[get]	
88	1077	uniqueID	[get]	
89	1078	parentDocument	[get]	
90	1078	parentDocument	[get]	
91	1079	enableDownload	[get]	
92	1079	enableDownload	[put]	
93	1079	enableDownload	[get]	
94	1080	baseUrl	[put]	
95	1080	baseUrl	[get]	
96	1082	inheritStyleSheets	[put]	
97	1082	inheritStyleSheets	[get]	
98	1086	getElementsByName	[method]	
99	1087	getElementsByTagName	[method]	
100	1088	getElementById	[method]	
101	1089	focus	[method]	
102	1090	hasFocus	[method]	
103	1091	namespaces	[get]	
104	1092	createDocumentFromUrl	[method]	
105	1093	media	[get]	
106	1093	media	[put]	
107	1094	CreateEventObject	[method]	
108	1095	FireEvent	[method]	
109	1096	createRenderStyle	[method]	
110	1097	URLUnencoded	[get]	
111	1098	doctype	[get]	
112	1099	implementation	[get]	
113	1100	createAttribute	[method]	
114	1101	createComment	[method]	
115	1102	compatMode	[get]	
116	1103	compatible	[get]	
117	1104	documentMode	[get]	
118	1105	querySelector	[method]	
119	1106	querySelectorAll	[method]	
120	1107	ie8_getElementById	[method]	
121	1108	createEvent	[method]	
122	1109	updateSettings	[method]	
123	1110	defaultView	[get]	
124	1111	createRange	[method]	
125	1112	getSelection	[method]	
126	1113	getElementsByTagNameNS	[method]	
127	1113	getElementsByTagNameNS	[get]	
128	1114	createElementNS	[method]	
129	1115	createAttributeNS	[method]	
130	1116	rootElement	[get]	
131	1117	characterSet	[get]	
132	1118	ie9_createElement	[get]	
133	1118	ie9_createElement	[method]	
134	1119	ie9_createAttribute	[get]	
135	1119	ie9_createAttribute	[method]	
136	1120	getElementsByClassName	[put]	
137	1120	getElementsByClassName	[method]	
138	1120	getElementsByClassName	[get]	
139	1121	createNodeIterator	[method]	
140	1121	createNodeIterator	[method]	
141	1122	createTreeWalker	[method]	
142	1122	createTreeWalker	[method]	
143	1123	createCDATASection	[method]	
144	1123	createCDATASection	[method]	
145	1124	createProcessingInstruction	[method]	
146	1124	createProcessingInstruction	[method]	
147	1125	adoptNode	[method]	
148	1125	adoptNode	[method]	
149	1126	ie9_all	[get]	
150	1126	ie9_all	[method]	
151	1127	inputEncoding	[put]	
152	1127	inputEncoding	[get]	
153	1127	inputEncoding	[get]	
154	1128	xmlEncoding	[get]	
155	1129	xmlStandalone	[put]	
156	1129	xmlStandalone	[get]	
157	1130	xmlVersion	[get]	
158	1130	xmlVersion	[put]	
159	1132	hasAttributes	[method]	
160	1134	normalize	[method]	
161	1135	importNode	[method]	
162	1136	ie9_parentWindow	[get]	
163	1137	ie9_body	[putref]	
164	1137	ie9_body	[get]	
165	1138	head	[get]	
166	1139	elementsFromPoint	[method]	
167	1140	elementsFromRect	[method]	
168	1141	msCapsLockWarningOff	[put]	
169	1141	msCapsLockWarningOff	[get]

Ultimately it appears that HTMLFile is the IHTMLDocument7/IHTMLDocument8 interface, which also extends all previous versions. So by default it should contain all methods, including querySelector (#118). Either (1) we'd need to use raw COM, or (2) we need to find the correct interface...
jeeswg

Re: COM Object: HTMLFile & querySelectorAll not working

04 Sep 2017, 13:50

Great work sancarn. Does it make any difference using that line of code (the 'X-UA-Compatible' one) that seems to 'update' the object and make newer methods accessible? Also, did you get a different list of results compared to that link from the archived forum?

Here is the code from earlier modified to create one string, tested on Internet Explorer 11:

Code: Select all

q:: ;HTMLFile object - get info (tested on Windows 7)
WinGet, hWnd, ID, A
;ComObjType()
;https://autohotkey.com/docs/commands/ComObjType.htm
;oHTML := ComObjCreate("HTMLFile")
;oWB := WBGet("ahk_id " hWnd)
;ComObject := ComObjCreate("HTMLFile")
;note: WBGet() is a separate helper function that is not included in this snippet
ComObject := WBGet("ahk_id " hWnd)
vOutput := "VarType: " ComObjType(ComObject)
. "`r`n" "IName: " ComObjType(ComObject, "Name")
. "`r`n" "IID: " ComObjType(ComObject, "IID")
. "`r`n" "CName: " ComObjType(ComObject, "Class")  ; v1.1.26+
. "`r`n" "CLSID: " ComObjType(ComObject, "CLSID")  ; v1.1.26+
ComObject := ""
Clipboard := vOutput
MsgBox, % "done"

;HTMLfile object:
;VarType: 9
;IName: DispHTMLDocument
;IID: {3050F55F-98B5-11CF-BB82-00AA00BDCE0B}
;CName: HTMLDocument
;CLSID: {25336920-03F9-11CF-8FD0-00AA00686F13}

;from Internet Explorer control:
;VarType: 9
;IName: IWebBrowser2
;IID: {D30C1661-CDAF-11D0-8A3E-00C04FC9E26E}
;CName: WebBrowser
;CLSID: {8856F961-340A-11D0-A96B-00C04FD705A2}
return
sancarn

Re: COM Object: HTMLFile & querySelectorAll not working

04 Sep 2017, 19:02

jeeswg wrote: Does it make any difference using that line of code (the 'X-UA-Compatible' one) that seems to 'update' the object and make newer methods accessible?
All I know is that it seems to allow getElementsByClassName, but you still can't access querySelector. I will test tomorrow to see whether the class methods change, but I doubt they do at all... It wouldn't make much sense if they did, but there is no harm in testing.
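
One way to run such a test is sketched below for AutoHotkey v1.1. It is only an illustration: the sample markup and the variable names (meta, body, doc, nodes, result) are assumptions, and the output will depend on the Internet Explorer version installed. It builds one HTMLFile document without the meta tag and one with it, then reports documentMode and whether querySelectorAll can be called on each:

```autohotkey
; Sketch only: compare an HTMLFile document with and without the meta tag.
meta := "<meta http-equiv=""X-UA-Compatible"" content=""IE=edge"">"
body := "<p class=""x"">test</p>"

for index, prefix in ["", meta]
{
    doc := ComObjCreate("HTMLfile")
    doc.write(prefix . body)
    doc.close()

    ; documentMode may itself be unavailable, so guard the lookup.
    try result := "documentMode: " doc.documentMode
    catch
        result := "documentMode: unavailable"

    try
    {
        nodes := doc.querySelectorAll("p.x")
        result .= "`nquerySelectorAll: ok, " nodes.length " match(es)"
    }
    catch e
        result .= "`nquerySelectorAll: failed (" e.message ")"

    MsgBox % (prefix = "" ? "Without meta tag" : "With meta tag") "`n" result
}
```

If the two message boxes differ, the meta tag is changing which members the object exposes at run time, even though the static member list stays the same.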
