Reading a currently open webpage and getting data
-
- Posts: 3
- Joined: 16 Mar 2020, 03:38
Reading a currently open webpage and getting data
Hi Guys,
I am trying to use a script with a ticketing system at work to make filling out email templates slightly easier.
I basically want to press a hotkey (F5, for example) and read the Google Chrome page I currently have open and active (my current ticket). I then want to extract specific data from the page, such as a reference number or an email address.
I can do this fairly easily by parsing the HTML with Python. However, I am having a hard time figuring out how to grab the HTML of the page I currently have open.
The URL will be different each time I use the script.
Can anyone point me in the right direction, with either AutoHotkey or Python?
Cheers,
-
- Posts: 4412
- Joined: 29 Mar 2015, 09:41
Re: Reading a currently open webpage and getting data
Hi
Try this:
Code: Select all
SetBatchLines, -1

js =
(
var textArea = document.createElement("textarea");
textArea.value = new XMLSerializer().serializeToString(document);
textArea.style.position = 'fixed';
document.body.appendChild(textArea);
textArea.focus();
textArea.select();
document.execCommand('copy');
textArea.parentNode.removeChild(textArea)
)

$F1::
    Clipboard := ""
    RunJsFromChromeAddressBar(js)
    ClipWait, 3
    MsgBox, % Clipboard
    Return

RunJsFromChromeAddressBar(js, exe := "chrome.exe") {
    static WM_GETOBJECT := 0x3D
         , ROLE_SYSTEM_TEXT := 0x2A
         , STATE_SYSTEM_FOCUSABLE := 0x100000
         , SELFLAG_TAKEFOCUS := 0x1
    window := "ahk_class Chrome_WidgetWin_1 ahk_exe " . exe
    WinActivate, % window
    SendMessage, WM_GETOBJECT, 0, 1, Chrome_RenderWidgetHostHWND1, % window
    AccChrome := AccObjectFromWindow( WinExist(window) )
    AccAddrBar := SearchElement(AccChrome, {Role: ROLE_SYSTEM_TEXT, State: STATE_SYSTEM_FOCUSABLE})
    AccAddrBar.accValue(0) := "javascript:" . js
    AccAddrBar.accSelect(SELFLAG_TAKEFOCUS, 0)
    ControlSend,, {Enter}, % window
}

SearchElement(parentElement, params)
{
    found := true
    for k, v in params {
        try {
            if (k = "ChildCount")
                (parentElement.accChildCount != v && found := false)
            else if (k = "State")
                (!(parentElement.accState(0) & v) && found := false)
            else
                (parentElement["acc" . k](0) != v && found := false)
        }
        catch
            found := false
    } until !found
    if found
        Return parentElement
    for k, v in AccChildren(parentElement)
        if obj := SearchElement(v, params)
            Return obj
}

AccObjectFromWindow(hWnd, idObject = 0) {
    static IID_IDispatch := "{00020400-0000-0000-C000-000000000046}"
         , IID_IAccessible := "{618736E0-3C3D-11CF-810C-00AA00389B71}"
         , OBJID_NATIVEOM := 0xFFFFFFF0, VT_DISPATCH := 9, F_OWNVALUE := 1
         , h := DllCall("LoadLibrary", "Str", "oleacc", "Ptr")
    VarSetCapacity(IID, 16), idObject &= 0xFFFFFFFF
    DllCall("ole32\CLSIDFromString", "Str", idObject = OBJID_NATIVEOM ? IID_IDispatch : IID_IAccessible, "Ptr", &IID)
    if DllCall("oleacc\AccessibleObjectFromWindow", "Ptr", hWnd, "UInt", idObject, "Ptr", &IID, "PtrP", pAcc) = 0
        Return ComObject(VT_DISPATCH, pAcc, F_OWNVALUE)
}

AccChildren(Acc) {
    static VT_DISPATCH := 9
    Loop 1 {
        if ComObjType(Acc, "Name") != "IAccessible" {
            error := "Invalid IAccessible Object"
            break
        }
        try cChildren := Acc.accChildCount
        catch
            Return ""
        Children := []
        VarSetCapacity(varChildren, cChildren*(8 + A_PtrSize*2), 0)
        res := DllCall("oleacc\AccessibleChildren", "Ptr", ComObjValue(Acc), "Int", 0
                     , "Int", cChildren, "Ptr", &varChildren, "IntP", cChildren)
        if (res != 0) {
            error := "AccessibleChildren DllCall Failed"
            break
        }
        Loop % cChildren {
            i := (A_Index - 1)*(A_PtrSize*2 + 8)
            child := NumGet(varChildren, i + 8)
            Children.Push( (b := NumGet(varChildren, i) = VT_DISPATCH) ? AccQuery(child) : child )
            ( b && ObjRelease(child) )
        }
    }
    if error
        ErrorLevel := error
    else
        Return Children.MaxIndex() ? Children : ""
}

AccQuery(Acc) {
    static IAccessible := "{618736e0-3c3d-11cf-810c-00aa00389b71}", VT_DISPATCH := 9, F_OWNVALUE := 1
    try Return ComObject(VT_DISPATCH, ComObjQuery(Acc, IAccessible), F_OWNVALUE)
}
-
- Posts: 3
- Joined: 16 Mar 2020, 03:38
Re: Reading a currently open webpage and getting data
Would you be able to tell me how this works, and which parts of the script I should pay attention to in order to make it work the way I need? I wasn't ready for this level of complexity, to be honest.
-
- Posts: 4412
- Joined: 29 Mar 2015, 09:41
Re: Reading a currently open webpage and getting data
To start, open Chrome on the page you are interested in, run the script, and press F1. A message box with the page's HTML should appear. Does it?
Re: Reading a currently open webpage and getting data
This is a more basic, less complex approach:
Code: Select all
$F1::
    Send, ^u
    While (Clipboard && A_Index < 10) {
        Clipboard:= ""
        Sleep, 50
    }
    WinWaitActive view-source: ahk_exe chrome.exe
    While (!Clipboard && A_Index < 10) {
        Sleep, 100
        Send, ^a
        Sleep, 100
        Send, ^c
    }
    Send, ^w
    MsgBox % Clipboard
    Return
-
- Posts: 4412
- Joined: 29 Mar 2015, 09:41
Re: Reading a currently open webpage and getting data
I think the first loop is unnecessary. :)
Re: Reading a currently open webpage and getting data
I agree that Clipboard := "" is generally very reliable in AutoHotkey. However, in extreme cases, such as high CPU usage or especially heavy clipboard use by other apps, it can still fail. The While loop drastically reduces that possibility.
Re: Reading a currently open webpage and getting data
> rommmcek wrote: ↑16 Mar 2020, 19:22
> I agree that Clipboard := "" is generally very reliable in AutoHotkey. However, in extreme cases, such as high CPU usage or especially heavy clipboard use by other apps, it can still fail. The While loop drastically reduces that possibility.

After testing, your code is indeed more reliable, whereas the former code sometimes reports an error because of Google. :crazy:
Re: Reading a currently open webpage and getting data
If you want speed, you can select all, copy, and then read the HTML format from the clipboard.
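For anyone unfamiliar with that clipboard format: when you copy from a browser on Windows, the clipboard also holds an entry named "HTML Format" (CF_HTML). It is UTF-8 text with a short header (Version, StartHTML, EndHTML, StartFragment, EndFragment, and usually SourceURL) followed by the markup, and the header numbers are byte offsets from the start of the data. Reading the raw entry from AutoHotkey is not shown here. The sketch below only illustrates the parsing step, written in JavaScript to match the page scripts elsewhere in this thread; the function names are made up for the example.

```javascript
// Sketch: pull the copied fragment and the source URL out of a CF_HTML payload.
// The function names are illustrative, not part of any library.

// Read one numeric header field such as "StartFragment:0000000123".
function headerOffset(cfHtml, key) {
  const m = new RegExp('^' + key + ':(\\d+)', 'm').exec(cfHtml);
  return m ? parseInt(m[1], 10) : -1;
}

// Header offsets count UTF-8 bytes, so slice the encoded bytes, not the string.
function extractFragment(cfHtml) {
  const bytes = new TextEncoder().encode(cfHtml);
  const start = headerOffset(cfHtml, 'StartFragment');
  const end = headerOffset(cfHtml, 'EndFragment');
  if (start < 0 || end < start || end > bytes.length) return null;
  return new TextDecoder().decode(bytes.subarray(start, end));
}

// The optional SourceURL header holds the address of the page that was copied from.
function sourceUrl(cfHtml) {
  const m = /^SourceURL:(.*)$/m.exec(cfHtml);
  return m ? m[1].trim() : '';
}
```

Since the URL is different for every ticket, the SourceURL field may be a convenient way to record which ticket the data came from, assuming the browser includes it.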
-
- Posts: 4412
- Joined: 29 Mar 2015, 09:41
Re: Reading a currently open webpage and getting data
If the only goal is to get the HTML, my approach is perhaps not the best, but it lets you run any JavaScript on the page and get its result. And you don't need the clipboard to get that result.
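As a rough illustration of what such a page script could do for the original question (a reference number or an email address), something like the following could go into the injected JavaScript. It is only a sketch: the REF-12345 style pattern is an assumption, since the ticket system's real format isn't known, the helper names are made up, and the email regex is deliberately simple.

```javascript
// Return the first email address found in the text, or '' if there is none.
// Deliberately simple pattern; it does not try to cover every valid address.
function findEmail(text) {
  const m = /[A-Za-z0-9._+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/.exec(text);
  return m ? m[0] : '';
}

// Return the first reference number, or '' if there is none.
// The default "REF-" + digits pattern is a placeholder; pass your own.
function findReference(text, pattern = /\bREF-\d+\b/) {
  const m = pattern.exec(text);
  return m ? m[0] : '';
}

// On the page itself, the visible text would be the input, for example:
//   findEmail(document.body.innerText)
```

The result is then a short string rather than the whole page, so less data has to travel back to the AutoHotkey side.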
Re: Reading a currently open webpage and getting data
How could you do it without using the clipboard?
Re: Reading a currently open webpage and getting data
Here is an internet radio example using vlc.exe. It reads the HTML into memory and extracts specific data from the webpage.
(The URL is fixed here, but you can use OnClipboardChange if you need to copy and change the URL.)
Code: Select all
;CREATED =20130104
#warn
#NoTrayIcon
setworkingdir,%a_scriptdir%
Filename1=Internet Radio ( use ESC for MUTE=ON/OFF )
f1:="http://144.217.253.136:8582/played.html" ;- html saved in memory
f2=http://144.217.253.136:8582/ ;- run this with vlc.exe
new:=""
vlcx =%A_programfiles%\VideoLAN\VLC\vlc.exe
ifexist,%vlcx%
    run,%vlcx% --one-instance --qt-start-minimized %f2%,,hide,pid1
;--- Hotkey Escape for Mute ON/OFF -----------
HK1=Esc
Hotkey,%hk1%,mute1,ON
xx:=ComObjCreate("WinHttp.WinHttpRequest.5.1") ;-Create the Object
ComObjError(false)
xx.Silent := True ;- script failure = off
xx.SetTimeouts(500,500,500,500)
Gui,2: Color, 000000
Gui,2:Font, S10 CDefault , Lucida Console
Gui,2:Add,Text, cYellow x10 y10 h23 w680 vT1,
Gui,2:Add,Text, cGray x10 y50 h170 w680 vT2
Gui,2: Show, x10 y1 w700 ,%filename1%
settimer,aa1,5000
gosub,aa1
return
;-----------------------
aa1:
    new:=""
    try {
        xx.Open("GET",f1) ;-Open communication read html( in memory )
        xx.Send() ;-Send the "get" request
        aac=
        aac:=xx.ResponseText ;-Set the "aac" variable to the response
        ;msgbox, 262208,%f1%--TEXT ,%aac% ;- see TEXT from url
    } catch e {
        xxx:=e.Message
        msgbox, 262208,ERROR ,Error=Catch`n%f1%`n NOT exists`n------------------------------------------`n%xxx%`n------------------------------------------,
    }
    StringReplace,y,aac,</tr>,$, All
    Loop,parse,y,$,
    {
        if A_loopfield contains Current Song
        {
            y:=RegExReplace( A_loopfield, "<.*?>" )
            stringreplace,y,y,Current Song,,all
            stringmid,y,y,9,200
            break
        }
    }
    y :=RegExReplace(y, "\W", " ") ;- now playing
    ;- all last played ---------------
    StringReplace,aac,aac,</tr>,$, All
    stringreplace,aac,aac, ,,all
    aac:=RegExReplace(aac, "<.*?>" )
    Loop,parse,aac,$,
        new .= a_loopfield "`n"
    ;msgbox, 262208,PLAYED_SONGS,%new%
    GuiControl,2:,T2,%new%
    GuiControl,2:,T1 ,%y% ;-- Current Text
    ;msgbox, 262208, ,%y%,2
    return
;--------------------
MUTE1:
    soundset,+1,master,mute
    return
;--------------------
2Guiclose:
    settimer,aa1,OFF
    Process, Exist, vlc.exe
    If ErrorLevel
    {
        msgbox, 262435,Radio-Close,Want you close also Audio ?
        ifmsgbox,NO
        {
            soundset,0,master,mute
            exitapp
        }
        ifmsgbox,Cancel
        {
            settimer,aa1,ON
            return
        }
        else
        {
            soundset,0,master,mute
            process,close,vlc.exe
            exitapp
        }
    }
    else
    {
        soundset,0,master,mute
        exitapp
    }
    return
-
- Posts: 4412
- Joined: 29 Mar 2015, 09:41
Re: Reading a currently open webpage and getting data
But if the page has a lot of elements, it can take a long time.
I think there could be a different approach.
-
- Posts: 4412
- Joined: 29 Mar 2015, 09:41
Re: Reading a currently open webpage and getting data
You could add that element at the beginning of the page.
Re: Reading a currently open webpage and getting data
> teadrinker wrote: ↑17 Mar 2020, 02:20
> If the only goal is getting html-code, perhaps my approach is not best, but it allows to run any js-script on the page and to get its result. And it's not necessary using clipboard for getting a result.

Obviously, your script is a masterpiece, but it is too complex for me to understand or modify.
For example, I want to modify the JS to get the contents of the element with a given ID. I know that I need to use getElementById, but I don't know where to change the code.
Can you give me a hint on how to get the value of an ID based on the original code?
Any small change results in an error or NULL. :headwall:
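A possible starting point for this, as a sketch rather than a tested drop-in. Note that the DOM method is getElementById (singular; there is no getElementsById), and that it returns null when no element has that id, which is one common source of errors or a null result. The id 'ticket-ref' and the helper name textById are hypothetical.

```javascript
// Return the text of the element with the given id, or '' if it is not on the page.
function textById(doc, id) {
  const el = doc.getElementById(id); // singular "Element"
  if (!el) return '';                // id not present: avoid a null error
  // Inputs and textareas keep their content in .value,
  // other elements in innerText (or textContent as a fallback).
  return String(el.value || el.innerText || el.textContent || '');
}

// In the first script, the line that fills the textarea with the serialized
// document would then read something like this instead:
//   textArea.value = textById(document, 'ticket-ref');
```

When moving this into the js = ( ... ) section, it is probably safest to end every statement with a semicolon and leave out the // comments, on the assumption that the code ends up as a single line in the address bar. A literal % or backtick also needs escaping there, because both are special to AutoHotkey in that kind of assignment.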
Re: Reading a currently open webpage and getting data
If it would be faster, why do you use the clipboard?
Re: Reading a currently open webpage and getting data
One more option: communication between Chrome/Firefox and AHK via native messaging:
https://www.autohotkey.com/boards/viewtopic.php?t=32299
-
- Posts: 4412
- Joined: 29 Mar 2015, 09:41
Re: Reading a currently open webpage and getting data
A simple example: open https://time.is/Unix_time_now in Chrome and run the script.
Every time you press F1, you will get the Unix time in a message box. The first call takes a moment, but after that it happens without delay.
Code: Select all
SetBatchLines, -1
global window := "ahk_class Chrome_WidgetWin_1 ahk_exe chrome.exe" ; or msedge.exe
SendMessage, WM_GETOBJECT := 0x3D, 0, 1, Chrome_RenderWidgetHostHWND1, % window

js =
(
(() => {
    let receiver, sender;
    if (!(receiver = document.getElementById('ahk_receiver'))) {
        receiver = createTextArea('ahk_receiver');
        sender = createTextArea('ahk_sender');
        sender.addEventListener('input', inputHandler);
    }
    function inputHandler(event) {
        receiver.value = document.getElementById('clock').innerText;
    }
    function createTextArea(name) {
        const textArea = document.createElement('textarea');
        textArea.setAttribute('aria-label', name);
        textArea.setAttribute('id', name);
        textArea.style.position = 'fixed';
        textArea.style.width = 0;
        textArea.style.height = 0;
        document.body.insertBefore(textArea, document.body.firstChild);
        return textArea
    }
})();
)
SendJS(js)
Return

$F1:: MsgBox, % GetInfo()

GetInfo() {
    static receiver, sender
    if !receiver {
        receiver := GetAccElem("ahk_receiver")
        sender := GetAccElem("ahk_sender")
    }
    sender.accValue(0) := A_TickCount
    Sleep, 100
    Return receiver.accValue(0)
}

SendJS(js) {
    static AccAddrBar
    (!AccAddrBar && AccAddrBar := GetAccElem("addr"))
    AccAddrBar.accValue(0) := "javascript:" . js
    AccAddrBar.accSelect(SELFLAG_TAKEFOCUS := 0x1, 0)
    ControlSend,, {Enter}, % window
}

GetAccElem(elem) {
    static ROLE_SYSTEM_TEXT := 0x2A, STATE_SYSTEM_FOCUSABLE := 0x100000, accChrome
         , Elems := { addr: {Role: ROLE_SYSTEM_TEXT, State: STATE_SYSTEM_FOCUSABLE}
                    , ahk_receiver: {Role: ROLE_SYSTEM_TEXT, Name: "ahk_receiver"}
                    , ahk_sender: {Role: ROLE_SYSTEM_TEXT, Name: "ahk_sender"} }
    if !accChrome
        accChrome := AccObjectFromWindow( WinExist(window) )
    Return SearchElement(accChrome, Elems[elem])
}

AccObjectFromWindow(hWnd, idObject = 0) {
    static IID_IDispatch := "{00020400-0000-0000-C000-000000000046}"
         , IID_IAccessible := "{618736E0-3C3D-11CF-810C-00AA00389B71}"
         , OBJID_NATIVEOM := 0xFFFFFFF0, VT_DISPATCH := 9, F_OWNVALUE := 1
         , h := DllCall("LoadLibrary", "Str", "oleacc", "Ptr")
    VarSetCapacity(IID, 16), idObject &= 0xFFFFFFFF
    DllCall("ole32\CLSIDFromString", "Str", idObject = OBJID_NATIVEOM ? IID_IDispatch : IID_IAccessible, "Ptr", &IID)
    if DllCall("oleacc\AccessibleObjectFromWindow", "Ptr", hWnd, "UInt", idObject, "Ptr", &IID, "PtrP", pAcc) = 0
        Return ComObject(VT_DISPATCH, pAcc, F_OWNVALUE)
}

SearchElement(parentElement, params)
{
    found := true
    for k, v in params {
        try {
            if (k = "State")
                (!(parentElement.accState(0) & v) && found := false)
            else if (k ~= "^(Name|Value)$")
                (!(parentElement["acc" . k](0) ~= v) && found := false)
            else if (k = "ChildCount")
                (parentElement["acc" . k] != v && found := false)
            else
                (parentElement["acc" . k](0) != v && found := false)
        }
        catch
            found := false
    } until !found
    if found
        Return parentElement
    for k, v in AccChildren(parentElement)
        if obj := SearchElement(v, params)
            Return obj
}

AccChildren(Acc) {
    static VT_DISPATCH := 9
    Loop 1 {
        if ComObjType(Acc, "Name") != "IAccessible" {
            error := "Invalid IAccessible Object"
            break
        }
        try cChildren := Acc.accChildCount
        catch
            Return ""
        Children := []
        VarSetCapacity(varChildren, cChildren*(8 + A_PtrSize*2), 0)
        res := DllCall("oleacc\AccessibleChildren", "Ptr", ComObjValue(Acc), "Int", 0
                     , "Int", cChildren, "Ptr", &varChildren, "IntP", cChildren)
        if (res != 0) {
            error := "AccessibleChildren DllCall Failed"
            break
        }
        Loop % cChildren {
            i := (A_Index - 1)*(A_PtrSize*2 + 8)
            child := NumGet(varChildren, i + 8)
            Children.Push( (b := NumGet(varChildren, i) = VT_DISPATCH) ? AccQuery(child) : child )
            ( b && ObjRelease(child) )
        }
    }
    if error
        ErrorLevel := error
    else
        Return Children.MaxIndex() ? Children : ""
}

AccQuery(Acc) {
    static IAccessible := "{618736e0-3c3d-11cf-810c-00aa00389b71}", VT_DISPATCH := 9, F_OWNVALUE := 1
    try Return ComObject(VT_DISPATCH, ComObjQuery(Acc, IAccessible), F_OWNVALUE)
}
Re: Reading a currently open webpage and getting data
If you open a new DOM, you have to create a new hidden element in that DOM.
Maybe it would be better to use an iframe?