Reading a currently open webpage and getting data

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys

Post by Autoblow9000 » 16 Mar 2020, 03:44

Hi Guys,

I am trying to use a script with a ticketing system at work, to make filling out email templates slightly easier.
I basically want to press a hotkey (F5, for example) and read the Google Chrome webpage I currently have open and active (my current ticket). I then want to extract specific data from the page, like a reference number or an email address.

I can do the extraction fairly easily by parsing the HTML with Python; however, I am having a hard time figuring out how to grab the HTML of the webpage I currently have open.

The URL will be different each time I use the script.

Can anyone point me in the right direction, with either AutoHotkey or Python?

Cheers,

Post by teadrinker » 16 Mar 2020, 05:19

Hi
Try this:

SetBatchLines, -1

js =
(
var textArea = document.createElement("textarea");
textArea.value = new XMLSerializer().serializeToString(document);
textArea.style.position = 'fixed';
document.body.appendChild(textArea);
textArea.focus();
textArea.select();
document.execCommand('copy');
textArea.parentNode.removeChild(textArea)
)

$F1::
   Clipboard := ""
   RunJsFromChromeAddressBar(js)
   ClipWait, 3
   MsgBox, % Clipboard
   Return

RunJsFromChromeAddressBar(js, exe := "chrome.exe") {
   static WM_GETOBJECT := 0x3D
        , ROLE_SYSTEM_TEXT := 0x2A
        , STATE_SYSTEM_FOCUSABLE := 0x100000
        , SELFLAG_TAKEFOCUS := 0x1
   
   window := "ahk_class Chrome_WidgetWin_1 ahk_exe " . exe
   WinActivate, % window
   SendMessage, WM_GETOBJECT, 0, 1, Chrome_RenderWidgetHostHWND1, % window
   AccChrome := AccObjectFromWindow( WinExist(window) )
   AccAddrBar := SearchElement(AccChrome, {Role: ROLE_SYSTEM_TEXT, State: STATE_SYSTEM_FOCUSABLE})
   AccAddrBar.accValue(0) := "javascript:" . js
   AccAddrBar.accSelect(SELFLAG_TAKEFOCUS, 0)
   ControlSend,, {Enter}, % window
}

SearchElement(parentElement, params)
{
   found := true
   for k, v in params {
      try {
         if (k = "ChildCount")
            (parentElement.accChildCount != v && found := false)
         else if (k = "State")
            (!(parentElement.accState(0) & v) && found := false)
         else
            (parentElement["acc" . k](0) != v && found := false)
      }
      catch 
         found := false
   } until !found
   if found
      Return parentElement
   
   for k, v in AccChildren(parentElement)
      if obj := SearchElement(v, params)
         Return obj
}

AccObjectFromWindow(hWnd, idObject = 0) {
   static IID_IDispatch   := "{00020400-0000-0000-C000-000000000046}"
        , IID_IAccessible := "{618736E0-3C3D-11CF-810C-00AA00389B71}"
        , OBJID_NATIVEOM  := 0xFFFFFFF0, VT_DISPATCH := 9, F_OWNVALUE := 1
        , h := DllCall("LoadLibrary", "Str", "oleacc", "Ptr")
        
   VarSetCapacity(IID, 16), idObject &= 0xFFFFFFFF
   DllCall("ole32\CLSIDFromString", "Str", idObject = OBJID_NATIVEOM ? IID_IDispatch : IID_IAccessible, "Ptr", &IID)
   if DllCall("oleacc\AccessibleObjectFromWindow", "Ptr", hWnd, "UInt", idObject, "Ptr", &IID, "PtrP", pAcc) = 0
      Return ComObject(VT_DISPATCH, pAcc, F_OWNVALUE)
}

AccChildren(Acc) {
   static VT_DISPATCH := 9
   Loop 1  {
      if ComObjType(Acc, "Name") != "IAccessible"  {
         error := "Invalid IAccessible Object"
         break
      }
      try cChildren := Acc.accChildCount
      catch
         Return ""
      Children := []
      VarSetCapacity(varChildren, cChildren*(8 + A_PtrSize*2), 0)
      res := DllCall("oleacc\AccessibleChildren", "Ptr", ComObjValue(Acc), "Int", 0
                                                , "Int", cChildren, "Ptr", &varChildren, "IntP", cChildren)
      if (res != 0) {
         error := "AccessibleChildren DllCall Failed"
         break
      }
      Loop % cChildren  {
         i := (A_Index - 1)*(A_PtrSize*2 + 8)
         child := NumGet(varChildren, i + 8)
         Children.Push( (b := NumGet(varChildren, i) = VT_DISPATCH) ? AccQuery(child) : child )
         ( b && ObjRelease(child) )
      }
   }
   if error
      ErrorLevel := error
   else
      Return Children.MaxIndex() ? Children : ""
}

AccQuery(Acc) {
   static IAccessible := "{618736e0-3c3d-11cf-810c-00aa00389b71}", VT_DISPATCH := 9, F_OWNVALUE := 1
   try Return ComObject(VT_DISPATCH, ComObjQuery(Acc, IAccessible), F_OWNVALUE)
}

Post by Autoblow9000 » 16 Mar 2020, 10:24

teadrinker wrote:
16 Mar 2020, 05:19
Hi
Try this:


Would you be able to tell me how this works and which parts of the script I should pay attention to in order to make it work the way I need? I wasn't ready for this level of complexity, to be honest.

Post by teadrinker » 16 Mar 2020, 11:50

To start, open Chrome with the page you are interested in, run the script, and press F1. A message box with the page's HTML should appear. Does it?
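
Once the page source is available as text, the fields mentioned in the first post can be pulled out with a couple of regular expressions. Below is a minimal sketch in JavaScript (runnable under Node.js). It assumes the reference looks like `REF-12345`; that format, the function name `extractTicketFields`, and the sample HTML are all made up for illustration and would need adjusting to the real ticket page.

```javascript
// Pull an email address and a reference number out of page HTML.
function extractTicketFields(html) {
  // Drop tags first so attribute values and markup do not pollute the matches.
  const text = html.replace(/<[^>]*>/g, ' ');
  const email = text.match(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/);
  const reference = text.match(/\bREF-\d+\b/); // hypothetical reference format
  return {
    email: email ? email[0] : null,
    reference: reference ? reference[0] : null,
  };
}

// Made-up sample input standing in for the copied page source.
const sample = '<td>Ref: REF-10482</td><a href="mailto:x">jane.doe@example.com</a>';
console.log(extractTicketFields(sample));
// → { email: 'jane.doe@example.com', reference: 'REF-10482' }
```

A regular expression is enough for one or two well-delimited fields; for anything more structured, a real HTML parser is likely the safer choice.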

Post by rommmcek » 16 Mar 2020, 14:40

This is a more basic, less complex approach:

$F1::
Send, ^u
While (Clipboard && A_Index < 10) {
    Clipboard:= ""
    Sleep, 50
}
WinWaitActive view-source: ahk_exe chrome.exe
While (!Clipboard && A_Index < 10) {
    Sleep, 100
    Send, ^a
    Sleep, 100
    Send, ^c
}
Send, ^w
MsgBox % Clipboard
Return

Post by teadrinker » 16 Mar 2020, 16:00

I think the first cycle is unnecessary. :)

Post by rommmcek » 16 Mar 2020, 19:22

I also think that Clipboard := "" is generally very reliable in AutoHotkey. In extreme cases, though, such as high CPU usage and especially heavy clipboard use by other apps, it can still fail. The While loop drastically reduces that possibility.

Post by poetbox » 16 Mar 2020, 20:03

rommmcek wrote:
16 Mar 2020, 19:22
I also think that Clipboard := "" is generally very reliable in AutoHotkey. In extreme cases, though, such as high CPU usage and especially heavy clipboard use by other apps, it can still fail. The While loop drastically reduces that possibility.
Your code is indeed more reliable after testing, while the former code sometimes reports an error because of Google.

Post by malcev » 16 Mar 2020, 20:54

If you want speed, you can select all, copy, and then read the HTML format from the clipboard.
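
For context: the clipboard's "HTML Format" (CF_HTML) is not bare markup. It starts with a short text header whose StartFragment and EndFragment values are byte offsets into the UTF-8 data, and Chrome typically also writes a SourceURL line with the page address. Below is a rough sketch of slicing the copied fragment out of such a payload, written in JavaScript for Node.js. The function name `parseCfHtml` is made up, and reading the raw clipboard bytes (from AutoHotkey or elsewhere) is assumed to have happened already.

```javascript
// Extract the copied fragment and source URL from raw CF_HTML bytes.
// `buf` is a Node.js Buffer holding the clipboard's "HTML Format" data.
function parseCfHtml(buf) {
  // latin1 maps one byte to one character, so header lookups stay byte-accurate.
  const raw = buf.toString('latin1');
  const field = (name) => {
    // "." never matches \r or \n, so the capture is the bare header value.
    const m = new RegExp('^' + name + ':(.*)', 'm').exec(raw);
    return m ? m[1] : undefined;
  };
  const start = parseInt(field('StartFragment'), 10);
  const end = parseInt(field('EndFragment'), 10);
  if (Number.isNaN(start) || Number.isNaN(end)) {
    throw new Error('not CF_HTML data');
  }
  return {
    sourceUrl: field('SourceURL'), // undefined when the header has no such line
    fragment: buf.subarray(start, end).toString('utf8'),
  };
}
```

Because the offsets count bytes, not characters, the slice is taken from the buffer before decoding; slicing an already decoded string would go wrong as soon as the page contains non-ASCII text.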

Post by teadrinker » 17 Mar 2020, 02:20

If the only goal is to get the HTML, my approach is perhaps not the best, but it lets you run any JavaScript on the page and get its result. And the clipboard is not required to get the result.

Post by malcev » 17 Mar 2020, 03:37

How could you do it without using the clipboard?

Post by garry » 17 Mar 2020, 03:46

Internet radio example with vlc.exe: it reads the HTML into memory and extracts specific data from the webpage.
(Here the URL is fixed, but OnClipboardChange can be used if you need to copy and change the URL.)

;CREATED  =20130104

#warn
#NoTrayIcon
setworkingdir,%a_scriptdir%
Filename1=Internet Radio  ( use ESC for MUTE=ON/OFF )

f1:="http://144.217.253.136:8582/played.html"    ;- html saved in memory
f2=http://144.217.253.136:8582/                  ;- run this with vlc.exe

new:=""
vlcx        =%A_programfiles%\VideoLAN\VLC\vlc.exe
ifexist,%vlcx%
 run,%vlcx% --one-instance --qt-start-minimized %f2%,,hide,pid1

;--- Hotkey Escape for Mute ON/OFF -----------
HK1=Esc
  Hotkey,%hk1%,mute1,ON

xx:=ComObjCreate("WinHttp.WinHttpRequest.5.1")   ;-Create the Object
ComObjError(false)
xx.Silent := True    ;- script failure = off
xx.SetTimeouts(500,500,500,500)

Gui,2: Color, 000000
Gui,2:Font,  S10 CDefault , Lucida Console
Gui,2:Add,Text, cYellow x10  y10 h23  w680 vT1,
Gui,2:Add,Text, cGray   x10  y50 h170 w680 vT2
Gui,2: Show, x10 y1  w700 ,%filename1%
settimer,aa1,5000
gosub,aa1
return
;-----------------------
aa1:
new:=""
try {
    xx.Open("GET",f1)                           ;-Open communication read html( in memory )
    xx.Send()                                   ;-Send the "get" request
    aac=
    aac:=xx.ResponseText                        ;-Set the "aac" variable to the response
    ;msgbox, 262208,%f1%--TEXT ,%aac%           ;- see TEXT from url
} catch e {
    xxx:=e.Message
    msgbox, 262208,ERROR ,Error=Catch`n%f1%`n NOT exists`n------------------------------------------`n%xxx%`n------------------------------------------,
    }
    StringReplace,y,aac,</tr>,$, All
    Loop,parse,y,$,
        {
        if A_loopfield contains Current Song
            {
            y:=RegExReplace( A_loopfield, "<.*?>" )
            stringreplace,y,y,Current Song,,all
            stringmid,y,y,9,200
            break
            }
        }
    y :=RegExReplace(y, "\W", " ")     ;- now playing
;-   all last played ---------------
StringReplace,aac,aac,</tr>,$, All
stringreplace,aac,aac,&nbsp;,,all
aac:=RegExReplace(aac, "<.*?>" )
Loop,parse,aac,$,
   new .= a_loopfield "`n"
;msgbox, 262208,PLAYED_SONGS,%new%
GuiControl,2:,T2,%new%
GuiControl,2:,T1 ,%y%      ;-- Current Text
;msgbox, 262208, ,%y%,2
return
;--------------------
MUTE1:
soundset,+1,master,mute
return
;--------------------
2Guiclose:
settimer,aa1,OFF
Process, Exist, vlc.exe
If ErrorLevel
   {
   msgbox, 262435,Radio-Close,Want you close also Audio ?
   ifmsgbox,NO
      {
      soundset,0,master,mute
      exitapp
      }
   ifmsgbox,Cancel
      {
      settimer,aa1,ON
      return
      }
   else
     {
     soundset,0,master,mute
     process,close,vlc.exe
     exitapp
     }
   }
else
  {
  soundset,0,master,mute
  exitapp
  }
return
;=================================================================

Post by teadrinker » 17 Mar 2020, 04:04

malcev wrote:
17 Mar 2020, 03:37
How could you do it without using the clipboard?
It's possible to create a hidden text node, put the info there, and find it via the accessibility interface.
Using this approach you could automate a page without any extensions.

Post by malcev » 17 Mar 2020, 04:11

But if the page has a lot of elements, it can take a lot of time.
I think there may be a different approach.

Post by teadrinker » 17 Mar 2020, 04:18

You could add that element at the beginning of the page.

Post by poetbox » 17 Mar 2020, 04:41

teadrinker wrote:
17 Mar 2020, 02:20
If the only goal is to get the HTML, my approach is perhaps not the best, but it lets you run any JavaScript on the page and get its result. And the clipboard is not required to get the result.
Obviously, your script is a masterpiece, but it is too complex for me to understand or modify.
For example, I want to modify the JavaScript to get the contents of an element on the page by its ID. I know I need to use getElementById, but I don't know where to change the code.
Can you give me a hint on how to get the value of an element by ID, based on the original code?
Any small change I make results in an error or NULL.
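
One possible way to adapt the clipboard-based JavaScript from the first script so that it copies a single element instead of the whole document is sketched below. This is an untested-in-Chrome illustration, not teadrinker's code: the function name `copyElementText` is made up, and `doc` is passed in as a parameter only so the function can be exercised outside a browser.

```javascript
// Copy the text of the element with the given id to the clipboard.
// In the page, `doc` is simply `document`.
function copyElementText(doc, id) {
  const el = doc.getElementById(id); // note: getElementById, singular
  if (!el) return false;             // no such element; nothing gets copied
  // Form fields carry their content in .value, other elements in innerText.
  const text = (typeof el.value === 'string' && el.value) || el.innerText || el.textContent || '';
  const textArea = doc.createElement('textarea');
  textArea.value = text;
  textArea.style.position = 'fixed';
  doc.body.appendChild(textArea);
  textArea.focus();
  textArea.select();
  const ok = doc.execCommand('copy');
  textArea.parentNode.removeChild(textArea);
  return ok;
}
```

In the first script this would replace the lines between `js =` `(` and `)`, followed by a call such as `copyElementText(document, 'ticket-ref');` (the id is hypothetical). Two cautions: no line of the JavaScript may begin with `)`, because that ends the AutoHotkey continuation section, and it is probably safest to strip the `//` comments first, in case the address bar joins the lines into one.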

Post by malcev » 17 Mar 2020, 04:54

teadrinker wrote:
17 Mar 2020, 04:18
You could add that element at the beginning of the page.
If that is faster, why do you use the clipboard?

Post by malcev » 17 Mar 2020, 05:06

One more option: communication between Chrome/Firefox and AHK via native messaging:
https://www.autohotkey.com/boards/viewtopic.php?t=32299

Post by teadrinker » 17 Mar 2020, 07:43

A simple example. Open https://time.is/Unix_time_now in Chrome, run the script:

SetBatchLines, -1
global window := "ahk_class Chrome_WidgetWin_1 ahk_exe chrome.exe"  ; or msedge.exe
SendMessage, WM_GETOBJECT := 0x3D, 0, 1, Chrome_RenderWidgetHostHWND1, % window

js =
(
   (() => {
      let receiver, sender;
      if (!(receiver = document.getElementById('ahk_receiver'))) {
         receiver = createTextArea('ahk_receiver');
         sender = createTextArea('ahk_sender');
         sender.addEventListener('input', inputHandler);
      }
      
      function inputHandler(event) {
         receiver.value = document.getElementById('clock').innerText;
      }
      
      function createTextArea(name) {
         const textArea = document.createElement('textarea');
         textArea.setAttribute('aria-label', name);
         textArea.setAttribute('id', name);
         textArea.style.position = 'fixed';
         textArea.style.width = 0;
         textArea.style.height = 0;
         document.body.insertBefore(textArea, document.body.firstChild);
         return textArea
      }
   })();
)
SendJS(js)
Return

$F1:: MsgBox, % GetInfo()

GetInfo() {
   static receiver, sender
   if !receiver {
      receiver := GetAccElem("ahk_receiver")
      sender := GetAccElem("ahk_sender")
   }
   sender.accValue(0) := A_TickCount
   Sleep, 100
   Return receiver.accValue(0)
}

SendJS(js) {
   static AccAddrBar
   (!AccAddrBar && AccAddrBar := GetAccElem("addr"))
   AccAddrBar.accValue(0) := "javascript:" . js
   AccAddrBar.accSelect(SELFLAG_TAKEFOCUS := 0x1, 0)
   ControlSend,, {Enter}, % window
}

GetAccElem(elem) {
   static ROLE_SYSTEM_TEXT := 0x2A, STATE_SYSTEM_FOCUSABLE := 0x100000, accChrome
        , Elems := { addr:         {Role: ROLE_SYSTEM_TEXT, State: STATE_SYSTEM_FOCUSABLE}
                   , ahk_receiver: {Role: ROLE_SYSTEM_TEXT, Name: "ahk_receiver"}
                   , ahk_sender:   {Role: ROLE_SYSTEM_TEXT, Name: "ahk_sender"} }
                   
   if !accChrome
      accChrome := AccObjectFromWindow( WinExist(window) )
   Return SearchElement(accChrome, Elems[elem])
}

AccObjectFromWindow(hWnd, idObject = 0) {
   static IID_IDispatch   := "{00020400-0000-0000-C000-000000000046}"
        , IID_IAccessible := "{618736E0-3C3D-11CF-810C-00AA00389B71}"
        , OBJID_NATIVEOM  := 0xFFFFFFF0, VT_DISPATCH := 9, F_OWNVALUE := 1
        , h := DllCall("LoadLibrary", "Str", "oleacc", "Ptr")
        
   VarSetCapacity(IID, 16), idObject &= 0xFFFFFFFF
   DllCall("ole32\CLSIDFromString", "Str", idObject = OBJID_NATIVEOM ? IID_IDispatch : IID_IAccessible, "Ptr", &IID)
   if DllCall("oleacc\AccessibleObjectFromWindow", "Ptr", hWnd, "UInt", idObject, "Ptr", &IID, "PtrP", pAcc) = 0
      Return ComObject(VT_DISPATCH, pAcc, F_OWNVALUE)
}

SearchElement(parentElement, params)
{
   found := true
   for k, v in params {
      try {
         if (k = "State")
            (!(parentElement.accState(0)    & v) && found := false)
         else if (k ~= "^(Name|Value)$")
            (!(parentElement["acc" . k](0) ~= v) && found := false)
         else if (k = "ChildCount")
            (parentElement["acc" . k]      != v  && found := false)
         else
            (parentElement["acc" . k](0)   != v  && found := false)
      }
      catch 
         found := false
   } until !found
   if found
      Return parentElement
   
   for k, v in AccChildren(parentElement)
      if obj := SearchElement(v, params)
         Return obj
}

AccChildren(Acc) {
   static VT_DISPATCH := 9
   Loop 1  {
      if ComObjType(Acc, "Name") != "IAccessible"  {
         error := "Invalid IAccessible Object"
         break
      }
      try cChildren := Acc.accChildCount
      catch
         Return ""
      Children := []
      VarSetCapacity(varChildren, cChildren*(8 + A_PtrSize*2), 0)
      res := DllCall("oleacc\AccessibleChildren", "Ptr", ComObjValue(Acc), "Int", 0
                                                , "Int", cChildren, "Ptr", &varChildren, "IntP", cChildren)
      if (res != 0) {
         error := "AccessibleChildren DllCall Failed"
         break
      }
      Loop % cChildren  {
         i := (A_Index - 1)*(A_PtrSize*2 + 8)
         child := NumGet(varChildren, i + 8)
         Children.Push( (b := NumGet(varChildren, i) = VT_DISPATCH) ? AccQuery(child) : child )
         ( b && ObjRelease(child) )
      }
   }
   if error
      ErrorLevel := error
   else
      Return Children.MaxIndex() ? Children : ""
}

AccQuery(Acc) {
   static IAccessible := "{618736e0-3c3d-11cf-810c-00aa00389b71}", VT_DISPATCH := 9, F_OWNVALUE := 1
   try Return ComObject(VT_DISPATCH, ComObjQuery(Acc, IAccessible), F_OWNVALUE)
}

Every time you press F1, you will get the Unix time in a message box. The first call takes a while, but after that the result appears without delay.

Post by malcev » 17 Mar 2020, 08:50

If you open a new DOM (for example, by navigating to another page), you have to create the hidden element again in that DOM.
Maybe it is better to use an iframe?
