Could not get the webpage source into a variable Topic is solved

Get help with using AutoHotkey and its commands and hotkeys
pleveris
Posts: 3
Joined: 02 Feb 2018, 17:29

Could not get the webpage source into a variable

19 Feb 2018, 08:25

Hello... I know this kind of question was widely mentioned in this forum, but I'm actually unable to find the best workaround for my needs... I'd like to get a webpage source into a variable; the following are my tries (after reviewing a lot of examples found here)

Code: Select all

[;This code works:
urlDownloadToFile, http://demorg.net/, index.txt

;This is not working
source:=urlDownloadToVar("http://demorg.net/")

urlDownloadToVar(url, byref src = "")
{
obj:=ComObjCreate("WinHttp.WinHttpRequest.5.1")
obj.Open("GET",url)
;obj.SetRequestHeader("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)")
	obj.Send()
return src:=obj.ResponseText
}
; Function executes, bbut instead of source code of given webpage it returns the following:
;;;
"""
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /
on this server.<br />
</p>
</body></html>
"""
;;;

; And one more example (that also doesn't work)
info:=sourceDownloadToVar("http://demorg.net/")

sourceDownloadToVar(url, byref src = "")
{
ieobj := ComObjCreate("InternetExplorer.Application")
ieobj.Visible := False
ieobj.Navigate(url)
html := wb.Document.All[0].outerhtml
; Also tried html := wb.Document.body.innerhtml (no luck)
; Also tried html:=Wb.document.documentElement.outerhtml (the same result)
return src:=html
}
; This code outputs the following error:
;;;
"""
Error:  0x80004005 - Unspecified error
Source: (null)
Description: (null)
HelpFile: (null)
HelpContext: 0
Specifically: document
[...]
"""
;;;]
User avatar
Gio
Posts: 946
Joined: 30 Sep 2013, 10:54
Location: Brazil

Re: Could not get the webpage source into a variable

19 Feb 2018, 08:58

Hello Pleveris.

Welcome to the AutoHotkey community forums.

The following example uses HttpQuery() By DerRaphael. It is as easy to use as passing the URL as the first parameter and than setting the return value of the function to a variable of your choice (it will contain the HTML).

Code: Select all

; exmpl.searchAHKforum.httpQuery.ahk
; Searches the forum for a given Phrase: in this case httpQuery
#noenv
VarURL      := "http://www.demorg.net"
VarHtml := httpQuery(URL := VarURL)
msgbox % VarHtml
Return


;################################################;
; HttpQuery() By DerRaphael
;################################################;

; httpQuery-0-3-6.ahk
httpQuery(byref p1 = "", p2 = "", p3="", p4="")
{   ; v0.3.6 (w) Oct, 26 2010 by derRaphael / zLib-Style release
   ; currently the verbs showHeader, storeHeader, and updateSize are supported in httpQueryOps
   ; in case u need a different UserAgent, Proxy, ProxyByPass, Referrer, and AcceptType just
   ; specify them as global variables - mind the varname for referrer is httpQueryReferer [sic].
   ; Also if any special dwFlags are needed such as INTERNET_FLAG_NO_AUTO_REDIRECT or cache
   ; handling this might be set using the httpQueryDwFlags variable as global
   global httpQueryOps, httpAgent, httpProxy, httpProxyByPass, httpQueryReferer, httpQueryAcceptType
       , httpQueryDwFlags
   ; Get any missing default Values
   
   ;v0.3.6
   ; check for syntax
   if ( VarSetCapacity(p1) != 0 )
      dReturn:=true,  result := "", lpszUrl := p1, POSTDATA := p2, HEADERS  := p3
   else
      result := p1, lpszUrl := p2, POSTDATA := p3, HEADERS  := p4
   
   defaultOps =
   (LTrim Join|
      httpAgent=AutoHotkeyScript|httpProxy=0|httpProxyByPass=0|INTERNET_FLAG_SECURE=0x00800000
      SECURITY_FLAG_IGNORE_UNKNOWN_CA=0x00000100|SECURITY_FLAG_IGNORE_CERT_CN_INVALID=0x00001000
      SECURITY_FLAG_IGNORE_CERT_DATE_INVALID=0x00002000|SECURITY_FLAG_IGNORE_CERT_WRONG_USAGE=0x00000200
      INTERNET_OPEN_TYPE_PROXY=3|INTERNET_OPEN_TYPE_DIRECT=1|INTERNET_SERVICE_HTTP=3
   )
   Loop,Parse,defaultOps,|
   {
      RegExMatch(A_LoopField,"(?P<Option>[^=]+)=(?P<Default>.*)",http)
      if StrLen(%httpOption%)=0
         %httpOption% := httpDefault
   }

   ; Load Library
   hModule := DllCall("LoadLibrary", "Str", "WinINet.Dll")

   ; SetUpStructures for URL_COMPONENTS / needed for InternetCrackURL
   ; http://msdn.microsoft.com/en-us/library/aa385420(VS.85).aspx
   offset_name_length:= "4-lpszScheme-255|16-lpszHostName-1024|28-lpszUserName-1024|"
                  . "36-lpszPassword-1024|44-lpszUrlPath-1024|52-lpszExtrainfo-1024"
   VarSetCapacity(URL_COMPONENTS,60,0)
   ; Struc Size               ; Scheme Size                  ; Max Port Number
   NumPut(60,URL_COMPONENTS,0), NumPut(255,URL_COMPONENTS,12), NumPut(0xffff,URL_COMPONENTS,24)
   
   Loop,Parse,offset_name_length,|
   {
      RegExMatch(A_LoopField,"(?P<Offset>\d+)-(?P<Name>[a-zA-Z]+)-(?P<Size>\d+)",iCU_)
      VarSetCapacity(%iCU_Name%,iCU_Size,0)
      NumPut(&%iCU_Name%,URL_COMPONENTS,iCU_Offset)
      NumPut(iCU_Size,URL_COMPONENTS,iCU_Offset+4)
   }

   ; Split the given URL; extract scheme, user, pass, authotity (host), port, path, and query (extrainfo)
   ; http://msdn.microsoft.com/en-us/library/aa384376(VS.85).aspx
   DllCall("WinINet\InternetCrackUrlA","Str",lpszUrl,"uInt",StrLen(lpszUrl),"uInt",0,"uInt",&URL_COMPONENTS)

   ; Update variables to retrieve results
   Loop,Parse,offset_name_length,|
   {
      RegExMatch(A_LoopField,"-(?P<Name>[a-zA-Z]+)-",iCU_)
      VarSetCapacity(%iCU_Name%,-1)
   }
   nPort:=NumGet(URL_COMPONENTS,24,"uInt")
   
   ; Import any set dwFlags
   dwFlags := httpQueryDwFlags
   ; For some reasons using a selfsigned https certificates doesnt work
   ; such as an own webmin service - even though every security is turned off
   ; https with valid certificates works when
   if (lpszScheme = "https")
      dwFlags |= (INTERNET_FLAG_SECURE|SECURITY_FLAG_IGNORE_CERT_CN_INVALID
               |SECURITY_FLAG_IGNORE_CERT_WRONG_USAGE)

   ; Check for Header and drop exception if unknown or invalid URL
   if (lpszScheme="unknown") {
      Result := "ERR: No Valid URL supplied."
      Return StrLen(Result)
   }

   ; Initialise httpQuery's use of the WinINet functions.
   ; http://msdn.microsoft.com/en-us/library/aa385096(VS.85).aspx
   hInternet := DllCall("WinINet\InternetOpenA"
                  ,"Str",httpAgent,"UInt"
                  ,(httpProxy != 0 ?  INTERNET_OPEN_TYPE_PROXY : INTERNET_OPEN_TYPE_DIRECT )
                  ,"Str",httpProxy,"Str",httpProxyBypass,"Uint",0)

   ; Open HTTP session for the given URL
   ; http://msdn.microsoft.com/en-us/library/aa384363(VS.85).aspx
   hConnect := DllCall("WinINet\InternetConnectA"
                  ,"uInt",hInternet,"Str",lpszHostname, "Int",nPort
                  ,"Str",lpszUserName, "Str",lpszPassword,"uInt",INTERNET_SERVICE_HTTP
                  ,"uInt",0,"uInt*",0)

   ; Do we POST? If so, check for header handling and set default
   if (Strlen(POSTDATA)>0) {
      HTTPVerb:="POST"
      if StrLen(Headers)=0
         Headers:="Content-Type: application/x-www-form-urlencoded"
   } else ; otherwise mode must be GET - no header defaults needed
      HTTPVerb:="GET"   

   ; Form the request with proper HTTP protocol version and create the request handle
   ; http://msdn.microsoft.com/en-us/library/aa384233(VS.85).aspx
   hRequest := DllCall("WinINet\HttpOpenRequestA"
                  ,"uInt",hConnect,"Str",HTTPVerb,"Str",lpszUrlPath . lpszExtrainfo
                  ,"Str",ProVer := "HTTP/1.1", "Str",httpQueryReferer,"Str",httpQueryAcceptTypes
                  ,"uInt",dwFlags,"uInt",Context:=0 )

   ; Send the specified request to the server
   ; http://msdn.microsoft.com/en-us/library/aa384247(VS.85).aspx
   sRequest := DllCall("WinINet\HttpSendRequestA"
                  , "uInt",hRequest,"Str",Headers, "uInt",Strlen(Headers)
                  , "Str",POSTData,"uInt",Strlen(POSTData))

   VarSetCapacity(header, 2048, 0)  ; max 2K header data for httpResponseHeader
   VarSetCapacity(header_len, 4, 0)
   
   ; Check for returned server response-header (works only _after_ request been sent)
   ; http://msdn.microsoft.com/en-us/library/aa384238.aspx
   Loop, 5
     if ((headerRequest:=DllCall("WinINet\HttpQueryInfoA","uint",hRequest
      ,"uint",21,"uint",&header,"uint",&header_len,"uint",0))=1)
      break

   If (headerRequest=1) {
      VarSetCapacity(res,headerLength:=NumGet(header_len),32)
      DllCall("RtlMoveMemory","uInt",&res,"uInt",&header,"uInt",headerLength)
      Loop,% headerLength
         if (*(&res-1+a_index)=0) ; Change binary zero to linefeed
            NumPut(Asc("`n"),res,a_index-1,"uChar")
      VarSetCapacity(res,-1)
   } else
      res := "timeout"

   ; Get 1st Line of Full Response
   Loop,Parse,res,`n,`r
   {
      RetValue := A_LoopField
      break
   }
   
   ; No Connection established - drop exception
   If (RetValue="timeout") {
      html := "Error: timeout"
      return -1
   }
   ; Strip protocol version from return value
   RetValue := RegExReplace(RetValue,"HTTP/1\.[01]\s+")
   
    ; List taken from http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
   HttpRetCodes := "100=Continue|101=Switching Protocols|102=Processing (WebDAV) (RFC 2518)|"
              . "200=OK|201=Created|202=Accepted|203=Non-Authoritative Information|204=No"
              . " Content|205=Reset Content|206=Partial Content|207=Multi-Status (WebDAV)"
              . "|300=Multiple Choices|301=Moved Permanently|302=Found|303=See Other|304="
              . "Not Modified|305=Use Proxy|306=Switch Proxy|307=Temporary Redirect|400=B"
              . "ad Request|401=Unauthorized|402=Payment Required|403=Forbidden|404=Not F"
              . "ound|405=Method Not Allowed|406=Not Acceptable|407=Proxy Authentication "
              . "Required|408=Request Timeout|409=Conflict|410=Gone|411=Length Required|4"
              . "12=Precondition Failed|413=Request Entity Too Large|414=Request-URI Too "
              . "Long|415=Unsupported Media Type|416=Requested Range Not Satisfiable|417="
              . "Expectation Failed|418=I'm a teapot (RFC 2324)|422=Unprocessable Entity "
              . "(WebDAV) (RFC 4918)|423=Locked (WebDAV) (RFC 4918)|424=Failed Dependency"
              . " (WebDAV) (RFC 4918)|425=Unordered Collection (RFC 3648)|426=Upgrade Req"
              . "uired (RFC 2817)|449=Retry With|500=Internal Server Error|501=Not Implem"
              . "ented|502=Bad Gateway|503=Service Unavailable|504=Gateway Timeout|505=HT"
              . "TP Version Not Supported|506=Variant Also Negotiates (RFC 2295)|507=Insu"
              . "fficient Storage (WebDAV) (RFC 4918)|509=Bandwidth Limit Exceeded|510=No"
              . "t Extended (RFC 2774)"
   
   ; Gather numeric response value
   RetValue := SubStr(RetValue,1,3)
   
   ; Parse through return codes and set according informations
   Loop,Parse,HttpRetCodes,|
   {
      HttpReturnCode := SubStr(A_LoopField,1,3)    ; Numeric return value see above
      HttpReturnMsg  := SubStr(A_LoopField,5)      ; link for additional information
      if (RetValue=HttpReturnCode) {
         RetMsg := HttpReturnMsg
         break
      }
   }

   ; Global HttpQueryOps handling
   if strlen(HTTPQueryOps)>0 {
      ; Show full Header response (usefull for debugging)
      if (instr(HTTPQueryOps,"showHeader"))
         MsgBox % res
      ; Save the full Header response in a global Variable
      if (instr(HTTPQueryOps,"storeHeader"))
         global HttpQueryHeader := res
      ; Check for size updates to export to a global Var
      if (instr(HTTPQueryOps,"updateSize")) {
         Loop,Parse,res,`n
            If RegExMatch(A_LoopField,"Content-Length:\s+?(?P<Size>\d+)",full) {
               global HttpQueryFullSize := fullSize
               break
            }
         if (fullSize+0=0)
            HttpQueryFullSize := "size unavailable"
      }
   }

   ; Check for valid codes and drop exception if suspicious
   if !(InStr("100 200 201 202 302",RetValue)) {
      Result := RetValue " " RetMsg
      return StrLen(Result)
   }

   VarSetCapacity(BytesRead,4,0)
   fsize := 0
   Loop            ; the receiver loop - rewritten in the need to enable
   {               ; support for larger file downloads
      bc := A_Index
      VarSetCapacity(buffer%bc%,1024,0) ; setup new chunk for this receive round
      ReadFile := DllCall("wininet\InternetReadFile"
                  ,"uInt",hRequest,"uInt",&buffer%bc%,"uInt",1024,"uInt",&BytesRead)
      ReadBytes := NumGet(BytesRead)    ; how many bytes were received?
      If ((ReadFile!=0)&&(!ReadBytes))  ; we have had no error yet and received no more bytes
         break                         ; we must be done! so lets break the receiver loop
      Else {
         fsize += ReadBytes            ; sum up all chunk sizes for correct return size
         sizeArray .= ReadBytes "|"
      }
      if (instr(HTTPQueryOps,"updateSize"))
         Global HttpQueryCurrentSize := fsize
   }
   sizeArray := SubStr(sizeArray,1,-1)   ; trim last PipeChar
   
   VarSetCapacity( ( dReturn == true ) ? result : p1 ,fSize+1,0)      ; reconstruct the result from above generated chunkblocks
   Dest := ( dReturn == true ) ? &result : &p1                 ; to a our ByRef result variable
   Loop,Parse,SizeArray,|
      DllCall("RtlMoveMemory","uInt",Dest,"uInt",&buffer%A_Index%,"uInt",A_LoopField)
      , Dest += A_LoopField
     
   DllCall("WinINet\InternetCloseHandle", "uInt", hRequest)   ; close all opened
   DllCall("WinINet\InternetCloseHandle", "uInt", hInternet)
   DllCall("WinINet\InternetCloseHandle", "uInt", hConnect)
   DllCall("FreeLibrary", "UInt", hModule)                    ; unload the library
   
   if ( dReturn == true ) {
      VarSetCapacity( result, -1 )
      ErrorLevel := fSize
      return Result
   } else
      return fSize                      ; return the size - strings need update via VarSetCapacity(res,-1)
}

return
You can also use the second example at the docs for URLDownLoadToFile if you want to trade off bulkiness for absolute simplicity at retrieving the html.

Code: Select all

whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
whr.Open("GET", "http://www.demorg.net", true)
whr.Send()
; Using 'true' above and the call below allows the script to remain responsive.
whr.WaitForResponse()
version := whr.ResponseText
MsgBox % version
pleveris
Posts: 3
Joined: 02 Feb 2018, 17:29

Re: Could not get the webpage source into a variable

19 Feb 2018, 10:38

Hello Gio, this is somehow incorrect. Talking about your first example, that code returns the following "-1". As I understood from that function, this means something like HTML timeout or so. Talking about the 2nd, as I said in the main message, it outputs: "<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /
on this server.<br />
</p>
</body></html>". Maybe I should set something on that server to get this WinHTTP Obj Property work correctly...
User avatar
Gio
Posts: 946
Joined: 30 Sep 2013, 10:54
Location: Brazil

Re: Could not get the webpage source into a variable

19 Feb 2018, 10:55

On a second test, it seems that the HttpQuery() code i posted only works in the ANSI version (which was my current setting when i ran it). This is to be expected i suppose, since the code is from 2008, but i cannot find a work-around for using this function in unicode versions ATM.

Regarding the second example, i got that forbidden message once, but after running it again, the html was captured correctly. It even worked on Unicode versions here, but It's not that bulky a code, so i don't think it is that reliable. If you fail to get responses after multiple attempts, it may also have something to do with your firewall settings.
pleveris
Posts: 3
Joined: 02 Feb 2018, 17:29

Re: Could not get the webpage source into a variable

19 Feb 2018, 11:45

Hmm, tried with firewall-disabled and antivirus-disabled, the same result anyway. Well, if I try to change the website from demorg.net to, for example, google.com, then it works successfully...
User avatar
noname
Posts: 509
Joined: 19 Nov 2013, 09:15

Re: Could not get the webpage source into a variable  Topic is solved

19 Feb 2018, 12:59

You can try this it downloads the same text as urldownloadtofile ,code posted by jNizM see https://autohotkey.com/boards/viewtopic.php?t=3291

Code: Select all



;urlDownloadToFile, http://demorg.net/, index.txt

;see jNizM https://autohotkey.com/boards/viewtopic.php?t=3291

msgbox % downloadtostring( "http://demorg.net")


DownloadToString(url, encoding = "utf-8")
{
    static a := "AutoHotkey/" A_AhkVersion
    if (!DllCall("LoadLibrary", "str", "wininet") || !(h := DllCall("wininet\InternetOpen", "str", a, "uint", 1, "ptr", 0, "ptr", 0, "uint", 0, "ptr")))
        return 0
    c := s := 0, o := ""
    if (f := DllCall("wininet\InternetOpenUrl", "ptr", h, "str", url, "ptr", 0, "uint", 0, "uint", 0x80003000, "ptr", 0, "ptr"))
    {
        while (DllCall("wininet\InternetQueryDataAvailable", "ptr", f, "uint*", s, "uint", 0, "ptr", 0) && s > 0)
        {
            VarSetCapacity(b, s, 0)
            DllCall("wininet\InternetReadFile", "ptr", f, "ptr", &b, "uint", s, "uint*", r)
            o .= StrGet(&b, r >> (encoding = "utf-16" || encoding = "cp1200"), encoding)
        }
        DllCall("wininet\InternetCloseHandle", "ptr", f)
    }
    DllCall("wininet\InternetCloseHandle", "ptr", h)
    return o
}
soundcloud.com/user-32706894

Return to “Ask For Help”

Who is online

Users browsing this forum: Albireo, Bing [Bot], boiler, Eureka, JawGBoi, Weshuggah, Zeppy and 145 guests