WinHttpRequest5.1 COM - returns 404 and 301 status code when page is 404ing. Why?

Tigerlily

23 Feb 2019, 00:44

Does anyone know why this code sometimes returns a 404 status code and at other times a 301 for the same page? It seems to happen at random, but when I check the page in Chrome or with other tools, it is definitely returning a 404. Maybe I'm forgetting some sort of exception handler, but I don't understand why this would be inconsistent. Pages that return 200 OK do not have this issue. I haven't received any status codes other than 200, 301, and 404. Here are the URLs I'm testing with:


https://www.samsung.com/us/mobile/galaxy-note9/phone-plus/
https://www.samsung.com/us/mobile/galaxy-note9/reviews/
https://www.samsung.com/us/mobile/galaxy-note9/compare-andriod/

Usually I don't have an issue with these if I run them in small batches, but if I run them in batches of 100 or more URLs, I sometimes get a 301 when I shouldn't.

Code:

try Xl := ComObjActive("Excel.Application")	; Try to connect to the active Excel instance

TotalRows := (Xl.Selection.Rows.Count)	; Count how many rows are selected in Excel

SAFEarray := ComObjArray(VT_VARIANT:=12, TotalRows, 1)	; Create a SafeArray to hold the URLs

Loop, %TotalRows%	; Copy each selected cell's value (a URL) into the SafeArray
{
	SAFEarray[A_Index-1,0] := (Xl.ActiveCell.Offset(A_Index-1,0).Value)
}

Loop, %TotalRows%	; Make a request for each URL in the selection
{
	try
	{
		URLaddress := (SAFEarray[A_Index-1,0])	; Retrieve the URL stored for this row

		whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")

		;=======;		RETRIEVE WEB DOCUMENT		;=======;

		whr.SetTimeouts(0,30000,30000,120000)	; Resolve, connect, send, receive timeouts in ms
		whr.Open("GET", URLaddress, true)	; true = asynchronous request
		whr.Send()
		try
		{
			whr.WaitForResponse()
		}
		catch, HTTPrequestTimeout	; Handle the error if the request times out (after 2 minutes)
		{
			if (InStr((HTTPrequestTimeout.Message), "The operation timed out") != 0)
			{
				MetaDataFinal := "ERR_CONNECTION_TIMED_OUT: This site can’t be reached - took too long to respond."
				GoTo, HTTPrequestTimedoutCanonical	; Label is defined later in the full script (not shown)
			}
			else
				break
		}

		if ((whr.Status) != "200")	; If the URL's status code is not 200
		{
			MetaDataFinal := (whr.Status) " - " (whr.StatusText)	; Build a message with the status code and text
			MsgBox % MetaDataFinal
		}
	}
}

There is quite a bit more to this script, but I feel like this is where the issue lies. Eventually it parses the HTML to look for specified metadata, then writes the URLs with the findings to a .txt file.

Thanks for any and all insight!
-TL
safetycar

23 Feb 2019, 01:49

Are you always calling the same host? I mean, maybe the server is protecting itself from your "bot". If that is the case, you can try to be friendlier by putting some delay (on the order of seconds) between requests to the same host.

EDIT: By the way, the URLs you provided all return 404, both in the browser and with whr, for me.
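
For illustration, the delay idea could be sketched roughly like this. This is only a minimal sketch: `DelayMs` and the `Urls` array are placeholder names that do not appear in the original script, and the two-second value is an arbitrary assumption rather than a recommended setting.

```autohotkey
; Sketch only: pause between requests so the same host is not hit too quickly.
; DelayMs and Urls are illustrative assumptions, not part of the original script.
DelayMs := 2000
Urls := ["https://www.samsung.com/us/mobile/galaxy-note9/reviews/"]

for index, URLaddress in Urls
{
	whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
	whr.Open("GET", URLaddress, true)
	whr.Send()
	whr.WaitForResponse()
	MsgBox % URLaddress "`n" whr.Status " - " whr.StatusText
	Sleep, %DelayMs%	; wait before sending the next request to the same host
}
```

In the original loop, the same effect would come from a single `Sleep` at the end of each iteration.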
Tigerlily

23 Feb 2019, 03:08

safetycar wrote:
23 Feb 2019, 01:49
Are you always calling the same host? I mean, maybe the server is protecting itself from your "bot". If that is the case, you can try to be friendlier by putting some delay (on the order of seconds) between requests to the same host.

EDIT: By the way, the URLs you provided all return 404, both in the browser and with whr, for me.

I suppose that could be the case, but why would it send a 301 redirect instead of a 403 or something else? I've also scraped this domain at 10+ URLs/sec with Screaming Frog in the past with no problems. I'm getting annoyed by Screaming Frog's inaccuracies, so I'm building my own SEO web scraping tool.

Also, it is not scraping that fast with my current method, usually between 1-2.5 URLs/sec.
-TL
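
One way to investigate the unexpected 301 would be to look at where the server says it is redirecting. As far as I know, WinHttpRequest follows redirects automatically by default, except those going from an https URL to an http one, so a 301 that reaches the script may be a redirect it declined to follow. The sketch below only reads the standard `Location` response header; the test URL is taken from the list above, and `RedirectTarget` is an illustrative variable name.

```autohotkey
; Sketch only: when a 301 comes back, show where the server is redirecting to.
URLaddress := "https://www.samsung.com/us/mobile/galaxy-note9/reviews/"

whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
whr.Open("GET", URLaddress, true)
whr.Send()
whr.WaitForResponse()

if (whr.Status = 301)
{
	RedirectTarget := "(no Location header)"
	try RedirectTarget := whr.GetResponseHeader("Location")	; throws if the header is missing
	MsgBox % URLaddress "`n301 -> " RedirectTarget
}
else
	MsgBox % whr.Status " - " whr.StatusText
```

If the target turns out to be an http address or an error page, that would at least narrow down why the status differs from what the browser shows.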
safetycar

23 Feb 2019, 03:53

Tigerlily wrote:
23 Feb 2019, 03:08
I suppose that could be the case, but why would it send a 301 redirect instead of a 403 or something else? I've also scraped this domain at 10+ URLs/sec with Screaming Frog in the past with no problems.

I guess that rules out my supposition, although technically they could return misleading status codes if they wanted.

I don't see issues with the code itself. And I don't think user-agent or referrer would fix anything given that it already works.

Creating a whr object every time, like you did, looks like a good way to avoid accumulating errors too...

I'm clueless.

Does a retry usually fix the issue?
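
To test whether a retry helps, something along these lines could record the status of each attempt. It is a rough sketch under stated assumptions: `GetStatusWithRetry`, `MaxTries`, and `DelayMs` are hypothetical names, and treating only a 301 as worth retrying is a guess based on the symptoms described in this thread.

```autohotkey
; Usage: show every status seen for one of the test URLs.
StatusLog := ""
for attempt, code in GetStatusWithRetry("https://www.samsung.com/us/mobile/galaxy-note9/reviews/")
	StatusLog .= "Attempt " attempt ": " code "`n"
MsgBox % StatusLog

; Hypothetical helper: request a URL up to MaxTries times and
; return an array of the status codes seen, one per attempt.
GetStatusWithRetry(URLaddress, MaxTries := 3, DelayMs := 2000)
{
	Statuses := []
	Loop, %MaxTries%
	{
		whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")	; fresh object for every attempt
		whr.Open("GET", URLaddress, true)
		whr.Send()
		whr.WaitForResponse()
		Statuses.Push(whr.Status)
		if (whr.Status != 301)	; only a 301 is treated as suspect here
			break
		Sleep, %DelayMs%	; short pause before trying again
	}
	return Statuses
}
```

If the log shows a 301 followed by a 404 for the same URL, the 301 is probably transient rather than a property of the page.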
