htm/html files: identify the web browser that saved it

jeeswg


24 Jun 2019, 06:06

- I had some htm/html files and wanted to establish (if possible) which browser had saved them.
- One might expect browsers to record the original URL of a webpage and which browser was used to save it, but this is not commonly the case.
- To look for distinguishing information, I created a completely blank (0-byte) text file named blank.htm, then opened it and saved it with Internet Explorer, Firefox and Chrome in turn. Here are the results.
- (Note: in the listings below, any LFs have been replaced with CRLFs.)
- (Note: if an htm/html file is saved directly, as a download, the web browser leaves no marks on it.)

Code:

==================================================

[blank IE.htm][Internet Explorer 11]

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><META content="IE=5.0000" http-equiv="X-UA-Compatible">

<META http-equiv="Content-Type" content="text/html; charset=windows-1252">
<META name="GENERATOR" content="MSHTML 11.00.9600.19377"></HEAD>
<BODY></BODY></HTML>

==================================================

[blank Firefox.htm][Firefox 67]

<html><head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"></head><body></body></html>

==================================================

[blank Chrome.htm][Chrome 75]

<!-- saved from url=(0037)file:///C:/Users/me/Desktop/blank.htm -->
<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></head><body></body></html>

==================================================
- Internet Explorer uses upper-case tags e.g. <HTML>, whereas Firefox/Chrome use lower-case tags e.g. <html>.
- Internet Explorer/Firefox use CRLFs for line breaks, whereas Chrome uses LFs for line breaks.
- Internet Explorer/Firefox default to saving .htm files, whereas Chrome defaults to saving .html files.
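Although browsers don't commonly record the original URL, the Chrome file above did, in a `<!-- saved from url=(0037)... -->` comment. The 4-digit number matches the 37 characters of file:///C:/Users/me/Desktop/blank.htm, so it appears to be the length of the URL that follows. Here is a minimal sketch for pulling that URL out. `FileGetSavedFromUrl` and `HtmlGetSavedFromUrl` are hypothetical names, and the sketch assumes the comment format seen in the Chrome test file.

```autohotkey
;hypothetical helper: return the URL stored in a "saved from url" comment, or "" if none
;assumes the format seen in the Chrome test file: <!-- saved from url=(NNNN)URL -->
;where NNNN appears to be the length of URL in characters
FileGetSavedFromUrl(vPath)
{
	local
	FileRead, vText, % vPath
	if ErrorLevel ;the file could not be read
		return ""
	return HtmlGetSavedFromUrl(vText)
}

HtmlGetSavedFromUrl(vText)
{
	local
	;O) = return a match object; capture the 4-digit length and the non-space text after it
	if !RegExMatch(vText, "O)<!-- saved from url=\((\d{4})\)(\S+)", oMatch)
		return ""
	vLen := oMatch.Value(1) + 0 ;e.g. "0037" -> 37
	;trust the stated length, in case the non-space run extends past the URL
	return SubStr(oMatch.Value(2), 1, vLen)
}
```

Splitting the file reading from the text parsing means the parser can be checked against a string directly, without needing a file on disk.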

- Here is some code based on the tag case/line break findings:

Code:

q:: ;htm/html files - get web browser that saved it

;vDir1 := "C:\Users\" A_UserName "\Documents"
vDir1 := "C:\Users\" A_UserName "\Downloads"
vOutput := ""
Loop Files, % vDir1 "\*", % "F"
{
	vPath := A_LoopFileFullPath
	SplitPath, vPath, vName, vDir, vExt, vNameNoExt, vDrive
	if !(vExt = "htm") && !(vExt = "html")
		continue
	vOutput .= FileGetWebBrowser(vPath) "`t" vPath "`r`n"
}
Clipboard := vOutput
MsgBox, % "done"
return

;==================================================

;note: IE/Firefox typically create .htm files
;note: Chrome typically creates .html files

;note: a rough function
FileGetWebBrowser(vPath)
{
	local
	FileRead, vText, % vPath
	if InStr(vText, "<HTML", 1) ;upper case in Internet Explorer (lower case in Firefox/Chrome)
		return "Internet Explorer"
	else if InStr(vText, ">`r`n<") ;Firefox uses CRLFs (Chrome uses LFs)
		return "Firefox"
	else
		return "Chrome"
}
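The rough function above checks only tag case and line breaks, and defaults to Chrome. The blank test files show two more markers: a GENERATOR meta tag naming MSHTML (Internet Explorer) and a "saved from url" comment (Chrome). A possible extension, sketched below, checks those as well and returns "unknown" instead of defaulting to Chrome. It assumes that the markers seen in the three blank files (IE 11, Firefox 67, Chrome 75) carry over to other pages and versions, which may not hold. `HtmlGetWebBrowser` and `FileGetWebBrowser2` are hypothetical names.

```autohotkey
;hypothetical variant of FileGetWebBrowser that checks more markers
;markers are taken from the blank.htm test files (IE 11, Firefox 67, Chrome 75)
;and may not hold for other browser versions or for non-blank pages
FileGetWebBrowser2(vPath)
{
	local
	FileRead, vText, % vPath
	if ErrorLevel ;the file could not be read
		return "unknown"
	return HtmlGetWebBrowser(vText)
}

HtmlGetWebBrowser(vText)
{
	local
	;IE 11 added <META name="GENERATOR" content="MSHTML ...">
	if InStr(vText, "content=""MSHTML")
		return "Internet Explorer"
	;IE 11 wrote upper-case tags (case-sensitive search)
	if InStr(vText, "<HTML", 1)
		return "Internet Explorer"
	;only the Chrome file contained a "saved from url" comment
	if InStr(vText, "<!-- saved from url=")
		return "Chrome"
	;Firefox used CRLFs between tags
	if InStr(vText, ">`r`n<")
		return "Firefox"
	;Chrome used LFs between tags
	if InStr(vText, ">`n<")
		return "Chrome"
	return "unknown"
}
```

To try it, replace `FileGetWebBrowser(vPath)` with `FileGetWebBrowser2(vPath)` in the q:: hotkey. Returning "unknown" makes files with none of the markers stand out in the clipboard output, rather than being counted as Chrome.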