Compare the logotype(image) in PDF-files with an original image

Albireo
Posts: 1743
Joined: 16 Oct 2013, 13:53

Compare the logotype(image) in PDF-files with an original image

28 Sep 2020, 04:39

1) My idea is to identify PDF files by the logo they contain. (Is that a bad idea? Why?)

2) Is there a good / easy way to do this?

3) Can an image comparison take a long time?
The logo I am looking at now is 510 x 227 pixels, and the logo library has about 50 logos, all created in the same way (so the images should be identical, I think).

To extract the logo from the PDF file so that I can compare it with the logo library,
I intend to use the program pdftoimage.exe, something like pdftoimage inputfile.pdf pdflogo.png, and then compare the result with the logos stored in the image library.
mikeyww
Posts: 26442
Joined: 09 Sep 2014, 18:38

Re: Compare the logotype(image) in PDF-files with an original image

28 Sep 2020, 07:12

If your PDF file has nothing but the logo, then you could just examine the MD5 hash value of the file, and map that back to your image library.

Code: Select all

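file = [full path to PDF file]
MsgBox, 64, MD5, % md5(file)
ExitApp

; Returns the MD5 hash of a file as reported by certutil, or blank if the file does not exist.
md5(file) {
 If !FileExist(file)
  Return
 Static out := A_Temp "\md5.tmp" ; temporary file that receives certutil's output
 RunWait, %ComSpec% /c "%WINDIR%\System32\certutil.exe -hashfile %file% MD5 >%out%",, Hide
 FileReadLine, line, %out%, 2 ; line 2 of certutil's output holds the hash
 FileRecycle, %out%
 Return line
}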
Albireo
Posts: 1743
Joined: 16 Oct 2013, 13:53

Re: Compare the logotype(image) in PDF-files with an original image

28 Sep 2020, 19:42

Something I missed?

No, the PDF file also contains text.
But the program pdftoimage.exe saves only the image from the PDF file.
(Right now I don't know what happens if the PDF file has several images.)

As I said before, pdftoimage inputfile.pdf pdflogo.png (from the CMD prompt) creates an image (with the logo).
Then I want to compare pdflogo.png with Logo1.png, Logo2.png, Logo3.png, etc.
If the two images are equal, then I know which logo is in the PDF file (and which company the logo belongs to).
mikeyww
Posts: 26442
Joined: 09 Sep 2014, 18:38

Re: Compare the logotype(image) in PDF-files with an original image

28 Sep 2020, 19:57

Yes. I think you missed thinking a little bit further about potential uses of the MD5 to meet your needs. Identical image files will also have the same MD5, right? Their MD5 will differ from the other files' MD5, right? Think of the MD5 as a sort of shorthand signature code that "summarizes" or represents a file's contents.
Albireo
Posts: 1743
Joined: 16 Oct 2013, 13:53

Re: Compare the logotype(image) in PDF-files with an original image

29 Sep 2020, 03:19

Ok!
I got an error "CertUtil: Too many arguments" :o
mikeyww
Posts: 26442
Joined: 09 Sep 2014, 18:38

Re: Compare the logotype(image) in PDF-files with an original image

29 Sep 2020, 07:34

If your file path contains spaces, then you probably need to enclose it in quotation marks.
Albireo
Posts: 1743
Joined: 16 Oct 2013, 13:53

Re: Compare the logotype(image) in PDF-files with an original image

30 Sep 2020, 05:24

Great! Thanks!
The problem was solved with this change:

Code: Select all

RunWait, %ComSpec% /c "%WINDIR%\System32\certutil.exe -hashfile "%file%" MD5 > %out%",, Hide
The solution seems to work perfectly (so far).
Test program to use md5(file)
I will test whether hashing all 50 logos on every run takes much longer than reading the same hash values from a file.
(The downside is that such a file may need to be updated whenever the logos change.)
mikeyww
Posts: 26442
Joined: 09 Sep 2014, 18:38

Re: Compare the logotype(image) in PDF-files with an original image

30 Sep 2020, 06:27

Good to hear. If the file contents are identical, they should have the same MD5. Retrieving the MD5 would typically be much faster than comparing images pixel by pixel. Since you are just looking at images, I think that you first want to extract the images as you were doing, and then compare the extracted image file's MD5 against your library file image's MD5.
Albireo
Posts: 1743
Joined: 16 Oct 2013, 13:53

Re: Compare the logotype(image) in PDF-files with an original image

05 Oct 2020, 08:52

Written this way, the command became even more robust and works in more situations:

Code: Select all

RunWait % ComSpec " /c " A_WinDir "\System32\certutil.exe -hashfile """ file """ MD5 > """ Out """",, Hide
(I wonder whether @mikeyww would prefer Format() in this case ;) )
Instead of using a temporary result file (out), how difficult would it be to use the Clipboard? (But that may be another question.)
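
On the Clipboard question, one possible approach is to pipe certutil's output into clip.exe and read the result from the Clipboard variable. The sketch below is only an illustration under assumptions: Md5ViaClipboard is a hypothetical name, clip.exe is assumed to be available (it ships with Windows Vista and later), and the hash is assumed to be on line 2 of certutil's output, as in the code above.

```autohotkey
; Sketch: get certutil's MD5 output through the clipboard instead of a temp file.
; Md5ViaClipboard is a hypothetical helper name.
Md5ViaClipboard(file) {
    if !FileExist(file)
        return ""
    saved := ClipboardAll              ; back up whatever is on the clipboard now
    Clipboard := ""                    ; empty it so ClipWait can detect new data
    ; cmd receives: "<windir>\System32\certutil.exe -hashfile "<file>" MD5 | clip"
    RunWait, % ComSpec " /c """ A_WinDir "\System32\certutil.exe -hashfile """ file """ MD5 | clip""",, Hide
    ClipWait, 2                        ; wait up to 2 seconds for the text to arrive
    lines := StrSplit(Clipboard, "`n", "`r")
    Clipboard := saved                 ; restore the previous clipboard contents
    saved := ""                        ; free the backup
    return StrReplace(lines[2], " ")   ; line 2 is assumed to hold the hash
}
```

A call such as MsgBox, % Md5ViaClipboard(A_ScriptFullPath) would show the hash. The trade-off is that the clipboard is overwritten for a moment, which can interfere with whatever the user is copying at that time, so the temporary file may still be the safer choice for an unattended script.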
teadrinker
Posts: 4295
Joined: 29 Mar 2015, 09:41

Re: Compare the logotype(image) in PDF-files with an original image

05 Oct 2020, 12:43

@mikeyww
You don't need external apps to get an MD5 hash; there is AHK code for that. You also don't need to calculate a hash to compare files; you can compare them directly:

Code: Select all

filePath1 := "D:\File1.pdf"
filePath2 := "D:\File2.pdf"

MsgBox, % CompareFiles(filePath1, filePath2)

CompareFiles(filePath1, filePath2) {
   Loop 2
      File%A_Index% := FileOpen(filePath%A_Index%, "r")
   len := File1.Length
   cmp := (len != "" && len = File2.Length)
   Loop 2 {
      if cmp {
         File%A_Index%.Pos := 0
         File%A_Index%.RawRead(buff%A_Index%, len)
      }
      File%A_Index%.Close()
   }
   Return cmp && DllCall("msvcrt\memcmp", "Ptr", &buff1, "Ptr", &buff2, "Ptr", len, "Cdecl") = 0
}
Last edited by teadrinker on 13 Nov 2020, 18:09, edited 1 time in total.
Albireo
Posts: 1743
Joined: 16 Oct 2013, 13:53

Re: Compare the logotype(image) in PDF-files with an original image

13 Nov 2020, 17:42

I have not tested the idea from @teadrinker

I've been running the MD5 approach for a while, and sometimes I get ErrorLevel 1 (I do not know why).
An outline of my code:
a) First, the MD5 values of all images in a directory are read (only 6 images now, but there will be about 50).
All MD5 values are stored in an array, together with other information about the images.

b) Then I get the MD5 value of the image I want to compare.

c) Finally, that last MD5 value is compared with all the other values, and when there is a hit,
a result is created from the other information about the image.
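
The three steps above can be sketched as follows. This is a minimal illustration and not the script from this thread: LogoLibDir, ExtractedLogo and Md5ViaCertutil are hypothetical names, and the certutil call follows the form posted earlier. Keying an associative array by hash means step c) becomes a single lookup instead of a loop over all stored values.

```autohotkey
; Sketch: hash the logo library once, then identify an extracted logo by lookup.
; LogoLibDir and ExtractedLogo are hypothetical paths; adjust them as needed.
LogoLibDir := A_ScriptDir "\LogoLib"
ExtractedLogo := A_Temp "\pdflogo.png"

; a) Hash every image in the library and remember which file produced each hash.
library := {}
Loop, Files, %LogoLibDir%\*.*
{
    hash := Md5ViaCertutil(A_LoopFileLongPath)
    if (hash != "")
        library[hash] := A_LoopFileName
}

; b) Hash the image that should be identified.
target := Md5ViaCertutil(ExtractedLogo)

; c) Look the hash up in the library.
if (target != "" && library.HasKey(target))
    MsgBox, % "Match: " library[target]
else
    MsgBox, No match found.
ExitApp

; Returns the MD5 hash reported by certutil, or an empty string on failure.
Md5ViaCertutil(file) {
    if !FileExist(file)
        return ""
    out := A_Temp "\md5_" A_TickCount ".tmp"   ; separate temp file for each call
    RunWait, % ComSpec " /c " A_WinDir "\System32\certutil.exe -hashfile """ file """ MD5 > """ out """",, Hide
    FileReadLine, line, %out%, 2               ; line 2 is assumed to hold the hash
    FileDelete, %out%
    return StrReplace(line, " ")               ; the hash may be printed with spaces
}
```

Because an empty hash is never stored or looked up, a failed certutil call shows up as "No match found" rather than as a false match.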

But sometimes an error occurs. If I run the script again, no error occurs.

Could teadrinker's suggestion be more:
  • Stable?
  • Faster?
  • Better?
Albireo
Posts: 1743
Joined: 16 Oct 2013, 13:53

Re: Compare the logotype(image) in PDF-files with an original image

14 Nov 2020, 15:45

Now I have tested the suggestion from @teadrinker.
It works if the two images have the same name, but not with different names.
It doesn't work for me! (I don't know the name before the comparison.)

My wish was to automatically find out who an invoice came from by analyzing the logo.
MD5 works, but I do not like runtime errors, especially when the process is to be automated.

I have searched the Internet for MD5 implementations for AHK, and the most complete is from @SKAN: FileCRC32, FileSHA1, FileMD5() and MD5().
@Laszlo also has a similar solution, md5.ahk.
All of these work on exe, ahk, txt and png files, but not on my jpg files. I have no idea why.

I searched further and found an interesting solution by @Rseding91: MD5 function for comparing images.
When I started testing that function, I realized that AHK's FileRead cannot handle jpg files. Is that correct?
I'm unsure what to pass to the last function, Calc_MD5(_VarAddress, _VarSize).

Code: Select all

FileRead File,%A_ScriptFullPath%
FileGetSize FileSize,%A_ScriptFullPath%
Calc_MD5(&File, FileSize)
Does & mean something in the function call?
If a jpg filename is specified in the call, will the result be the hash of the filename or of the file?
Is it possible to read a jpg file as binary data into a variable? (Possibly this is one of my problems.)

I am sharing a test file containing all of the above functions:
AHK-testscript hash-number
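
On the questions above: in AutoHotkey v1, &var yields the memory address of the variable's contents, so a function shaped like Calc_MD5(_VarAddress, _VarSize) hashes whatever bytes the variable holds; the file has to be loaded into the variable first. Plain FileRead treats the data as text, while its *c option or FileOpen() with RawRead() loads raw bytes. The sketch below illustrates this with the Windows CryptoAPI. FileMD5 and BufferMD5 are hypothetical names and are not the functions from the linked threads.

```autohotkey
; Sketch: read a file as raw bytes and hash them without external programs.
; FileMD5 and BufferMD5 are hypothetical names used for illustration.
path := A_ScriptFullPath                ; replace with the path to a jpg file
MsgBox, % FileMD5(path)
ExitApp

; Reads the whole file as binary data, then hashes those bytes.
FileMD5(path) {
    f := FileOpen(path, "r")
    if !IsObject(f)
        return ""
    size := f.Length
    VarSetCapacity(data, size + 1, 0)   ; raw byte buffer, not a text string
    f.Pos := 0                          ; rewind in case a byte order mark was skipped
    f.RawRead(data, size)
    f.Close()
    return BufferMD5(&data, size)       ; &data is the address of the buffer
}

; Returns the MD5 of size bytes at address addr as 32 hex digits, or "" on failure.
BufferMD5(addr, size) {
    hProv := 0, hHash := 0, hex := ""
    ; PROV_RSA_FULL = 1, CRYPT_VERIFYCONTEXT = 0xF0000000
    if !DllCall("advapi32\CryptAcquireContextW", "Ptr*", hProv, "Ptr", 0, "Ptr", 0, "UInt", 1, "UInt", 0xF0000000)
        return ""
    ; CALG_MD5 = 0x8003
    if DllCall("advapi32\CryptCreateHash", "Ptr", hProv, "UInt", 0x8003, "Ptr", 0, "UInt", 0, "Ptr*", hHash) {
        if DllCall("advapi32\CryptHashData", "Ptr", hHash, "Ptr", addr, "UInt", size, "UInt", 0) {
            VarSetCapacity(digest, 16, 0), len := 16
            ; HP_HASHVAL = 2
            if DllCall("advapi32\CryptGetHashParam", "Ptr", hHash, "UInt", 2, "Ptr", &digest, "UInt*", len, "UInt", 0) {
                Loop 16
                    hex .= Format("{:02x}", NumGet(digest, A_Index - 1, "UChar"))
            }
        }
        DllCall("advapi32\CryptDestroyHash", "Ptr", hHash)
    }
    DllCall("advapi32\CryptReleaseContext", "Ptr", hProv, "UInt", 0)
    return hex
}
```

Since the bytes are read with RawRead(), the file type should not matter: a jpg is handled the same way as a png or txt file. Passing the address of a variable that holds only the file name would hash the characters of the name, not the file.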
teadrinker
Posts: 4295
Joined: 29 Mar 2015, 09:41

Re: Compare the logotype(image) in PDF-files with an original image

14 Nov 2020, 15:56

Albireo wrote: It works if the two images have the same name, not with different names.
It doesn't work for me!
Looks like you do something wrong. My code does not depend on file names.
Albireo
Posts: 1743
Joined: 16 Oct 2013, 13:53

Re: Compare the logotype(image) in PDF-files with an original image

14 Nov 2020, 16:40

teadrinker wrote:
14 Nov 2020, 15:56
... Looks like you do something wrong. My code does not depend on file names.
My code: the directory LogoLib contains many jpg files (one of them is the same as filePath1).
xpdfLogo.jpg is a renamed copy of Ilab.jpg.

Code: Select all

; Version 14 nov 2020
#NoEnv  ; Recommended for performance and compatibility with future AutoHotkey releases.
; #Warn  ; Enable warnings to assist with detecting common errors.
SendMode Input  ; Recommended for new scripts due to its superior speed and reliability.
SetWorkingDir %A_ScriptDir%  ; Ensures a consistent starting directory.

#Singleinstance force

; Test 1 (Works!)
; filePath1 := "c:\ProgExpo\temp\ilab.jpg"
; filePath1 := "..\LogoLib\ilab.jpg"

; Test 2 (same file new name - Doesn't work)
filePath1 := "c:\ProgExpo\temp\xpdfLogo.jpg"

filePath2 := "..\LogoLib\ilab.jpg"
MsgBox ,, %A_ScriptName% - Rad %A_LineNumber%, % CompareFiles(filePath1, filePath2)


Loop Files, ..\LogoLib\*.*
{	; MsgBox ,, %A_ScriptName% - Rad %A_LineNumber%, % filePath1 "`n`n" A_LoopFileLongPath
	If CompareFiles(filePath1, A_LoopFileLongPath)
	{	MsgBox ,, %A_ScriptName% - Rad %A_LineNumber%, % filePath1 "`n`nFound! `n`n" A_LoopFileLongPath
		filePath2 := A_LoopFileLongPath
		Break
	}	
	else
		MsgBox ,, %A_ScriptName% - Rad %A_LineNumber%, % filePath1 "`n`nNOT found! `n`n" A_LoopFileLongPath
}

MsgBox ,, %A_ScriptName% - Rad %A_LineNumber%, % "The result .: "  CompareFiles(filePath1, filePath2)

ExitApp

CompareFiles(filePath1, filePath2)
{	Loop 1
	{	Loop 2
			File%A_Index% := FileOpen(filePath%A_Index%, "r")
		len := File1.Length
      if (len = "" || len != File2.Length)
			Break
		Loop 2
		{	File%A_Index%.Pos := 0
			File%A_Index%.RawRead(buff%A_Index%, len)
      }
      cmp := true
   }
	Loop 2
		File%A_Index%.Close()
   Return cmp && DllCall("msvcrt\memcmp", "Ptr", &buff1, "Ptr", &buff2, "Ptr", len, "Cdecl") = 0
}
teadrinker
Posts: 4295
Joined: 29 Mar 2015, 09:41

Re: Compare the logotype(image) in PDF-files with an original image

14 Nov 2020, 17:05

Albireo wrote:

Code: Select all

CompareFiles(filePath1, filePath2)
{	Loop 1
	{	Loop 2
			File%A_Index% := FileOpen(filePath%A_Index%, "r")
		len := File1.Length
      if (len = "" || len != File2.Length)
			Break
		Loop 2
		{	File%A_Index%.Pos := 0
			File%A_Index%.RawRead(buff%A_Index%, len)
      }
      cmp := true
   }
	Loop 2
		File%A_Index%.Close()
   Return cmp && DllCall("msvcrt\memcmp", "Ptr", &buff1, "Ptr", &buff2, "Ptr", len, "Cdecl") = 0
}
That is not my code. It should be:

Code: Select all

CompareFiles(filePath1, filePath2) {
   Loop 2
      File%A_Index% := FileOpen(filePath%A_Index%, "r")
   len := File1.Length
   cmp := (len != "" && len = File2.Length)
   Loop 2 {
      if cmp {
         File%A_Index%.Pos := 0
         File%A_Index%.RawRead(buff%A_Index%, len)
      }
      File%A_Index%.Close()
   }
   Return cmp && DllCall("msvcrt\memcmp", "Ptr", &buff1, "Ptr", &buff2, "Ptr", len, "Cdecl") = 0
}
Albireo wrote:

Code: Select all

filePath2 := "..\LogoLib\ilab.jpg"
Perhaps the path is wrong; try specifying the full path.
Xtra
Posts: 2744
Joined: 02 Oct 2015, 12:15

Re: Compare the logotype(image) in PDF-files with an original image

20 Sep 2021, 13:08

@teadrinker it's working fine and is useful :thumbup:
