1) My idea is to identify PDF files by the logo they contain. (Is that a bad idea? Why?)
2) Is there a good / easy way to do this?
3) Can an image comparison take a long time?
The logo I am looking at now is 510 x 227 pixels, and the logo library has about 50 logos, all created in the same way (so the images should be identical, I think).
To extract the logo from the PDF file that I want to compare with the logo library, I intend to use the program pdftoimage.exe, something like pdftoimage inputfile.pdf pdflogo.png, and then compare the result with the logos stored in the image library.
Compare the logotype(image) in PDF-files with an original image
Re: Compare the logotype(image) in PDF-files with an original image
If your PDF file has nothing but the logo, then you could just examine the MD5 hash value of the file, and map that back to your image library.
Code: Select all
file = [full path to PDF file]
MsgBox, 64, MD5, % md5(file)
ExitApp

md5(file) {
    If !FileExist(file)
        Return
    Static out := TEMP "\md5.tmp"
    ; certutil writes its report to the temp file
    RunWait, %ComSpec% /c "%WINDIR%\System32\certutil.exe -hashfile %file% MD5 >%out%",, Hide
    ; the hash is on the second line of that report
    FileReadLine, line, %out%, 2
    FileRecycle, %out%
    Return line
}
Re: Compare the logotype(image) in PDF-files with an original image
Did I miss something?
No, the PDF file also has text.
But the program pdftoimage.exe saves only the image from the PDF file.
(Right now I don't know what happens if the PDF file has many images.)
As I said before, pdftoimage inputfile.pdf pdflogo.png (from the CMD prompt) creates an image (with the logo).
Then I want to compare pdflogo.png with Logo1.png, Logo2.png, Logo3.png, etc.
If the two images are equal, then I know which logo is in the PDF file (and which company the logo belongs to).
Re: Compare the logotype(image) in PDF-files with an original image
Yes. I think you missed thinking a little bit further about potential uses of the MD5 to meet your needs. Identical image files will also have the same MD5, right? Their MD5 will differ from the other files' MD5, right? Think of the MD5 as a sort of shorthand signature code that "summarizes" or represents a file's contents.
Re: Compare the logotype(image) in PDF-files with an original image
Ok!
I got an error "CertUtil: Too many arguments"
Re: Compare the logotype(image) in PDF-files with an original image
If your file path contains spaces, then you probably need to enclose it in quotation marks.
Re: Compare the logotype(image) in PDF-files with an original image
Great! Thanks!
The problem was solved with this change:
Code: Select all
RunWait, %ComSpec% /c "%WINDIR%\System32\certutil.exe -hashfile "%file%" MD5 > %out%",, Hide
The solution seems to work perfectly (so far).
I will test whether "comparing" 50 logos takes much longer than retrieving the same values from a file. (The downside is that the file may need to be updated when something changes.)
Re: Compare the logotype(image) in PDF-files with an original image
Good to hear. If the file contents are identical, they should have the same MD5. Retrieving the MD5 would typically be much faster than comparing images pixel by pixel. Since you are just looking at images, I think that you first want to extract the images as you were doing, and then compare the extracted image file's MD5 against the MD5 of each image file in your library.
Re: Compare the logotype(image) in PDF-files with an original image
Written this way, the instruction became even more "stable" and works in more situations:
Code: Select all
RunWait % ComSpec " /c " A_WinDir "\System32\certutil.exe -hashfile """ file """ MD5 > """ Out """",, Hide
(I wonder if @mikeyww would prefer Format() in this case.)
Instead of using a temporary result file (out), how difficult would it be to use the clipboard? (But that may be another question.)
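On the temporary-file question: one possible alternative, sketched here under the assumption of AutoHotkey v1.1, is to read certutil's output directly through the WScript.Shell COM object, which avoids both the temp file and the clipboard. Md5FromStdOut is a made-up name for illustration, not something from this thread.

```autohotkey
; Sketch only (AutoHotkey v1.1). Md5FromStdOut() is a hypothetical helper.
Md5FromStdOut(file) {
    if !FileExist(file)
        return ""
    shell := ComObjCreate("WScript.Shell")
    ; Exec() starts the command and exposes its standard output as a text stream.
    proc := shell.Exec("certutil.exe -hashfile """ file """ MD5")
    output := proc.StdOut.ReadAll()
    ; As with FileReadLine above, the hash is taken from the second line.
    lines := StrSplit(output, "`n", "`r")
    ; Strip spaces in case the hash is printed with spaces between bytes.
    return StrReplace(lines[2], " ")
}

MsgBox, % Md5FromStdOut(A_ScriptFullPath)
```

One trade-off: Exec() may briefly flash a console window, which the RunWait ... Hide variant does not. Unlike piping to the clipboard, it leaves the user's clipboard contents alone.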
Re: Compare the logotype(image) in PDF-files with an original image
Interesting!
Yeah, I love Format....
Re: Compare the logotype(image) in PDF-files with an original image
@mikeyww
You don't need to use external apps to get an MD5 hash; there is AHK code for that. Also, you don't need to calculate a hash to compare files; you can do it directly:
Code: Select all
filePath1 := "D:\File1.pdf"
filePath2 := "D:\File2.pdf"
MsgBox, % CompareFiles(filePath1, filePath2)

CompareFiles(filePath1, filePath2) {
    Loop 2
        File%A_Index% := FileOpen(filePath%A_Index%, "r")
    len := File1.Length
    ; only read the contents if both files opened and have the same size
    cmp := (len != "" && len = File2.Length)
    Loop 2 {
        if cmp {
            File%A_Index%.Pos := 0
            File%A_Index%.RawRead(buff%A_Index%, len)
        }
        File%A_Index%.Close()
    }
    ; memcmp returns 0 when the two buffers are byte-for-byte identical
    Return cmp && DllCall("msvcrt\memcmp", "Ptr", &buff1, "Ptr", &buff2, "Ptr", len, "Cdecl") = 0
}
Last edited by teadrinker on 13 Nov 2020, 18:09, edited 1 time in total.
Re: Compare the logotype(image) in PDF-files with an original image
Looks awesome! Thank you.
Re: Compare the logotype(image) in PDF-files with an original image
I have not tested the idea from @teadrinker
I've been running MD5 for a while, and sometimes I get ErrorLevel 1 (I do not know why).
An outline of my code:
a) First, the MD5 values of all images in a directory are read (only 6 images now, but there will be about 50). All MD5 values are stored in an array, together with other information about the images.
b) Then I get the MD5 value of the image I want to compare.
c) Finally, that MD5 value is compared with all the other values, and when there is a hit, a result is created from the other information about the image.
But sometimes an error occurs. If I run it again, no error occurs.
Could teadrinker's suggestion be more:
- Stable?
- Faster?
- Better?
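Steps a) to c) above can be sketched roughly as follows, assuming AutoHotkey v1.1 and the certutil approach from earlier in the thread. This is not the poster's actual code: Md5OfFile, the LogoLib folder location and the pdflogo.png location are placeholders chosen for illustration.

```autohotkey
; Sketch only (AutoHotkey v1.1); folder and file locations are placeholders.
libDir := A_ScriptDir "\LogoLib"
target := A_ScriptDir "\pdflogo.png"

; a) Hash every library image once: MD5 value -> file name.
library := {}
Loop, Files, %libDir%\*.*
{
    hash := Md5OfFile(A_LoopFileLongPath)
    if (hash != "")
        library[hash] := A_LoopFileName
}

; b) Hash the extracted logo, then c) look it up in the table.
hash := Md5OfFile(target)
if (hash != "" && library.HasKey(hash))
    MsgBox, % "Match: " library[hash]
else
    MsgBox, No matching logo found.
ExitApp

; Hypothetical helper: MD5 via certutil, with both paths quoted.
Md5OfFile(file) {
    out := A_Temp "\md5.tmp"
    FileDelete, %out%    ; avoid reading a stale result
    RunWait, %ComSpec% /c certutil.exe -hashfile "%file%" MD5 > "%out%",, Hide
    FileReadLine, line, %out%, 2    ; the hash is on line 2
    FileDelete, %out%
    return StrReplace(line, " ")
}
```

Keying the table by hash turns step c) into a single lookup instead of a loop over all stored values. A blank hash is skipped on both sides, so a failed certutil call shows up as "no match" rather than as a false hit.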
Re: Compare the logotype(image) in PDF-files with an original image
Who knows? You may test it and tell us.
Re: Compare the logotype(image) in PDF-files with an original image
Now I have tested the suggestion from @teadrinker.
It works if the two images have the same name, but not with different names.
So it doesn't work for me! (I don't know the name before the comparison.)
My wish was to automatically find out who an invoice came from by analyzing the logo.
MD5 works, but I do not like runtime errors, especially when the process is to be automated.
I have searched the Internet for MD5 implementations for AHK, and the most complete is from @SKAN: FileCRC32, FileSHA1, FileMD5() and MD5().
@Laszlo also has a similar solution, md5.ahk.
All of these work on exe, ahk, txt and png files, but not on my jpg files. I have no idea why.
I searched further and found an interesting solution by @Rseding91: MD5 function for comparing images.
When I started testing that function, I realized that AHK's FileRead cannot handle jpg files. Is that correct?
I'm unsure what to pass to the last function, Calc_MD5(_VarAddress, _VarSize):
Code: Select all
FileRead File,%A_ScriptFullPath%
FileGetSize FileSize,%A_ScriptFullPath%
Calc_MD5(&File, FileSize)
Does & mean something in the function call?
If a jpg filename is specified in the call, will the result be the hash of the filename or of the file?
Is it possible to read a jpg file as binary data into a variable? (Possibly this is one of my problems.)
I am sharing a test file containing all of the above functions: AHK-testscript hash-number
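On the binary-read question above: a minimal sketch, assuming AutoHotkey v1.1, of loading a file's raw bytes with FileOpen and RawRead (the same calls used in CompareFiles earlier in the thread). ReadBinary and logo.jpg are made-up names for illustration.

```autohotkey
; Sketch only (AutoHotkey v1.1). ReadBinary() and the path are hypothetical.
ReadBinary(path, ByRef data) {
    f := FileOpen(path, "r")
    if !IsObject(f)
        return -1                       ; the file could not be opened
    size := f.Length
    f.Pos := 0                          ; start at the very first byte
    bytesRead := f.RawRead(data, size)  ; copies bytes as-is, no text decoding
    f.Close()
    return bytesRead
}

size := ReadBinary(A_ScriptDir "\logo.jpg", data)
if (size < 0)
    MsgBox, Could not open the file.
else
    MsgBox, % "Read " size " bytes; first byte: " NumGet(&data, 0, "UChar")
```

In an expression, &data yields the memory address of the variable's contents. A function that takes an address and a size, such as the Calc_MD5(_VarAddress, _VarSize) quoted above, would therefore presumably be called as Calc_MD5(&data, size) and would hash the file's bytes, not its name. FileRead also documents a *c option for loading binary data, which may be worth trying if plain FileRead misbehaves on jpg files.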
Re: Compare the logotype(image) in PDF-files with an original image
teadrinker wrote: ↑14 Nov 2020, 15:56
... Looks like you do something wrong. My code does not depend on file names.
My code: the directory LogoLib contains many jpg files (one of them is the same as filePath1).
xpdfLogo.jpg is a renamed copy of Ilab.jpg.
Code: Select all
; Version 14 nov 2020
#NoEnv ; Recommended for performance and compatibility with future AutoHotkey releases.
; #Warn ; Enable warnings to assist with detecting common errors.
SendMode Input ; Recommended for new scripts due to its superior speed and reliability.
SetWorkingDir %A_ScriptDir% ; Ensures a consistent starting directory.
#Singleinstance force
; Test 1 (Works!)
; filePath1 := "c:\ProgExpo\temp\ilab.jpg"
; filePath1 := "..\LogoLib\ilab.jpg"
; Test 2 (same file new name - Doesn't work)
filePath1 := "c:\ProgExpo\temp\xpdfLogo.jpg"
filePath2 := "..\LogoLib\ilab.jpg"
MsgBox ,, %A_ScriptName% - Rad %A_LineNumber%, % CompareFiles(filePath1, filePath2)
Loop Files, ..\LogoLib\*.*
{
    ; MsgBox ,, %A_ScriptName% - Rad %A_LineNumber%, % filePath1 "`n`n" A_LoopFileLongPath
    If CompareFiles(filePath1, A_LoopFileLongPath)
    {
        MsgBox ,, %A_ScriptName% - Rad %A_LineNumber%, % filePath1 "`n`nFound! `n`n" A_LoopFileLongPath
        filePath2 := A_LoopFileLongPath
        Break
    }
    else
        MsgBox ,, %A_ScriptName% - Rad %A_LineNumber%, % filePath1 "`n`nNOT found! `n`n" A_LoopFileLongPath
}
MsgBox ,, %A_ScriptName% - Rad %A_LineNumber%, % "The result .: " CompareFiles(filePath1, filePath2)
ExitApp

CompareFiles(filePath1, filePath2)
{
    Loop 1
    {
        Loop 2
            File%A_Index% := FileOpen(filePath%A_Index%, "r")
        len := File1.Length
        if (len = "" || len != File2.Length)
            Break
        Loop 2
        {
            File%A_Index%.Pos := 0
            File%A_Index%.RawRead(buff%A_Index%, len)
        }
        cmp := true
    }
    Loop 2
        File%A_Index%.Close()
    Return cmp && DllCall("msvcrt\memcmp", "Ptr", &buff1, "Ptr", &buff2, "Ptr", len, "Cdecl") = 0
}
Re: Compare the logotype(image) in PDF-files with an original image
Albireo wrote: ↑
Code: Select all
CompareFiles(filePath1, filePath2)
{
    Loop 1
    {
        Loop 2
            File%A_Index% := FileOpen(filePath%A_Index%, "r")
        len := File1.Length
        if (len = "" || len != File2.Length)
            Break
        Loop 2
        {
            File%A_Index%.Pos := 0
            File%A_Index%.RawRead(buff%A_Index%, len)
        }
        cmp := true
    }
    Loop 2
        File%A_Index%.Close()
    Return cmp && DllCall("msvcrt\memcmp", "Ptr", &buff1, "Ptr", &buff2, "Ptr", len, "Cdecl") = 0
}
It is not my code; it should be:
Code: Select all
CompareFiles(filePath1, filePath2) {
    Loop 2
        File%A_Index% := FileOpen(filePath%A_Index%, "r")
    len := File1.Length
    cmp := (len != "" && len = File2.Length)
    Loop 2 {
        if cmp {
            File%A_Index%.Pos := 0
            File%A_Index%.RawRead(buff%A_Index%, len)
        }
        File%A_Index%.Close()
    }
    Return cmp && DllCall("msvcrt\memcmp", "Ptr", &buff1, "Ptr", &buff2, "Ptr", len, "Cdecl") = 0
}
Perhaps the path is wrong; try specifying the full path.
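Following the full-path suggestion, a small diagnostic sketch, assuming AutoHotkey v1.1. DescribeFile is a made-up helper; the two paths are the ones from the test script earlier in the thread.

```autohotkey
; Sketch only (AutoHotkey v1.1). DescribeFile() is a hypothetical helper.
DescribeFile(path) {
    if !FileExist(path)
        return path " -> not found (working dir: " A_WorkingDir ")"
    ; A file loop exposes the absolute path and size, even for a relative pattern.
    Loop, Files, %path%
        fullPath := A_LoopFileLongPath, size := A_LoopFileSize
    return fullPath " -> " size " bytes"
}

MsgBox, % DescribeFile("c:\ProgExpo\temp\xpdfLogo.jpg") "`n" DescribeFile("..\LogoLib\ilab.jpg")
```

This matters because CompareFiles does not distinguish the two cases: if either file cannot be opened, len is blank and the function returns false with no error, which looks the same as a genuine mismatch. Different sizes in the report would likewise explain a false result for files that were expected to be identical.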
Re: Compare the logotype(image) in PDF-files with an original image
@teadrinker it's working fine and is useful