
FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 01 Feb 2018, 16:09
by JoeWinograd
I posted this issue at kon's FGP (FileGetProperties) topic:
https://autohotkey.com/boards/viewtopic ... 03#p198103

But in case kon is out-of-pocket, and also because this is likely not an FGP problem, I'm posting it here, too. If anyone can figure out why FGP is returning 0 when trying to get the "Pages" property (PropNum=148) in a PDF file, I'll be most appreciative. Thanks much, Joe

Re: FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 01 Feb 2018, 17:47
by jeeswg
- You mentioned something in the link, which sounds plausible: that there isn't any software installed that generates a Pages column. Did you try opening a folder with PDFs and adding a 'Pages' column? If that column is unavailable, installing something might fix the problem.
- Otherwise you could try something like Xpdf (pdftotext.exe, pdfinfo.exe, etc.) to see if that works for you. They are command-line tools.
- Also, you could try to establish which OSes are affected, and whether the AHK version or compilation status is a problem. Any mention of 'Windows 10' generally concerns me.

Re: FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 01 Feb 2018, 18:31
by A_AhkUser
Assuming Foxit Reader is installed on the target computer, here's the hacky way that occurs to me to retrieve the page count of PDF files. The surprising part is that it seems to work :o

```autohotkey
#NoEnv
#SingleInstance force
SetWorkingDir % A_ScriptDir
SendMode, Input
CoordMode, ToolTip, Screen
#Warn
; Windows 8.1 64 bit - Autohotkey v1.1.27.04 32-bit Unicode; Foxit Reader Version : 9.0.0.29935

myPdfFile := "C:\Users\Jérémy\Desktop\русский язык\Russie\textes\La Russie postsoviétique - Françoise Daucé\III.-Les-soubresauts-du-régime-eltsinien.pdf"

; Load the PDF into an ActiveX (web browser) control on a GUI that is never shown.
; This relies on Foxit Reader having been installed with its "display in browser" option.
if (shellEmbeddedcanOpenPDFFiles()) {
    Gui, Add, ActiveX, w800 h400 vWB, % myPdfFile
} ; else {
    ; MsgBox, Could not retrieve the page count. Specifically: shellEmbeddedcanOpenPDFFiles returns false.
    ; ExitApp
; }
return

; Alt+I: read the "current / total" page box (Edit1) from the hidden GUI and show the total.
!i::
    DetectHiddenWindows, On
    ControlGetText, pages, Edit1, % A_ScriptName
    MsgBox % StrSplit(pages, " / ")[2]
    DetectHiddenWindows, Off
    Gui, Destroy
return

; Returns true if Foxit Reader's installer recorded the "displayinbrowser" task.
shellEmbeddedcanOpenPDFFiles() {
    Loop, Reg, HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall, R
    {
        if (A_LoopRegName == "DisplayName") {
            RegRead, __v
            if (__v == "Foxit Reader") {
                RegRead, __selectedTasks, HKLM\%A_LoopRegSubKey%, Inno Setup: Selected Tasks
                return (InStr(__selectedTasks, "displayinbrowser"))
            }
        }
    }
    return false
}
```

Re: FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 01 Feb 2018, 18:57
by JoeWinograd
Hi jeeswg,

> there isn't any software installed that generates a Pages column

Yep, I'm thinking it may be Adobe Acrobat (not Reader), specifically pdfshell.dll.

> Did you try opening a folder with pdfs, and trying to add a 'Pages' column?

Yes, and on the systems where FGP works, the Pages column shows the number of pages, while on the systems where it doesn't work, the Pages column is blank.

> In which case installing something might fix the problem.

Yes, but I need this to run on systems without any prerequisite requirements, especially no Adobe software.

> Otherwise you could try something like Xpdf (pdftotext.exe, pdfinfo.exe etc) to see if that works for you. They are command-line tools.

My first versions of the program (which, btw, I named CountPagesPDF) called pdfinfo.exe and worked fine, but there are two problems, both because it has to open each PDF. First, speed: the FGP version is significantly faster than the pdfinfo version, and when processing a large number of files, that makes an enormous difference. Second, password-protected files: pdfinfo can't open them. I've also used PDFtk Server (pdftk.exe dump_data) to get the number of pages in PDFs, but it also has to open each file to get the metadata, so it will suffer from the same two issues as pdfinfo. Until I can figure out what's going on, I created a version of the program that first calls FGP_Value and, if that fails for PropNum=148 ("Pages"), then calls pdfinfo.
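The shell-first, pdfinfo-second fallback described above could look roughly like the sketch below. It is not Joe's code: the helper names are hypothetical, `pdfinfo.exe` is assumed to sit next to the script (or on the PATH), and instead of FGP's column number 148 the shell lookup uses the canonical property name `System.Document.PageCount` through `FolderItem2.ExtendedProperty`, which does not depend on column order. Whether that returns the same value as FGP on every system is an assumption to verify.

```autohotkey
; Hypothetical helper names throughout; the pdfinfo.exe location is an assumption.

; Ask the Windows shell for the page count (blank if no PDF property handler supplies it).
ShellPageCount(pdfPath) {
    SplitPath, pdfPath, fileName, dirPath
    folder := ComObjCreate("Shell.Application").NameSpace(dirPath)
    if !IsObject(folder)
        return ""
    item := folder.ParseName(fileName)
    if !IsObject(item)
        return ""
    return item.ExtendedProperty("System.Document.PageCount")
}

; Pull the number out of pdfinfo's "Pages:          12" line.
ParsePdfinfoPages(output) {
    if (RegExMatch(output, "m)^Pages:\s+(\d+)", m))
        return m1 + 0
    return ""
}

; Run pdfinfo.exe directly (no cmd.exe) and parse its standard output.
PdfinfoPageCount(pdfPath, pdfinfoExe) {
    cmd := """" pdfinfoExe """ """ pdfPath """"
    try exec := ComObjCreate("WScript.Shell").Exec(cmd)
    catch
        return ""  ; pdfinfo.exe missing or could not be started
    return ParsePdfinfoPages(exec.StdOut.ReadAll())
}

; Try the fast shell lookup first; fall back to pdfinfo when it is blank or 0.
GetPdfPageCount(pdfPath, pdfinfoExe := "pdfinfo.exe") {
    pages := ShellPageCount(pdfPath)
    if (pages = "" || pages = 0)
        pages := PdfinfoPageCount(pdfPath, pdfinfoExe)
    return pages
}
```

Note that `WScript.Shell`'s `Exec` shows a console window for each call, so on large batches a different launch method may be preferable.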

> Also, you could try to establish what OSes are affected

I'm now confident that it has nothing to do with Windows version, AHK version, or compilation status. It definitely has nothing to do with W10, as the bulk of my testing has been on W7 (I ran just one test in a W10 sandbox).

Thanks for your thoughts — much appreciated! Regards, Joe

Re: FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 01 Feb 2018, 19:13
by JoeWinograd
Hi A_AhkUser,
That looks mighty clever, but as I mentioned to jeeswg, I need it to run on systems without any prerequisite requirements — no software from Adobe, Foxit, Nitro, etc. I'm compiling it with the standard AHK compiler and creating a standard Windows installer (a Setup.exe file) with NSIS. Everything that the user needs to run CountPagesPDF successfully must be installed with the installer, which is why getting FGP to work is so important to me. But thanks for your interesting idea — I appreciate it. Regards, Joe

Re: FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 01 Feb 2018, 19:52
by FanaticGuru
JoeWinograd wrote: ...I need this to run on systems without any prerequisite requirements, especially no Adobe software.
Your options are not good.

The reason FGP cannot get the page count from some PDF files is that the program that generated the files did not put that information in the file's extended details. A program can put pretty much anything it wants in the extended details when it creates a file; it does not even have to be correct or make sense. Extended details are optional, not something the file system requires. So if the detail is not in the file's details, no solution that gets it from the file system is going to work.

So if no program has calculated a PDF's number of pages and stored it in the extended details, the only option left is having some program calculate the number of pages. This is going to be much slower, and it is going to require some program that understands PDFs. That is going to be difficult with this requirement: "run on systems without any prerequisite requirements, especially no Adobe software". I don't believe a PDF file even has that information neatly stored in it.

A PDF is a stream of data, somewhat like a webpage, that does not know exactly how it is going to be rendered until some program actually renders it. And different programs might render it a little differently depending on how closely they adhere to the standard.

You can open a PDF in a plain text editor and see this stream. It might be possible to parse this stream and figure out the page count. This would basically mean making AHK understand the PDF structure well enough to figure out page counts.

The computers with the PDFs have to be using some program to open them. That program might be able to get the page count info in some API way.

FG

Re: FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 01 Feb 2018, 20:09
by jeeswg
- @FanaticGuru: Thanks very much for this information. So are extended details part of the file data, or something like the modified date (separate from the file data)? Can I give a txt file a page count? (I'm just googling it now.)
- @JoeWinograd: I suppose you could check whether certain files always return zeros/blanks irrespective of the PC. And maybe those files are older files.

Re: FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 01 Feb 2018, 20:34
by JoeWinograd
> The reason FGP cannot get the page count from some PDF files is because the program that generated the files did not put that information in the file's extended details.
I don't think so. For example, here's the right-click > Properties dialog of a PDF file created on a W7 system without Acrobat:
[Screenshot: Properties dialog of a PDF created on W7 without Acrobat]
Notice that there is no PDF Information tab (and, of course, no page count). Hovering on that file in a file manager gives this:
[Screenshot: hover tooltip for that file on a system without Acrobat]
I transferred that file to a W7 system with Acrobat installed — didn't touch the file in any way. Here's the right-click/Properties and the hovering output on that system (combined into a single image, since this forum allows only three images in a post):
[Screenshot: Properties dialog and hover tooltip for the same PDF on W7 with Acrobat]
There's a PDF Information tab and it has the page count in it. My guess — and that's all it is at this time, a guess — is that Acrobat's pdfshell.dll is providing the PDF Information tab with the metadata.
> That is going to be difficult with this requirement: "run on systems without any prerequisite requirements, especially no Adobe software".
I'm fine with including pdfinfo.exe in my installer (the Xpdf toolkit is open source and may be redistributed under the GNU GPL, version 2 or 3). That way, the user needs no prerequisites: simply run my CountPagesPDF installer.

Thanks for your comments! Regards, Joe

Re: FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 01 Feb 2018, 21:02
by FanaticGuru
Yeah, pdfshell.dll is an Adobe shell extension. You normally only have it if some Adobe PDF software is installed. It basically adds that tab to the shell's file explorer, the same way that Adobe adds lots of stuff to context menus and programs. Adobe weaves its way into lots of programs, including the shell explorer.

When you click on the "Details" tab on the file, do you see a "Pages" detail? If so, then Adobe is putting that information in the Details; if not, then Adobe, through an extension, is calculating that information on the fly. If it is in the Details, the file system can access it. If not, then the file system will probably not be able to access it.

Also, I have only seen pdfshell.dll in 32-bit, so it could sometimes be inconsistent on 64-bit machines.

FG

Re: FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 01 Feb 2018, 21:13
by JoeWinograd
> When you click on the "Details" tab on the file, do you see a "Pages" detail?
No. Here's the dialog:
[Screenshot: Details tab of the PDF's Properties dialog, with no Pages field]
There are only two fields under the last one shown above: Owner and Computer.
> If not then the file system will probably not be able to access it.
FGP_Value with 148 or "Pages" retrieves it fine on systems with Acrobat.

Thanks for the heads-up on the 32-bit/64-bit issue.

Re: FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 01 Feb 2018, 21:26
by FanaticGuru
jeeswg wrote: So are extended details part of the file data, or something like the modified date (separate from the file data)? Can I give a txt file a page count? (I'm just googling it now.)
It is like the modified date, except that those are "Details" and pages is an "Extended Detail". Extended Details are like optional details: a file can have them or not, and the file system does not really care.

But every file type has a set list of Extended Details, which is defined by the app associated with that file type. The app can also define whether they are read-only or can only be set by the app. It also cannot be just any Extended Detail: there is a list defined by the operating system, but it is a long list. Later, with Alternate Data Streams, it might have opened up to user-defined details, but I don't know.
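In Windows terms, the per-file-type "app" that supplies extended details is a property handler, registered per extension. A quick diagnostic for comparing a machine where FGP returns the page count with one where it doesn't is to check whether any handler is registered for .pdf at all. This is a sketch: the registry path is the documented PropertyHandlers location as I understand it, the function name is hypothetical, and whether Acrobat's pdfshell.dll is what registers there is an assumption worth checking on both kinds of machine.

```autohotkey
; Hypothetical diagnostic: return the CLSID of the property handler registered for an
; extension, or "" if none is registered.
PropertyHandlerClsid(ext := ".pdf") {
    ; A 32-bit AHK on 64-bit Windows would otherwise see the redirected 32-bit registry view.
    if A_Is64bitOS
        SetRegView, 64
    ; Reads the key's default value; on failure RegRead leaves clsid blank.
    RegRead, clsid, HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\PropertySystem\PropertyHandlers\%ext%
    SetRegView, Default
    return clsid
}
```

For example, `MsgBox % PropertyHandlerClsid()` on each machine shows whether a .pdf handler is present; a blank result would be consistent with the empty Pages column.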

Now that is for the normal user. If the app can define and change the Extended Details, I assume it could be done by another app too. I don't know how to do it though.

If you come up with a way to add Details to files, that would be something I would be interested in.

FG

Re: FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 01 Feb 2018, 21:36
by jeeswg
- @FanaticGuru: Thanks a lot. I looked it up and it appeared to be horrendous, with hardly anyone knowing what was possible, and conflicting/contradictory information, and very little to go on. I haven't been down the rabbit hole like that in a while. These random awkward IT problems. It's best to assume that I won't make any progress with it, but I will at least do some more research. I've added it to my to-do list.
- You could start a thread with what you know, or, post some links here, and I'll start a thread, re. anything anyone knows about extended file details. Cheers.

Re: FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 01 Feb 2018, 21:45
by JoeWinograd
> anything anyone knows about extended file details
Not a clue here, but would love to learn about it. Will keep an eye out for a new thread from you on it.

Re: FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 01 Feb 2018, 21:57
by FanaticGuru
JoeWinograd wrote: FGP_Value with 148 or "Pages" retrieves it fine on systems with Acrobat.
That is somewhat surprising.

FGP is basically using something like oShell := ComObjCreate("shell.application") to automate the Explorer shell.

pdfshell.dll is extending the Shell. It is adding a tab to the dialog and causing the Shell to report page counts for PDFs, but it does not expose that information in the Details tab. It's a mystery, but probably just a design decision to take all that Extended Details data, put it in its own PDF tab, and not show any of it in the Details tab. I still don't know at what point pdfshell.dll is calculating that information and adding it to the file.

I have the full Adobe Acrobat installed, but I don't have that PDF tab in my Explorer, or I would not be playing 20 questions with you.

This mystery does not solve your problem though. It appears you have to have pdfshell.dll on the computer or some 3rd party software. The 3rd party software is going to be really slow to open up each PDF and get its page number. pdfshell.dll is the best because it allows the Shell to get the page numbers in a very quick way. I don't know if putting pdfshell.dll on the computers is an option. You could probably have AHK install and register the dll automatically in some type of installation package. Maybe check if it is already on the system and if not install it. Windows will not like this though and will ask for all kinds of permissions.

FG

Re: FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 02 Feb 2018, 01:41
by Xeo786
You can use Acrobat COM to get actual page count... ;)

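```autohotkey
; Requires Adobe Acrobat (not Reader): the AcroExch COM objects come with Acrobat.
myPdfFile := "C:\Users\Jérémy\Desktop\русский язык\Russie\textes\La Russie postsoviétique - Françoise Daucé\III.-Les-soubresauts-du-régime-eltsinien.pdf"
AcroApp := ComObjCreate("AcroExch.App")
;~ AcroApp.Show()
Document := ComObjCreate("AcroExch.PDDoc")
if (Document.Open(myPdfFile)) {  ; Open returns false if the file cannot be opened
    MsgBox, % Document.GetNumPages()
    Document.Close()
} else {
    MsgBox, Could not open %myPdfFile%
}
AcroApp.Exit()
```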

Re: FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 02 Feb 2018, 15:01
by JoeWinograd
> FGP is basically using something like this: oShell := ComObjCreate("shell.application")
Yes, that's exactly what it does, followed by oFolder:=oShell.NameSpace(0), and then populates an array with property names and property numbers. All I care about is "Pages"/148, but here's everything that FGP produces for a PDF:
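The FGP approach described here can be sketched in a few lines: ask the Desktop namespace for column headers to build a name-to-index table, then call `GetDetailsOf` for one file and one column. Assumptions: the helper names are hypothetical; passing 0 instead of a FolderItem is taken to return the column header (the documented form passes a null item); and header names are localized while their indexes vary between Windows versions, so 148 meaning "Pages" only holds on systems like those tested in this thread.

```autohotkey
; Build a table of column name -> column index from the Desktop namespace.
ShellColumnTable(maxColumns := 400) {
    folder := ComObjCreate("Shell.Application").NameSpace(0)  ; 0 = the Desktop folder
    table := {}
    Loop % maxColumns {
        name := folder.GetDetailsOf(0, A_Index - 1)  ; non-item first argument -> column header
        if (name != "" && !table.HasKey(name))
            table[name] := A_Index - 1
    }
    return table
}

; Read one column's text for one file ("" if the folder or file cannot be found).
ShellColumnValue(filePath, columnIndex) {
    SplitPath, filePath, fileName, dirPath
    folder := ComObjCreate("Shell.Application").NameSpace(dirPath)
    if !IsObject(folder)
        return ""
    item := folder.ParseName(fileName)
    return IsObject(item) ? folder.GetDetailsOf(item, columnIndex) : ""
}
```

For example, on an English-language system `cols := ShellColumnTable()` followed by `ShellColumnValue(pdfPath, cols["Pages"])` would read the "Pages" column without hard-coding 148.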
[Screenshot: all properties FGP lists for a PDF on a system with Acrobat installed]
> pdfshell.dll is extending Shell. It is adding a tab to the dialog and it is causing the Shell to report page numbers for PDFs
That's my guess: pdfshell.dll is responsible for it.
> Still don't know at what point pdfshell.dll is calculating that information and adding it to the file.
Good question!
> I have the full Adobe Acrobat installed but I don't have that PDF tab in my explorer or I would not be playing such 20 questions with you.
That's strange! On two systems where I have full Acrobat (one is XI Pro, the other is X Std), the tab is there; on four systems without full Acrobat, the tab is not there.
> It appears you have to have pdfshell.dll on the computer or some 3rd party software.
Yes, that seems to be the case.
> The 3rd party software is going to be really slow to open up each PDF and get its page number.
It's somewhat slow, but pdfinfo.exe is not too bad (haven't tested PDFtk.exe dump_data yet). An even bigger problem is not being able to get the page count of a password-protected PDF.
> pdfshell.dll is the best because it allows the Shell to get the page numbers in a very quick way.
Exactly! And it exposes the page count of password-protected PDFs.
> I don't know if putting pdfshell.dll on the computers is an option.
I doubt it. That would almost surely violate Adobe's licensing.
> You could probably have AHK install and register the dll automatically in some type of installation package.
Technically, yes; but legally, probably not.

Thanks for all the comments, FG — much appreciated! Regards, Joe

Re: FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 02 Feb 2018, 15:04
by JoeWinograd
Thank you for that, Xeo786, but that requires Acrobat to be installed, which must not be a requirement of my program. Regards, Joe

Re: FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 02 Feb 2018, 17:55
by FanaticGuru
@JoeWinograd
At one time I believe pdfshell.dll was installed by Adobe Reader, but I don't believe the new Adobe Reader DC uses it. You might check whether an older version of Adobe Reader installs pdfshell.dll and install that on all the computers. It's free, and not really something most companies would mind installing on their computers.

pdfshell.dll is also affected by 32-bit vs. 64-bit, so you might check whether that correlates with which computers show pages and which don't.

After further reading, it appears pdfshell.dll adds the page count to files' details in the background a lot of the time. Basically, any time the Shell with the pdfshell.dll extension interacts with a file, it calculates the page count and adds it to the file. If you have a million PDFs on a computer and install some Adobe product that installs pdfshell.dll, the files would not all suddenly get the page count added to their details. But every time you open a folder through the Windows Explorer shell, pdfshell.dll goes to work on all the files in that folder and starts adding the page count to the files' details. It is a handler that monitors the Shell and jumps in any time it encounters a PDF that does not have extended details, and adds them. It calculates the extended details pretty quickly, but after it does so once for a file, it saves them in the file's details and does not have to calculate them again. Another wrinkle is that Adobe has helper processes that also scan your hard drive in the background looking for PDFs to process, making thumbnails, extended details, tooltips, etc.

Password-protected files are tough, but even those still show formatting if you look at them in Notepad. You might consider looking at the PDFs as plain text; you might be able to just RegEx some of the formatting. At a glance, it looks like each page ends with Type/Page>>, so you might be able to just count those to get a PDF's number of pages. Maybe test it on a bunch of PDFs on your computer that show a page count through FGP, and see whether RegEx on the plain text is reliable.
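Counting page objects, as suggested above, might be sketched like this. The pattern is my assumption rather than FG's exact one: it allows optional whitespace between /Type and /Page (many files write "/Type /Page") and uses a lookahead so that /Pages tree nodes and names such as /PageLabel are not counted. It will undercount files that keep their page objects inside compressed object streams (PDF 1.5 and later) and can overcount incrementally updated files, so treat it as a heuristic to compare against FGP results. The function name is hypothetical.

```autohotkey
; Count "/Type /Page" objects in the PDF's text, skipping "/Type /Pages" and similar names.
CountPageObjects(pdfText) {
    count := 0, pos := 1
    while (pos := RegExMatch(pdfText, "/Type\s*/Page(?![A-Za-z])", match, pos)) {
        count++
        pos += StrLen(match)  ; continue searching after this match
    }
    return count
}
```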

FG

Re: FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 02 Feb 2018, 21:31
by JoeWinograd
Hi FG,
> I don't believe the new Adobe Reader DC uses it.
That seems to be the case. I searched for it on a system with the latest Adobe Acrobat Reader DC (18.009.20050) and pdfshell.dll is nowhere to be found.
> Might look if an older version of Adobe Reader installs pdfshell.dll and install that on all the computers.
Interesting idea.
> After further reading
You've certainly uncovered some fascinating info on pdfshell.dll. Thanks for that research.
> It is a handler that monitors Shell and jumps in any time a PDF is encountered that does not have extended details and adds them.
Wow!
> You might consider looking at the PDFs in plain text
Brilliant idea! I just pulled up several PDFs in my fav text editor, including some password-protected ones, and they all have a string of "/Count" followed by the page count. Very nice!
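Because /Count also shows up in other places (intermediate page-tree nodes hold partial counts, and bookmark /Outlines dictionaries have their own /Count), one way to narrow it down is to look only at dictionaries that contain /Type /Pages and keep the largest value, which should be the root node's total. This is a heuristic sketch with a hypothetical function name: it assumes the page-tree dictionaries contain no nested << >> or hex strings, and like any plain-text approach it cannot see objects inside compressed object streams.

```autohotkey
; Largest /Count found in a "/Type /Pages" dictionary (0 if none is found).
RootPagesCount(pdfText) {
    best := 0, pos := 1
    ; Each match is a << ... >> dictionary with no "<" or ">" inside it.
    while (pos := RegExMatch(pdfText, "<<([^<>]*)>>", dict, pos)) {
        if (RegExMatch(dict1, "/Type\s*/Pages(?![A-Za-z])"))
            if (RegExMatch(dict1, "/Count\s+(\d+)", cnt))
                if (cnt1 + 0 > best)
                    best := cnt1 + 0
        pos += StrLen(dict)
    }
    return best
}
```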

I then wrote some test code to read each PDF into a variable with FileRead, which worked on some of the PDFs, but the text was truncated on others. I then tried FileOpen, but that also resulted in truncated text. I'm guessing that a non-text character in the PDF is being interpreted as an end-of-stream character by FileRead and FileOpen? Any idea how to fix that and be able to read an entire PDF into a variable without any truncation? Thanks, Joe

Update: I determined that FileRead and FileOpen are both truncating the text when encountering a null character (hex 00). Any way to prevent that from happening?

Re: FGP (FileGetProperties) returning 0 when trying to get page count for PDF files

Posted: 03 Feb 2018, 01:10
by FanaticGuru
JoeWinograd wrote: Update: I determined that FileRead and FileOpen are both truncating the text when encountering a null character (hex 00). Any way to prevent that from happening?
I don't know about the truncating. I have seen this when I attempt to display the information but the information seems to be in the string if I do StrLen().
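One way to test that observation is to compare the string length with the file size instead of looking at a MsgBox, which stops displaying at the first binary zero. This sketch opens the file with the CP28591 (Latin-1) code page so that every byte maps to exactly one character; the function name is hypothetical, and whether the two numbers match for a given PDF and AHK build is exactly what it is meant to check.

```autohotkey
; Read a whole file as Latin-1 text into `text`; return the file size in bytes (-1 on failure).
ReadWholeFile(path, ByRef text) {
    f := FileOpen(path, "r", "CP28591")  ; Latin-1: one byte -> one character
    if !IsObject(f)
        return -1
    size := f.Length
    text := f.Read()
    f.Close()
    return size
}

; Example (hypothetical path):
; size := ReadWholeFile("C:\test\sample.pdf", pdf)
; MsgBox % "File size: " size "`nStrLen:    " StrLen(pdf)
```

If StrLen equals the file size, the data after any binary zero is in the variable even though MsgBox shows less.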

Anyway, /Count does not seem to be reliable. There seem to be lots of /Count entries in the PDFs I looked at, something about parents and trees, with some adding up to others. I thought maybe I'd just take the highest count, since the smaller counts appear to be like subsections. That worked for some files but not others.

I then tried totalling the end of page marker: Type/Page>>. This seemed to work better.

Below is some test code:

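```autohotkey
FileSelectFile, File_FullPath, 3,,, PDF (*.pdf)
if ErrorLevel  ; dialog was cancelled
    ExitApp
; CP28591 (Latin-1) maps each byte of the binary PDF to exactly one character.
oFile := FileOpen(File_FullPath, "r", "CP28591")
if !IsObject(oFile) {
    MsgBox, Could not open %File_FullPath%
    ExitApp
}
sFile := oFile.Read()
oFile.Close()

; Method 1: the highest "/Count n" value that is followed by another "/" key.
Count_Hi := 1, X := 1, Count := ""
while (X := RegExMatch(sFile, "/Count (\d+)\s*/", Count, X+StrLen(Count)))
    if (Count1 > Count_Hi)
        Count_Hi := Count1

; Method 2: the number of "Type/Page>>" end-of-page markers.
Page_Hi := 0, X := 1, Count := ""
while (X := RegExMatch(sFile, "Type/Page>>", Count, X+StrLen(Count)))
    Page_Hi++

MsgBox % File_FullPath "`nCount`t" Count_Hi "`nPage>>`t" Page_Hi
```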
It shows both methods. Maybe a start for you to work from.

FG