Using COM with PDF files

Heezea
Posts: 53
Joined: 30 Sep 2013, 21:33

15 Jun 2015, 14:16

Is there any way to use COM on PDF files? I'd like to be able to extract pages from a PDF based on a search for specific text. I've used various programs to view PDFs, including Adobe Acrobat Pro, Adobe Reader, and Bluebeam Revu. Right now I'm able to mostly automate the extraction through each program's user interface, but this is somewhat error-prone and also fairly dependent on how each person has the specific program configured.
Blackholyman
Posts: 1292
Joined: 29 Sep 2013, 22:57
Facebook: socialjsz
Google: +Jszapp
Location: Denmark

Re: Using COM with PDF files

15 Jun 2015, 16:59

The code below works ONLY with Adobe Acrobat Professional (the full Acrobat product, not Adobe Reader).

Code: Select all

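#Persistent

; Requires the full Adobe Acrobat product, which registers the AcroExch COM classes.
AVDoc := ComObjCreate("AcroExch.AVDoc")

FileSelectFile, path,,, Select a pdf, pdf (*.pdf)
if (path = "")  ; the dialog was cancelled
	return

if !AVDoc.Open(path, "")
{
	MsgBox, Could not open %path%
	return
}

AVDoc.BringToFront()

; FindText(text, caseSensitive, wholeWordsOnly, resetToFirstPage)
if AVDoc.FindText("Back", true, true, bReset := true)
	MsgBox, The word was found once; no more info to be had.

AVDoc.Close(1)  ; 1 = close without saving

AVDoc := ""  ; release the COM object

return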
For more on finding text, see these VBA examples: http://www.myengineeringworld.net/2014/ ... h-vba.html
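Since the original question was about extracting pages that contain specific text, here is a rough, untested sketch of how that might look with the same Acrobat COM interface. It assumes the full Acrobat product is installed, and it assumes the `PDDoc` methods (`Open`, `GetNumPages`, `Create`, `InsertPages`, `Save`, `GetJSObject`) and the Acrobat JavaScript methods `getPageNumWords` and `getPageNthWord` behave as the Acrobat SDK describes. The file paths and the search word are placeholders, not anything from this thread.

```autohotkey
; Sketch only: copy every page that contains a given word into a new PDF.
srcPath := "C:\temp\input.pdf"     ; hypothetical input file
outPath := "C:\temp\matches.pdf"   ; hypothetical output file
needle  := "Back"                  ; word to look for (compared case-insensitively)

src := ComObjCreate("AcroExch.PDDoc")
if !src.Open(srcPath)
{
	MsgBox, Could not open %srcPath%
	ExitApp
}
jso := src.GetJSObject()           ; bridge to Acrobat's JavaScript API

out := ComObjCreate("AcroExch.PDDoc")
out.Create()                       ; new, empty document
copied := 0

Loop, % Floor(src.GetNumPages())
{
	page := A_Index - 1            ; Acrobat page numbers are zero-based
	found := false
	; JavaScript numbers may arrive as floats, hence Floor().
	Loop, % Floor(jso.getPageNumWords(page))
	{
		if (jso.getPageNthWord(page, A_Index - 1, true) = needle)
		{
			found := true
			break
		}
	}
	if (found)
	{
		; InsertPages(insertAfterPage, sourceDoc, startPage, numPages, copyBookmarks)
		; copied - 1 is -1 for the first copy, which means "before the first page".
		out.InsertPages(copied - 1, src, page, 1, false)
		copied++
	}
}

if (copied)
	out.Save(1, outPath)           ; 1 = PDSaveFull
out.Close()
src.Close()
MsgBox, Copied %copied% page(s).
```

This works word by word, so it only matches single words; a multi-word phrase would need the words of each page joined together first.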
Chunjee
Posts: 688
Joined: 18 Apr 2014, 19:05
GitHub: Chunjee

Re: Using COM with PDF files

15 Jun 2015, 17:15

Alternatively, the method I have used is to convert the entire PDF to text with Xpdf (http://www.foolabs.com/xpdf/download.html). Then you can read the file as you normally would. The drawback is that you need to include that exe.

Might look like:

Code: Select all

; Paths are quoted so that folders or file names containing spaces still work.
RunWait, "%A_ScriptDir%\PDFtoTEXT.exe" "%TheSelectedPDF%" "%A_ScriptDir%\%TheTXT%.txt",,Hide
I will try the way Blackholyman has posted sometime though.
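Once the text file exists, finding which pages mention something could look roughly like the sketch below. It assumes the file came from Xpdf's pdftotext, which by default ends each page with a form feed character (ASCII 12). The file name and search text are placeholders.

```autohotkey
; Sketch only: list the page numbers whose text contains a search string.
txtPath := A_ScriptDir "\output.txt"   ; hypothetical name of the converted text file
needle  := "Back"                      ; hypothetical search text

FileRead, text, %txtPath%
if ErrorLevel
{
	MsgBox, Could not read %txtPath%
	ExitApp
}

hits := ""
; Splitting on the form feed gives one array element per page, numbered from 1.
for pageNum, pageText in StrSplit(text, Chr(12))
{
	if InStr(pageText, needle)
		hits .= (hits = "" ? "" : ", ") pageNum
}

MsgBox, % hits = "" ? "No pages matched." : "Matching pages: " hits
```

The resulting page numbers still have to be handed to some other tool to pull those pages out of the PDF itself.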
Jovannb
Posts: 244
Joined: 17 Jun 2014, 02:44
Location: Austria

Re: Using COM with PDF files

20 Jun 2015, 10:59

Hi,

As far as I have learned, everything you would like to do is possible with Ghostscript (GS) and its utility programs.
Once you understand how to work with GS, you will see that almost anything involving PostScript and PDF files is possible. You should give it a try.

regards

J.B.
AHK-Release 1.1.30.03 Ansi 32-bit, Win10 (1903 18362.295) 64-bit, german
Chunjee
Posts: 688
Joined: 18 Apr 2014, 19:05
GitHub: Chunjee

Re: Using COM with PDF files

20 Jun 2015, 16:07

Seems like overkill. What's the benefit?

Jovannb
Posts: 244
Joined: 17 Jun 2014, 02:44
Location: Austria

Re: Using COM with PDF files

21 Jun 2015, 02:27

The benefit is that, as soon as you know how to call GS, you can, for example:
# use tesseract (OCR) to extract text
# join files (PS or PDF) into one single PDF
# extract pages
# rotate pages
# modify PDF document properties (including initial/opening view, bookmarks...)
and so on, and it is free.

With the following code snippet you can join two PostScript files (ps1 and ps2) into a single PDF:

Code: Select all

FormatTime, timestamp,, yyyyMMdd_HHmmss
timestamp_output_file := "c:\temp\" timestamp ".pdf"
ps1 := "c:\temp\postscriptfile1.ps"
ps2 := "c:\temp\postscriptfile2.ps"
; The executable path contains a space, so it and the file paths are wrapped in quotes ("" is a literal quote).
ghostscriptcall := """c:\Program Files\gs\gs9.07\bin\gswin64c.exe"" -q -dNOSAFER -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.7 -dPDFSETTINGS=/prepress -dLockDistillerParams=false -dAutoRotatePages=/PageByPage -dEmbedAllFonts=true -dSubsetFonts=true -r600 -dDownsampleMonoImages=true -dMonoImageDownsampleThreshold=1.5 -dMonoImageDownsampleType=/Bicubic -dMonoImageResolution=600 -dDownsampleGrayImages=true -dGrayImageDownsampleThreshold=1.5 -dGrayImageDownsampleType=/Bicubic -dGrayImageResolution=300 -dDownsampleColorImages=true -dColorImageDownsampleThreshold=1.5 -dColorImageDownsampleType=/Bicubic -dColorImageResolution=150 -dConvertCMYKImagesToRGB=false -sOutputFile=""" timestamp_output_file """ -c .setpdfwrite -f """ ps1 """ """ ps2 """"

Run, %ghostscriptcall%,, Hide
To create those PostScript files you can use the Ghostscript printer driver (..\lib\.. in the GS installation directory) and connect that printer to a local port named, for example, "c:\temp\postcriptfile.ps". That's all. When you print to that driver and then use GS to create PDFs as shown above, you have a 100% free PDF-creation setup built on a single program, with an amazing number of ways to manipulate PDFs. It works for existing PDFs as well.
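The "extract pages" item from the list above can be sketched along the same lines, using Ghostscript's `-dFirstPage` and `-dLastPage` switches with the pdfwrite device. The input and output file names and the page range are placeholders; the executable path is the one from the snippet above and will differ per installation.

```autohotkey
; Sketch only: write pages 3 to 5 of an existing PDF into a new PDF.
gs        := "c:\Program Files\gs\gs9.07\bin\gswin64c.exe"  ; adjust to your install
inPdf     := "c:\temp\input.pdf"          ; hypothetical input file
outPdf    := "c:\temp\pages_3_to_5.pdf"   ; hypothetical output file
firstPage := 3
lastPage  := 5

; "" inside a quoted string is a literal quote, so every path ends up quoted.
cmd := """" gs """ -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite"
	. " -dFirstPage=" firstPage " -dLastPage=" lastPage
	. " -sOutputFile=""" outPdf """ """ inPdf """"

RunWait, %cmd%,, Hide
if ErrorLevel
	MsgBox, Ghostscript exited with code %ErrorLevel%
```

RunWait is used here instead of Run so the script can check the exit code before touching the output file.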

J.B.
becofuan
Posts: 1
Joined: 25 Aug 2015, 03:04

Re: Using COM with PDF files

26 Aug 2015, 22:05

Please refer to this: http://www.utteraccess.com/forum/Extrac ... 07085.html
Here's a similar topic about extracting PDF pages by searching for specific text.
Hope it helps.
