Using COM with PDF files
Is there any way to use COM on PDF files? I'd like to be able to extract pages from a PDF based on a search for specific text. I've used various programs to view PDFs, including Adobe Acrobat Pro, Adobe Reader, and Bluebeam Revu. Right now I can mostly automate the page extraction through each program's user interface, but this is somewhat error-prone and depends heavily on how each person has the specific program configured.
- Blackholyman
Re: Using COM with PDF files
The code below works ONLY with Adobe Acrobat Professional.
More on finding text, with VBA examples: http://www.myengineeringworld.net/2014/ ... h-vba.html
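Code:
#Persistent
; Requires Adobe Acrobat Professional to be installed
AVDoc := ComObjCreate("AcroExch.AVDoc")
FileSelectFile, path,,, Select a pdf, pdf (*.pdf)
if (ErrorLevel)  ; the dialog was cancelled
    ExitApp
AVDoc.Open(path, "")
AVDoc.BringToFront()
; FindText(text, caseSensitive, wholeWordsOnly, reset)
if AVDoc.FindText("Back", true, true, true)
    MsgBox, The word was found once. No more info to be had.
AVDoc.Close(1)  ; 1 = close without saving
AVDoc := ""     ; release the COM object
return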
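To get from "found" to an extracted page, one possible extension is to ask the Acrobat view which page it is showing after the search and then copy that page into a new document. The following is only a sketch under assumptions: it needs Adobe Acrobat (not Reader), the paths and the search text are hypothetical placeholders, and it relies on the view having moved to the page of the hit after FindText succeeds. It handles only the first hit.

```autohotkey
; Sketch only. srcPath, outPath and searchText are hypothetical placeholders.
srcPath := "C:\temp\input.pdf"
outPath := "C:\temp\first_hit.pdf"
searchText := "Back"

AVDoc := ComObjCreate("AcroExch.AVDoc")
if (!AVDoc.Open(srcPath, ""))
{
    MsgBox, Could not open %srcPath%
    ExitApp
}
if (AVDoc.FindText(searchText, true, true, true))
{
    ; Assumption: after a successful search the view shows the page of the hit.
    pageNum := AVDoc.GetAVPageView().GetPageNum()   ; zero-based page index
    srcPD := AVDoc.GetPDDoc()
    newPD := ComObjCreate("AcroExch.PDDoc")
    newPD.Create()
    ; InsertPages(insertAfter, sourceDoc, startPage, pageCount, copyBookmarks)
    ; -1 inserts before the first page of the (empty) new document
    newPD.InsertPages(-1, srcPD, pageNum, 1, false)
    newPD.Save(1, outPath)   ; 1 = full save
    newPD.Close()
    MsgBox, % "Found on page " (pageNum + 1) " and saved to " outPath
}
else
    MsgBox, Text not found.
AVDoc.Close(1)   ; close the source without saving
AVDoc := ""
ExitApp
```

Collecting every matching page would need repeated FindText calls with the last parameter set to false, plus a guard against the search wrapping around. That part is left out here because it was not tested.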
Re: Using COM with PDF files
Alternatively, the method I have used is to convert the entire PDF to text with Xpdf (http://www.foolabs.com/xpdf/download.html). Then you can read the file as you normally would. The drawback is that you need to include that exe with your script.
I will try the way Blackholyman has posted sometime though.
Might look like:
Code:
; Quote every path so that spaces in folder or file names do not break the command
RunWait, %comspec% /c ""%A_ScriptDir%\PDFtoTEXT.exe" "%TheSelectedPDF%" "%A_ScriptDir%\%TheTXT%.txt"",,Hide
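Once the text file exists, the pages that contain the search text can be worked out from it. This is a minimal sketch, assuming the converter writes a form feed character between pages (Xpdf's pdftotext does this by default) and reusing the %TheTXT% variable from the command above. FindPagesWithText is a hypothetical helper name, and "Back" is just an example search term.

```autohotkey
; Returns a comma-separated list of 1-based page numbers whose text contains needle.
; Assumes pages in the text are separated by form feeds.
FindPagesWithText(text, needle)
{
    pages := ""
    Loop, Parse, text, `f          ; `f = form feed, one field per page
    {
        if InStr(A_LoopField, needle)   ; case-insensitive by default
            pages .= (pages = "" ? "" : ",") . A_Index
    }
    return pages
}

FileRead, pdfText, %A_ScriptDir%\%TheTXT%.txt
MsgBox, % "Pages containing the text: " FindPagesWithText(pdfText, "Back")
```

The resulting page numbers only tell you where the text is. Pulling those pages out of the PDF still needs another tool, for example Acrobat or Ghostscript.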
Re: Using COM with PDF files
Hi,
as far as I have learned, everything you would like to do is possible with Ghostscript (GS) and its utility programs.
Once you have figured out how to work with GS, you will see that almost anything in manipulating PostScript and PDF files is possible. You should give it a try.
regards
J.B.
AHK: 1.1.37.01 Ansi, 32-Bit; Win10 22H2 64 bit, german
Re: Using COM with PDF files
Seems like overkill. What's the benefit?
Re: Using COM with PDF files
The benefit is that, as soon as you know how to call GS, you can for example:
# use tesseract (OCR) to extract text
# join files (PS or PDF) into one single PDF
# extract pages
# rotate pages
# modify PDF document properties (including initial/opening view, bookmarks...)
and so on, and it is free.
To create the PostScript files, you can use the Ghostscript printer driver (..\lib\.. in the GS installation directory) and connect this printer to a local port named e.g. "c:\temp\postscriptfile.ps"; that's all. When you print to that driver and then use GS to create PDFs as in the snippet below, you have a 100% free PDF creation setup: one single program with an amazing number of possibilities for manipulating PDFs. That works for existing PDFs as well.
With the following code snippet you join two PostScript files (ps1 and ps2) into a single PDF:
Code:
; Build a timestamped output file name in c:\temp
formattime, timestamp,, yyyyMMdd_HHmmss
timestamp_output_file:="c:\temp\" timestamp ".pdf"
ps1:="c:\temp\postscriptfile1.ps"
ps2:="c:\temp\postscriptfile2.ps"
; All paths are quoted so that spaces (as in "Program Files") do not split the command line
ghostscriptcall:="""c:\Program Files\gs\gs9.07\bin\gswin64c.exe"" -q -dNOSAFER -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.7 -dPDFSETTINGS=/prepress -dLockDistillerParams=false -dAutoRotatePages=/PageByPage -dEmbedAllFonts=true -dSubsetFonts=true -r600 -dDownsampleMonoImages=true -dMonoImageDownsampleThreshold=1.5 -dMonoImageDownsampleType=/Bicubic -dMonoImageResolution=600 -dDownsampleGrayImages=true -dGrayImageDownsampleThreshold=1.5 -dGrayImageDownsampleType=/Bicubic -dGrayImageResolution=300 -dDownsampleColorImages=true -dColorImageDownsampleThreshold=1.5 -dColorImageDownsampleType=/Bicubic -dColorImageResolution=150 -dConvertCMYKImagesToRGB=false -sOutputFile=""" timestamp_output_file """ -c .setpdfwrite -f """ ps1 """ """ ps2 """"
run, %ghostscriptcall%,,Hide
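Extracting pages, which is what the original question asks for, can be done in much the same way. The sketch below assumes the same Ghostscript install path as the snippet above. The input and output paths and the page number are hypothetical. It uses Ghostscript's -dFirstPage and -dLastPage switches to write a single page to a new PDF.

```autohotkey
; Sketch only. inPdf, outPdf and pageNum are hypothetical placeholders.
gs := "c:\Program Files\gs\gs9.07\bin\gswin64c.exe"
inPdf := "c:\temp\input.pdf"
outPdf := "c:\temp\page3.pdf"
pageNum := 3   ; 1-based page number to extract

; Quote all paths; first and last page are the same to get exactly one page
cmd := """" gs """ -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite"
    . " -dFirstPage=" pageNum " -dLastPage=" pageNum
    . " -sOutputFile=""" outPdf """ """ inPdf """"
RunWait, %cmd%,, Hide
if (ErrorLevel)   ; RunWait stores the program's exit code in ErrorLevel
    MsgBox, Ghostscript returned exit code %ErrorLevel%.
```

To extract a range instead of one page, set -dLastPage to a higher number than -dFirstPage.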
J.B.
Re: Using COM with PDF files
Please refer to this: http://www.utteraccess.com/forum/Extrac ... 07085.html
Here's a similar topic about extracting PDF pages by searching for specific text.
Hope it can help you.