AutoHotkey Community

All times are UTC [ DST ]




Posted: August 13th, 2007, 10:07 pm
Hi new,
Have a look at Loop, FilePattern in the AutoHotkey Help Documentation. You could use the Loop command to find all *.pdf files and use the Run or RunWait command to automate processing the files one by one.
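
For anyone who wants to see the loop-and-run idea spelled out, here is a minimal sketch in Python (not AutoHotkey). The function names and the example tool arguments are assumptions for illustration only; swap in whatever converter you actually use.

```python
import subprocess
from pathlib import Path


def pdf_commands(folder, tool_args):
    """Build one command per *.pdf file in folder (the 'Loop, FilePattern' part)."""
    commands = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        # One command per file: the tool's arguments followed by the file path.
        commands.append(list(tool_args) + [str(pdf)])
    return commands


def process_all(folder, tool_args):
    """Run the commands one by one, waiting for each to finish (the 'RunWait' part)."""
    for cmd in pdf_commands(folder, tool_args):
        subprocess.run(cmd, check=True)
```

Calling `pdf_commands` on its own is a cheap way to check which files would be processed before anything is actually run.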


Posted: May 20th, 2008, 2:32 pm
Hi there,

Further to new's problem and corrupt's answer, I was just wondering whether the answer worked for new?

I have the same problem and was wondering whether it is worth trying out or not.

_________________
| - PinkBears


Posted: June 13th, 2008, 3:28 pm
Why don't you try it and see?


Posted: July 9th, 2008, 1:11 pm
I was looking for a tool to split bigger PDF documents into smaller chunks (e.g. 50 pages each), and this might help. Great, thanks for the link.

(For the interested: some applications don't accept bigger .pdf documents as input, so the solution is to make smaller chunks and convert each one separately, which gets around the tool's size limitations.)
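
A rough sketch of the chunking arithmetic in Python, assuming the tool is pdftk and that the total page count is already known. The function names and the output file naming are made up for illustration, and the `cat` ... `output` form should be checked against your pdftk version's documentation.

```python
def chunk_ranges(total_pages, chunk_size=50):
    """Return 1-based inclusive (first, last) page ranges covering the document."""
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size >= 1")
    ranges = []
    first = 1
    while first <= total_pages:
        # The last chunk may be shorter than chunk_size.
        last = min(first + chunk_size - 1, total_pages)
        ranges.append((first, last))
        first = last + 1
    return ranges


def pdftk_split_commands(pdf_name, total_pages, chunk_size=50):
    """Build one 'pdftk <in> cat <first>-<last> output <out>' command per chunk."""
    stem = pdf_name[:-4] if pdf_name.lower().endswith(".pdf") else pdf_name
    return [
        ["pdftk", pdf_name, "cat", f"{first}-{last}",
         "output", f"{stem}_{first}-{last}.pdf"]
        for first, last in chunk_ranges(total_pages, chunk_size)
    ]
```

Each resulting command can then be run separately, and each small output file fed to the application that rejects the big one.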


Post subject: pdf to html
Posted: April 13th, 2009, 9:11 am
pdf to html

In pdftk, is there an option to convert PDF to HTML?

I found PDF-to-text conversion in the xpdf lib, and it works OK.

If not, is there another command-line tool to do that?


Posted: April 13th, 2009, 10:00 am
Based on XPDF,
http://pdftohtml.sourceforge.net/

_________________
AHK FAQ
TF : Text files & strings lib, TF Forum


Posted: April 13th, 2009, 5:04 pm
Hi HugoV

I tried to run the pdftohtml but got an error:
Page-1
'gswin32c' is not recognized as an internal or external command,
operable program or batch file.
Error: Failed to launch Ghostscript!

It seems pdftohtml.exe alone is not enough; some other software is missing.

Do you know of some other pdftohtml command-line tool?


Posted: April 13th, 2009, 8:48 pm
Ghostscript
http://pages.cs.wisc.edu/~ghost/
(I've used it and it works, but you may have to work at it)
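
The "'gswin32c' is not recognized" message means the Ghostscript console executable could not be found on the PATH. A small Python sketch for checking this before running a conversion (the `missing_tools` helper is a made-up name, not part of any of these packages):

```python
import shutil


def missing_tools(names, which=shutil.which):
    """Return the tool names that cannot be found on the PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    # Tool names taken from the error message and the converter being run.
    for name in missing_tools(["pdftohtml", "gswin32c"]):
        print(f"{name} was not found on the PATH")
```

If `gswin32c` shows up as missing after installing Ghostscript, adding its install folder to the PATH is the usual next step.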

If I recall correctly, a "pdftohtml" is also included in the Google Desktop Search application (at least it was at some point; I don't know if this is still the case as I don't use it). If you have it, look for pdf*.exe in the Google Desktop dirs; it should be there somewhere.

Note: if you want to work with PDFs:
- get pdftk
- get xpdf
- get pdftohtml
- get ghostscript
- get PDFCreator

Posted: April 17th, 2009, 5:04 am
PDF creator

Does PDFCreator convert HTML to PDF as a tagged or an untagged PDF?
Or does it convert to PDF as an image?

Meaning:
Can I extract text from the created PDF?


Posted: April 17th, 2009, 7:31 am
PDFCreator converts anything you print to PDF, and yes, you can extract text later IF the source wasn't an image to begin with. Not sure what you mean
by tagged, but it won't make URLs in Word documents clickable in the PDF,
nor does it create PDF bookmarks or anything like that.

Posted: April 17th, 2009, 1:46 pm
Sorry for asking again,

I still haven't found the pdf2html converter.

From Google, is there some Google Docs API
that I can download to convert my own PDFs to HTML?


Posted: April 17th, 2009, 2:02 pm
If you have Google Desktop installed:

c:\Program Files\Google\Google Desktop Search\pdftotext.exe
(or wherever you have installed GDS)

usage:

pdftotext -htmlmeta sample.pdf
--> will generate sample.html

Following the link at the URL I gave you before leads to:
http://sourceforge.net/projects/pdftohtml/
Download the Windows binary and unpack the tar.gz file.

usage:

pdftohtml.exe sample.pdf
--> will generate 3 html files (frameset, TOC and content)

read the doc for more options

Posted: April 17th, 2009, 2:45 pm
I tried (from the xpdf lib):
pdftotext -htmlmeta sample.pdf -> sample.html

and got the same result as:
pdftotext -layout sample.pdf -> sample.txt

Meaning:
the HTML does not have the same look (more or less) as the original sample.pdf;
there are no frames or colors.


Posted: April 17th, 2009, 2:53 pm
I also tried pdftohtml.exe

and got worse results:
the extracted text is shown one line under another, without keeping any of the original layout of the .pdf.


Posted: April 17th, 2009, 4:03 pm
What do you want, the HTML to look the same as the PDF?

Again read the documentation, see the options, try them and
SEE that you can have the HTML look like the PDF, or not
if you wish. Use the -c option

pdftohtml -c sample.pdf
-> sample.html will look like sample.pdf (not 100%, but pretty close)
unless you have a very complicated PDF. Again, READ the documentation.
As you can see, even Google uses it, so why isn't it good enough for you? ;)

IF you need even better or more options you will have to buy something

Sourceforge version:

pdftohtml version 0.39 http://pdftohtml.sourceforge.net/,
based on Xpdf version 3.00
Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
Copyright 1996-2004 Glyph & Cog, LLC

Usage: pdftohtml [options] <PDF-file> [<html-file> <xml-file>]
-f <int> : first page to convert
-l <int> : last page to convert
-q : don't print any messages or errors
-h : print usage information
-help : print usage information
-p : exchange .pdf links by .html
-c : generate complex document
-i : ignore images
-noframes : generate no frames
-stdout : use standard output
-zoom <fp> : zoom the pdf document (default 1.5)
-xml : output for XML post-processing
-hidden : output hidden text
-nomerge : do not merge paragraphs
-enc <string> : output text encoding name
-dev <string> : output device name for Ghostscript (png16m, jpeg etc)
-v : print copyright and version info
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)


GOOGLE version:

pdftohtml version 0.39 http://pdftohtml.sourceforge.net/,
based on Xpdf version 3.00
Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
Copyright 1996-2004 Glyph & Cog, LLC

Usage: pdftohtml [options] <PDF-file> [<html-file> <xml-file>]
-f <int> : first page to convert
-l <int> : last page to convert
-q : don't print any messages or errors
-h : print usage information
-help : print usage information
-p : exchange .pdf links by .html
-c : generate complex document
-i : ignore images
-noframes : generate no frames
-stdout : use standard output
-zoom <fp> : zoom the pdf document (default 1.5)
-xml : output for XML post-processing
-hidden : output hidden text
-nomerge : do not merge paragraphs
-enc <string> : output text encoding name
-dev <string> : output device name for Ghostscript (png16m, jpeg etc)
-v : print copyright and version info
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
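
The options in the usage listings above can be combined. A minimal Python sketch that assembles a pdftohtml command line from a few of them (-c, -noframes, -f, -l); the `pdftohtml_command` helper and its parameter names are made up for illustration, and it only builds the argument list, it does not run anything.

```python
def pdftohtml_command(pdf_file, complex_layout=True, no_frames=False,
                      first_page=None, last_page=None):
    """Build a pdftohtml argument list using flags from the usage listing."""
    cmd = ["pdftohtml"]
    if complex_layout:
        cmd.append("-c")          # generate complex document (closer to the PDF's look)
    if no_frames:
        cmd.append("-noframes")   # generate no frames
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]   # last page to convert
    cmd.append(pdf_file)
    return cmd
```

For example, `pdftohtml_command("sample.pdf")` gives the same `pdftohtml -c sample.pdf` call shown earlier in this post.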
