Removing watermarks from pdf. Topic is solved

colt
Posts: 291
Joined: 04 Aug 2014, 23:12
Location: Portland Oregon

Removing watermarks from pdf.

Post by colt » 31 May 2023, 13:29

I am trying to remove the watermarks I placed on several PDFs.
I used pdftk to get the uncompressed version of the PDF.

Code:

cmd := """C:\Program Files (x86)\PDFtk Server\bin\pdftk.exe"" """ . src . """ output """ . compressOut . """ uncompress"

Next I use this RegExReplace to remove the watermarks from the PDF text:

Code:

; *P866 reads the file as code page 866, the same encoding Notepad++ autodetects
FileRead, pdfRaw, *P866 %compressOut%
fixedText := RegExReplace(pdfRaw, "BT(?:(?!ET).)*?DRAWING IS NOT(?:(?!ET).)*?ET", "")
FileAppend, % fixedText, % fixedRawOut

I then save the raw PDF text and recompress the PDF using pdftk:

Code:

cmd := """C:\Program Files (x86)\PDFtk Server\bin\pdftk.exe"" """ . fixedRawOut . """ output """ . fixedCompressOut . """ compress"

The problem is that this replacement technique only works in Notepad++. It fails in AutoHotkey because the contents of pdfRaw stop at the first null character of the uncompressed PDF text.
If I do it with FileOpen I get the same result:

Code:

file := fileOpen(compressOut ,"r","cp866")
pdfRaw := file.read(file.length)

So how can I run RegExReplace on the full contents if the file contains null characters?

Re: Removing watermarks from pdf.  Topic is solved

Post by colt » 31 May 2023, 16:31

I needed to do it through PowerShell:

Code:

; The regex needs (?s) at the beginning so that . also matches across line feeds
cmd := "$data = get-content -raw '" . compressOut . "'`n$replace = $data -replace '(?s)BT(?:(?!ET).)*?DRAWING IS NOT(?:(?!ET).)*?ET',''`n$replace | add-content -path '" . fixedRawOut . "'"
run powershell.exe -command &{%cmd%},,hide
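
For anyone chaining the three steps in one script, here is a minimal sketch of how it might be wired together. It assumes RunWait is used in place of Run so that each step finishes before the next one starts, and it deletes any old output first because Add-Content appends to an existing file. The file paths are hypothetical placeholders, not the ones from my setup.

Code:

```autohotkey
; Sketch only: every path below is a hypothetical placeholder.
pdftk := """C:\Program Files (x86)\PDFtk Server\bin\pdftk.exe"""
src := "C:\pdfs\watermarked.pdf"
compressOut := "C:\pdfs\uncompressed.pdf"
fixedRawOut := "C:\pdfs\fixed_raw.pdf"
fixedCompressOut := "C:\pdfs\fixed.pdf"

; Step 1: uncompress the PDF so the BT ... ET text blocks become searchable.
RunWait, % pdftk . " """ . src . """ output """ . compressOut . """ uncompress",, Hide

; Step 2: remove the watermark blocks in PowerShell, which is not stopped by null characters.
; Add-Content appends, so clear any output left over from a previous run.
FileDelete, % fixedRawOut
ps := "$data = get-content -raw '" . compressOut . "'`n$replace = $data -replace '(?s)BT(?:(?!ET).)*?DRAWING IS NOT(?:(?!ET).)*?ET',''`n$replace | add-content -path '" . fixedRawOut . "'"
RunWait, powershell.exe -command &{%ps%},, Hide

; Step 3: recompress the cleaned file.
RunWait, % pdftk . " """ . fixedRawOut . """ output """ . fixedCompressOut . """ compress",, Hide
```

With plain Run the script would continue immediately, so pdftk could try to compress a file that PowerShell has not finished writing yet.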
