searching for a string in pdf file

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
Ernestas
Posts: 33
Joined: 16 Apr 2018, 02:07

searching for a string in pdf file

03 Sep 2018, 00:07

hello guys!

I have run into a problem searching for a string in a PDF file. Does anyone know how to do that? I would really appreciate it, thank you guys! :morebeard:
garry
Posts: 3795
Joined: 22 Dec 2013, 12:50

Re: searching for a string in pdf file

03 Sep 2018, 01:13

Usually you can find text in a PDF viewer with Ctrl+F. Otherwise, maybe a freeware pdf2txt tool would help.
Ernestas
Posts: 33
Joined: 16 Apr 2018, 02:07

Re: searching for a string in pdf file

03 Sep 2018, 01:49

I can find the string with Ctrl+F, but I need the digits next to the string. For example, I want to find the date of birth, and the date sits next to the string "date of birth", so Ctrl+F alone does not let me grab the actual date, if you know what I mean. As for pdf2txt, I am not able to download other software to my work computer, so I am kind of stuck here. Any other suggestions would really help, thank you guys!
garry
Posts: 3795
Joined: 22 Dec 2013, 12:50

Re: searching for a string in pdf file

03 Sep 2018, 02:08

Here is a Chinese script, pdf2txt.ahk,
but I realized it also needs a separate conversion program. Without that exe it reports:
" 转换程序不存在 " ("The conversion program does not exist")

Code: Select all

g_TxtFileList =
g_pdfFileList =
g_exe = 

Gui, Add, StatusBar,, Status bar
Gui, Add, Button, x16 y20 w100 h30 g【指定工作目录】, Working folder
Gui, Add, Edit, x126 y20 w700 h30 v_edtDir, %a_workingdir%

gui, Add, Button, x836 y20 w100 h30 g【退出】, Exit

Gui, Add, Button, x16 y60 w100 h30 g【选择转换程序】, Select converter
Gui, Add, Edit, x126 y60 w700 h30 v_edtExe, Converter program

gui, Add, Button, x836 y60 w100 h30 g【开始转换】, Start converting
gui, Add, Checkbox, x836 y110 w100 h30 v_Checked g【暂停复选框】, Pause

Gui, Add, Text, x526 y110 w220 h20 , Converted TXT files
Gui, Add, ListBox, x526 y140 w420 h484 v_TxtListBox, TxtListBox

Gui, Add, Text, x16 y110 w470 h20 , PDF files not yet processed
Gui, Add, ListBox, x16 y140 w480 h494 v_PdfListBox , PDFListBox

; Generated using SmartGUI Creator 4.0
Gui, Show, x262 y195 h643 w958,  PDF converter test
Return
Return

【退出】:
GuiClose:
ExitApp

【指定工作目录】:
	FileSelectFolder, OutputVar
	if OutputVar <>
	{
		GuiControl, Text, _edtDir, %OutputVar%
	}
	return

【选择转换程序】:
	FileSelectFile, var_SelectedFile, 3, %A_WorkingDir%, Select the PDF converter, Converter (*.exe; *.ahk)

	if var_SelectedFile <>
	{
		GuiControl, Text, _edtExe, %var_SelectedFile%
	}

	return


【开始转换】:
	gui submit, nohide
	if _edtDir =
	{
		msgbox No working folder selected
		return
	}
	ifnotexist %_edtDir%
	{
		msgbox The working folder does not exist!
		return
	}
	if _edtExe =
	{
		msgbox No converter program specified
		return
	}
	ifnotexist %_edtExe%
	{
		msgbox The PDF converter program does not exist!
		return
	}
	g_exe := _edtExe

	SB_SetText("Searching for files")
	;; Clear the list boxes and the file lists left over from the previous pass
	guicontrol , , _TxtListBox, |
	guiControl , , _PdfListBox, |
	g_txtCount = 0
	g_pdfCount = 0
	g_TxtFileList =
	g_pdfFileList =

	;; Find the TXT and PDF files
	Loop, %_edtDir%\*.txt
		g_TxtFileList = %g_TxtFileList%%A_LoopFileName%`n

	Loop, %_edtDir%\*.pdf
		g_pdfFileList = %g_pdfFileList%%A_LoopFileName%`n

	SB_SetText("Analyzing TXT files")
	Loop, parse, g_TxtFileList, `n
	{
		if A_LoopField =  ; Ignore the blank item at the end of the list.
			continue

		var_temp := a_loopfield
		var_temp := strLeft2Sub( var_temp, ".txt" )
		/*
		ifinstring var_temp, $
		{
			var_temp := strRight2sub( var_temp, "$" )
			if var_temp =
				continue
		}
		*/
		guicontrol , , _TxtListBox, %a_loopfield%		
		g_txtCount++
		arr_txt%g_txtCount% := var_temp
	}
	SB_SetText("Listing PDF files that are not yet converted!")
	Loop, parse, g_PdfFileList, `n
	{
		if A_LoopField =  ; Ignore the blank item at the end of the list.
			continue

		var_temp := a_loopfield
		var_temp := strLeft2Sub( var_temp, ".pdf" )
		
		;; Check whether the PDF has already been converted to a TXT file
		/*
		bDeal := false
		loop %g_txtCount%
		{
			txtname := arr_txt%a_index%
			if ( var_temp == txtname )
			{
				bDeal := true
				break
			}
		}
		*/

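		;; If this PDF has already been converted, skip it
		;; (bDeal is only set by the commented-out check above, so nothing is skipped right now)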
		if bDeal
			continue

		guicontrol , , _PdfListBox, %var_temp%		
		g_pdfCount++
		arr_pdf%g_pdfCount% := var_temp
	}	
	gosub 【正在转换文件】
	sleep 60000
	goto 【开始转换】
	return

【正在转换文件】:
	ifnotexist %g_exe%
	{
		msgbox The converter program does not exist!
		return
	}
	loop
	{
		if _Checked
		{
			sleep 3000                              ;; paused by the user
			continue
		}


		if g_pdfCount <= 0
			break
		var_PdfFileName := arr_pdf%g_pdfCount%
		var_PdfFile = %_edtDir%\%var_PdfFileName%.PDF
		var_txtFile = %_edtDir%\%var_PdfFileName%.txt
		var_tip = Converting %var_PdfFile% ......
		SB_SetText( var_tip )
		;; Quote every path so that folders or file names containing spaces still work
		Run, %comspec% /c ""%g_exe%" -layout -enc GBK "%var_PdfFile%" "%var_txtFile%""
		;; Wait for the conversion to finish
		loop
		{
			sleep 1000
			ifexist, %var_txtFile%
			{
				;; A read-only text file may mean the conversion is still running; wait another second
				FileGetAttrib, Attributes, %var_txtFile%
				IfInString, Attributes, R
					continue

				var_datetime = %a_yyyy%%a_mm%%a_dd% %a_hour%_%a_min%_%a_sec%
				var_newfile = %_edtDir%\%var_PdfFileName% %var_datetime%.txt
				FileMove, %var_txtFile%, %var_newfile%, 1
				if ErrorLevel
				{
					var_tip = : ( Rename failed! %var_PdfFileName%
					guicontrol , , _TxtListBox, %var_PdfFileName%.txt
				}
				else
				{
					var_tip = : ) Conversion finished %var_PdfFileName%
					guicontrol , , _TxtListBox, %var_PdfFileName% %var_datetime%.txt
				}
				break
			}
		}
		SB_SetText( var_tip )

		arr_pdf%g_pdfCount% =
		g_pdfCount--
		gosub 【刷新PDF列表】
	}
	return

【刷新PDF列表】:
	guiControl , , _PdfListBox, |
	loop %g_pdfCount%
	{
		var_temp := arr_pdf%a_index%
		guicontrol , , _PdfListBox, %var_temp%	
	}
	return

【暂停复选框】:
	gui submit, nohide
	if _Checked
	{
		GuiControl, Text, _Checked, Paused
	}
	else
	{
		GuiControl, Text, _Checked, Pause
	}
	return


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
StrLeft2Sub(varString,subString)
{
	StringGetPos, varPos, varString, %subString%
	if  errorlevel
	{ 
		return ""
	}
	stringleft varReturn,varString,%varPos%
	return %varReturn%
;Example ..............................
;MyStringVar = squeezer::imageset
;Var := StrLeft2Sub(MyStringVar,"::")
;Var is now "squeezer"
}

StrMid2Sub(varString,subString1,subString2)
{
	varTemp = 
	StringGetPos, varPos, varString, %subString1%
	if ErrorLevel 
	{
		return varTemp
	}
	varLen := strlen(subString1)
	varTemp := substr(varString,varPos+varLen+1)

	ifinstring varTemp,%subString2%
	{
		varTemp := StrLeft2Sub(varTemp,subString2)
	}
	else
	{
		varTemp =
		return varTemp
	}
;	msgbox varTemp=%varTemp%
	return varTemp
;Example ............................
;str = [sec]
;var := StrMid2Sub(str, "[", "]")
;var is now "sec"
}

;;-- Search varString for subString in the direction given by LR and return the part of the string to the right of subString --
StrRight2Sub(varString,subString, LR="R1")
{
	StringGetPos varPos, varString, %subString%, %LR%
	stringleft varTemp,varString,%varPos%
	varLen := strlen(varTemp)
	varLen := strlen(varString) - varLen - strlen(subString)
	stringright varReturn,varString,%varLen%
	return %varReturn%
;Example ..............................
;MyStringVar = squeezer::imageset
;fileExtVar := StrRight2Sub(MyStringVar,"::")
;fileExtVar is now "imageset"
}
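
The script above starts the converter with Run and then polls once a second for the text file. As a rough sketch of an alternative, RunWait can block until the converter exits, so no polling loop is needed. The function name ConvertPdf is made up for this example, and it assumes the converter accepts the same -layout -enc GBK switches as above and returns exit code 0 on success; neither is confirmed for any particular tool.

```autohotkey
; Hypothetical helper (not part of the script above): run a command-line
; PDF-to-text converter, wait for it to exit, and report success.
ConvertPdf(exePath, pdfFile, txtFile)
{
	; Quote every path so that folders and file names with spaces work.
	cmd := """" . exePath . """ -layout -enc GBK """ . pdfFile . """ """ . txtFile . """"
	; UseErrorLevel: ErrorLevel becomes "ERROR" if the program cannot be
	; started, otherwise RunWait stores the program's exit code in it.
	RunWait, %cmd%,, Hide UseErrorLevel
	if (ErrorLevel != 0)   ; assumption: exit code 0 means success
		return false
	return FileExist(txtFile) != ""
}

; Usage (all three paths are placeholders):
; if ConvertPdf("C:\tools\converter.exe", "C:\docs\a.pdf", "C:\docs\a.txt")
;     MsgBox, Conversion finished
```

Because RunWait only returns once the converter has exited, the read-only check and the one-second sleeps should not be needed in this variant.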

Ernestas
Posts: 33
Joined: 16 Apr 2018, 02:07

Re: searching for a string in pdf file

03 Sep 2018, 02:34

I think I have figured it out:

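Code: Select all

!+w::
	; pwb := WBGet()   ; not used below, and WBGet() is not defined in this snippet
	; Copy the whole text of the PDF shown in the active window.
	Clipboard =
	Send, ^a
	Sleep, 150
	Send, ^c
	ClipWait, 2
	if ErrorLevel
	{
		MsgBox, Could not copy any text from the PDF.
		return
	}
	var_pdf := Clipboard
	; Split on spaces, tabs and line breaks so that every word is one array item.
	text := []
	Loop, Parse, var_pdf, %A_Space%%A_Tab%`n, `r
		if (A_LoopField != "")
			text.Push(A_LoopField)
	Loop, % text.Length()
	{
		if InStr(text[A_Index], "Fødselsnummer")
		{
			; Strip letters from the word that follows the label.
			var_fodelse_nr := RegExReplace(Trim(text[A_Index+1]), "[a-zA-Z]")
			MsgBox, % var_fodelse_nr
			break
		}
	}
	return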
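
A more compact variant of the same idea, as a sketch only: instead of splitting the text into words, a single RegExMatch call can capture the first run of digits after the label. The function name GetDigitsAfter and the sample text are made up for this example, and it assumes the number follows the label with only non-digit characters (colon, spaces, line break) in between.

```autohotkey
; Hypothetical helper: return the first run of digits that follows
; "label" in "haystack", or an empty string if nothing is found.
GetDigitsAfter(haystack, label)
{
	; \Q...\E treats the label as literal text; \D* skips the colon,
	; spaces or line breaks between the label and the number.
	if RegExMatch(haystack, "\Q" . label . "\E\D*(\d+)", m)
		return m1
	return ""
}

; Made-up sample text standing in for the clipboard contents.
sample := "Navn: Test Person`r`nFødselsnummer: 12345678901`r`nAdresse: Testveien 1"
MsgBox, % GetDigitsAfter(sample, "Fødselsnummer")
```

With the real document, the sample string would be replaced by the clipboard text copied with Ctrl+A and Ctrl+C. For a value that contains separators, such as a date, the capture group would need a different pattern.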
garry
Posts: 3795
Joined: 22 Dec 2013, 12:50

Re: searching for a string in pdf file

03 Sep 2018, 04:07

@Ernestas, thank you, good idea: no external program needed.
There is an example in Scripts & Functions:
https://autohotkey.com/boards/viewtopic.php?f=6&t=55089
;- Drag&Drop a pdf file / PDFtoTEXT PDF>TXT PDF2TXT / Search for text ---------

Code: Select all

Note: if you search for a phrase, watch out for the end-of-line problem:
-------------------------------------------
følgerne af    ( ASCII 13,10: a line break splits the phrase )
fejl

;- 
at infor-     ( ASCII 45,13,10: the word is hyphenated at the line end )
mationen
---------------------------------------
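
One possible way around this, as a sketch only: normalize the copied text before searching, joining words that were hyphenated at a line end and turning the remaining line breaks into single spaces. The function name JoinWrappedLines is made up for illustration. Note that it also removes a hyphen that really belongs to a word whenever that hyphen happens to fall at a line end.

```autohotkey
; Hypothetical helper: undo line wrapping so that a phrase split across
; two lines can be found with a plain InStr search.
JoinWrappedLines(text)
{
	; "infor-" + line break + "mationen" becomes "informationen"
	text := RegExReplace(text, "-\r?\n")
	; every remaining line break (with surrounding whitespace) becomes one space
	text := RegExReplace(text, "\s*\r?\n\s*", " ")
	return text
}

sample := "følgerne af`r`nfejl`r`nat infor-`r`nmationen"
flat := JoinWrappedLines(sample)
if InStr(flat, "følgerne af fejl")
	MsgBox, % "Found in: " . flat
```

The line numbers of the original text are lost after this step, so it fits a search for "is the phrase there at all" better than a line-by-line report.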

Code: Select all

#warn
#singleinstance,force
e:=""
search1:="seis oere thús"
search2:="seis oere thus"

;search:="følgerne af"
return

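!w::
clipboard =
send, ^a
sleep, 450
send, ^c
clipwait,8
if ErrorLevel
{
  msgbox,Nothing was copied to the clipboard.
  return
}
cl:=clipboard

;msgbox,%cl%
;if cl<>
;  fileappend,%cl%,test55.txt
;return

e:=""   ; reset the result so repeated runs do not accumulate
loop, parse,cl,`n,`r
{
i:=Format("{:04}", a_index)   ; zero-padded line number
alf:= a_loopfield
if alf contains %search1%,%search2%
  e .= "Line-" . i . "= " . alf . "`r`n"
}
msgbox,%e%
return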
esc::exitapp
