[SOLVED] 15,000 Word Docs Searched in 6 mins with OutPut :D

Posted: 28 May 2015, 13:37
by ahklearner
This was my question:

Hi Community,

I have hundreds of Word documents (.doc, Word 2003 format). I need to loop through them, search each one for a particular word, and collect every line containing that word in a Word document.

Thanks in advance for your help and support. :clap:

This is my answer:

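#SingleInstance Force
SetBatchLines, -1                        ; run at full speed

FolderToSearchFor = c:\abc               ; folder with the .doc files (subfolders included)
OutPutFolder = c:\def                    ; folder that receives the result file
needle = YOURSTRING                      ; text to find; a comma here would mean "any of these"
FileRecycle, %OutPutFolder%\FileName.txt ; discard the output of a previous run
Loop, %FolderToSearchFor%\*.doc, , 1     ; 1 = recurse into subfolders
{
    ; Read the binary .doc as if it were plain text. This only finds words
    ; that the file stores as 8-bit characters, which is often the case for .doc files.
    Loop, read, %A_LoopFileFullPath%
    {
        If A_LoopReadLine contains ><     ; skip lines that look like markup/binary noise
            continue
        If A_LoopReadLine contains %needle%
            FileAppend, %A_LoopReadLine%`n, %OutPutFolder%\FileName.txt
    }
}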

Re: Search 100s of Word(.doc) for particular word and FileAppend

Posted: 28 May 2015, 14:45
by kon
Try this:

wdApp := ComObjCreate("Word.Application")
wdApp.Visible := True

; Search options and constants
vbTrue := -1
wdDoNotSaveChanges := 0
wdLine := 5
wdMove := 0
wdReplaceNone := 0

FindText := "Abc123"	; The word to search for
MatchCase := false
MatchWholeWord := vbTrue
Wrap := false
Replace := wdReplaceNone

Result := ""
Loop, %A_DeskTop%\*.docx	; The directory to look in for .docx files
{
	wdDoc := wdApp.Documents.Open(A_LoopFileLongPath)
	wdFind := wdApp.Selection.Find
	wdFind.ClearFormatting
	wdFind.Replacement.ClearFormatting
	
	; Execute the search - Find.Execute Method (Word)
	;	https://msdn.microsoft.com/en-us/library/office/ff193977.aspx
	while wdFind.Execute(FindText, MatchCase, MatchWholeWord,,,,, Wrap,,, Replace) {
		wdApp.Selection.Expand(wdLine)				; Extend the selection to the whole line
		Result .= wdApp.Selection.Text "`r`n"		; Store the line text
		
		; Move the selection to the end of the line so the same line is not found by the next search.
		wdApp.Selection.EndKey(wdLine, wdMove)		
	}
	wdDoc.Close(wdDoNotSaveChanges)
}
wdApp.Quit(wdDoNotSaveChanges)
MsgBox, %Result%
return
Hope it helps :)

Re: Search 100s of Word(.doc) for particular word and FileAppend

Posted: 29 May 2015, 01:30
by ahklearner
Thanks a bunch for your time. :salute:
I was wondering whether this could be done in the background, without opening the Word documents visibly.

Re: Search 100s of Word(.doc) for particular word and FileAppend

Posted: 29 May 2015, 09:50
by gilliduck
wdApp.Visible := False ??

Re: Search 100s of Word(.doc) for particular word and FileAppend

Posted: 30 May 2015, 01:10
by ahklearner
Thanks a lot. Some of my docs are password-protected and some are corrupt, and an error is shown for every one of them. I want to ignore these errors and just get the final output.

What should I write?

Thanks!

Re: Search 100s of Word(.doc) for particular word and FileAppend

Posted: 30 May 2015, 01:16
by ahklearner
[Attached screenshot: error.jpg]
One more thing: if a file is open on another system (the docs are on a shared drive), this error appears.

I need to bypass all the errors and get the final output.

Thanks a lot in advance!

Re: 15,000 Word Docs Searched in 6 mins with OutPut :D

Posted: 30 May 2015, 15:56
by gilliduck
Maybe something like this:

IfWinExist, File in Use
	ControlClick, Ok, File in Use
You'll need Window Spy to get the proper name of the OK button, but something like this should check for the window and, if it's there, click OK automatically. Keep the title spelled the same in both lines, since window titles are case-sensitive. You may need to play with the timing; possibly use WinWait, File in Use, , [timeout: however many seconds it typically takes to pop up, if it's going to pop up].

Just my unskilled thoughts.
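A minimal sketch of that idea, assuming the dialog's title contains "File in Use" and its button text is "OK" (both unverified; check them with Window Spy). Because the searching script is presumably stuck inside the Documents.Open call while the dialog is showing, the watcher is written here as a separate script that polls on a timer:

```autohotkey
; Separate watcher script: dismisses a "File in Use" dialog whenever one appears.
#Persistent
#SingleInstance Force
SetTitleMatchMode, 2              ; match the title anywhere, not only at the start
SetTimer, DismissFileInUse, 500   ; check twice per second
return

DismissFileInUse:
IfWinExist, File in Use
    ControlClick, OK, File in Use ; "OK" is an assumed button text; confirm with Window Spy
return
```

Exit the watcher from its tray icon once the search is finished. Opening the files read-only, as suggested in the next post, should avoid this dialog altogether.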

Re: 15,000 Word Docs Searched in 6 mins with OutPut :D

Posted: 30 May 2015, 17:04
by jethrow
gilliduck wrote: wdApp.Visible := False ??
Note that the Application object is not visible by default, so just remove the wdApp.Visible := True line.
ahklearner wrote: one more thing if the file is open in some other system ...

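; ...
Loop, %A_DeskTop%\*.docx	; The directory to look in for .docx files
{
	; Open read-only (third parameter) so a file locked by another user
	; should not trigger a prompt; if opening still fails, skip the file.
	try wdDoc := wdApp.Documents.Open(A_LoopFileLongPath, , True)
	catch
		continue
; ...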
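For the password-protected and corrupt files, here is a fuller sketch of the loop that combines kon's search with the try/catch above. It assumes that passing a dummy value as the PasswordDocument argument (fifth parameter of Documents.Open) makes a protected file raise an error instead of showing the password prompt, and that DisplayAlerts := 0 suppresses most other message boxes; neither is guaranteed for every dialog (repair prompts for corrupt files may still appear). The output file name is hypothetical.

```autohotkey
wdApp := ComObjCreate("Word.Application")  ; Word stays invisible unless Visible is set
wdApp.DisplayAlerts := 0                    ; wdAlertsNone: suppress Word's message boxes
wdDoNotSaveChanges := 0
wdLine := 5
wdMove := 0
FindText := "Abc123"                        ; the word to search for
Result := ""
Skipped := ""

Loop, %A_DeskTop%\*.docx
{
    ; Documents.Open(FileName, ConfirmConversions, ReadOnly, AddToRecentFiles, PasswordDocument)
    ; Assumption: the dummy password makes protected files fail instead of prompting.
    try wdDoc := wdApp.Documents.Open(A_LoopFileLongPath, false, true, false, "not-the-password")
    catch
    {
        Skipped .= A_LoopFileLongPath "`r`n"  ; locked, protected or corrupt: note it, move on
        continue
    }
    wdFind := wdApp.Selection.Find
    wdFind.ClearFormatting
    ; FindText, MatchCase = false, MatchWholeWord = -1 (True), ..., Wrap = 0 (wdFindStop)
    while wdFind.Execute(FindText, false, -1,,,,, 0) {
        wdApp.Selection.Expand(wdLine)          ; extend the selection to the whole line
        Result .= wdApp.Selection.Text "`r`n"   ; store the line text
        wdApp.Selection.EndKey(wdLine, wdMove)  ; move past this line before searching again
    }
    wdDoc.Close(wdDoNotSaveChanges)
}
wdApp.Quit(wdDoNotSaveChanges)
FileAppend, %Result%, %A_DeskTop%\SearchResults.txt  ; hypothetical output file
if (Skipped != "")
    MsgBox, Could not open:`n%Skipped%
```

Collecting the skipped paths instead of silently ignoring them lets you check those files by hand afterwards.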

Re: 15,000 Word Docs Searched in 6 mins with OutPut :D

Posted: 01 Jun 2015, 02:15
by ahklearner
Thanks gilliduck :clap:

Re: 15,000 Word Docs Searched in 6 mins with OutPut :D

Posted: 01 Jun 2015, 02:15
by ahklearner
Thanks jethrow :clap:

Re: 15,000 Word Docs Searched in 6 mins with OutPut :D

Posted: 03 Jun 2015, 07:04
by Guest
Just for reference: I would have used a tool like Antiword (http://www.winfield.demon.nl/) to convert the DOCs to plain text. If you have DOCX files there are several Perl scripts that can do this, though you could probably use AutoHotkey as well. Once you have plain text, you can easily use a grep tool to find the lines, and if you have to do this several times for multiple keywords, having the text files plus grep will be very fast. Just my 2 cents; perhaps this is useful for someone reading this thread in the future.
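A minimal sketch of that two-step approach in AutoHotkey, assuming antiword.exe is installed, on the PATH and set up with its mapping files (it prints the document's text to standard output). The folder names are hypothetical, and an AutoHotkey Loop, Read stands in for the grep tool:

```autohotkey
SetBatchLines, -1
SourceFolder := "c:\abc"        ; hypothetical folder with the .doc files
TextFolder := "c:\abc_txt"      ; hypothetical folder for the plain-text copies
needle := "YOURSTRING"
FileCreateDir, %TextFolder%

; Step 1: convert every .doc to .txt once.
Loop, %SourceFolder%\*.doc
{
    OutFile := TextFolder "\" A_LoopFileName ".txt"
    RunWait, %ComSpec% /c antiword "%A_LoopFileLongPath%" > "%OutFile%", , Hide
}

; Step 2: search the text copies; rerun only this part for further keywords.
Result := ""
Loop, %TextFolder%\*.txt
{
    Loop, Read, %A_LoopFileLongPath%
    {
        if InStr(A_LoopReadLine, needle)   ; case-insensitive substring match
            Result .= A_LoopReadLine "`r`n"
    }
}
MsgBox, %Result%
```

The conversion is the slow part; once it is done, each additional keyword only costs a pass over the text files.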

Re: 15,000 Word Docs Searched in 6 mins with OutPut :D

Posted: 03 Jun 2015, 11:51
by ahklearner
Thanks for the advice.