
VBScript.Regex and highlighting text

Posted: 05 Apr 2019, 09:49
by JustAnotherAHKUser
Hi guys,

I'm trying to use VBScript.RegExp to highlight text in Word documents. It works fine, but when the document contains a table, the highlighting shifts and no longer lines up with the matched text.

Does anyone know how to fix that? Thanks for the help!

Code:

needleArray := []
oWord := ComObjCreate("Word.Application")
regex := ComObjCreate("VBScript.RegExp")
regex.IgnoreCase := true
regex.Global := true  ; return every match, not only the first one

; each line of test.txt holds one regex pattern
loop, read, %A_ScriptDir%\test.txt
	needleArray.Push(A_LoopReadLine)

sourceFile := "test with table.docx"
sourceFullPath := A_ScriptDir . "\" . sourceFile
SplitPath, sourceFullPath, OutFileName, OutDir, OutExtension, OutNameNoExt, OutDrive
oWord.Documents.Open(sourceFullPath)
oWord.Visible := 1
oWord.Activate
; plain-text copy of the whole document, searched with the regex below
haystack := oWord.ActiveDocument.Range.Text

for index, needle in needleArray {
	regex.Pattern := needle

	regexMatch := regex.Execute(haystack)
	for item in regexMatch {
		; FirstIndex is an offset into the plain string, but Range() expects
		; document positions; with a table present the two drift apart
		oWord.ActiveDocument.Range(item.FirstIndex, item.FirstIndex + item.Length).HighlightColorIndex := 4
	}
}
oWord.ActiveDocument.SaveAs(OutDir . "\" . OutNameNoExt . "_xxx." . OutExtension)
oWord.ActiveDocument.Close()
oWord.Quit()
sleep 100
oWord := ""
sleep 100
regex := ""
MsgBox,, Oh..., Done, done!
ExitApp

Regex patterns (one per line in test.txt):
Aen(.*?)n nec l(.*?)m
Suspendisse dui purus, (.*?), nunc


Result without a table:
https://imgur.com/a/4bgrLSz

Result with a table:
https://imgur.com/a/01Upkwf

Re: VBScript.Regex and highlighting text

Posted: 05 Apr 2019, 12:05
by Klarion
Interesting, I have never seen this 'error' before.
It looks like the shift adds up the number of cells that come before the match:
-first it is off by 4, then by 5
-each cell has Chr(7) at the end of it. I guess that is the cause, but I am not sure.
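If you want to keep the RegExp approach, here is a minimal sketch of correcting the offsets for the drift described above. It rests on an assumption not confirmed in this thread: every end-of-cell and end-of-row marker appears as two characters, Chr(13) Chr(7), in `Range.Text` but occupies only one position in the document. `TextOffsetToDocPos` is a hypothetical helper name, and other hidden content (fields, for example) could still throw the positions off.

```autohotkey
; Hypothetical helper (not from this thread). ASSUMPTION: each table
; end-of-cell / end-of-row marker is Chr(13) Chr(7) in Range.Text but counts
; as a single position in the Word document, so every marker that appears
; before a string offset pushes that offset one position too far.
TextOffsetToDocPos(text, offset)
{
	marker := Chr(13) Chr(7)
	markers := 0
	pos := 1
	; offset is 0-based (like Match.FirstIndex); InStr positions are 1-based,
	; so a marker found at pos lies completely before the offset when pos < offset
	while ((pos := InStr(text, marker, true, pos)) && pos < offset)
	{
		markers++
		pos += 2
	}
	return offset - markers
}

; Possible use inside the original highlighting loop (untested on a real document):
;	start := TextOffsetToDocPos(haystack, item.FirstIndex)
;	end := TextOffsetToDocPos(haystack, item.FirstIndex + item.Length)
;	oWord.ActiveDocument.Range(start, end).HighlightColorIndex := 4
```

Converting the end offset with the same helper means any markers inside a match that spans cells are subtracted too.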

If you're really stuck and nobody else helps you,
how about trying Word's native search instead?

I mean the Range.Find method.
It works almost the same as a common RegExp,
though it is a little bit different.

Good luck to you

Re: VBScript.Regex and highlighting text

Posted: 05 Apr 2019, 14:51
by sinkfaze
Tables contain a lot of "junk" that you run into through VBA but never see on screen. You could iterate over the table, pull the value out of each cell and evaluate/highlight it that way, but as far as using Range on the whole document goes, there's not much you can do to get around the issue.
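A minimal sketch of the cell-by-cell idea, assuming that inside a single cell the offsets VBScript.RegExp reports line up with positions counted from `cell.Range.Start` once the trailing end-of-cell marker is stripped. `CellMatchRanges` is a hypothetical helper; nested tables and fields inside a cell are not handled, and text outside tables would still need its own pass.

```autohotkey
; Hypothetical helper (not from this thread): run a VBScript.RegExp pattern
; over one table cell's text and return [start, end] document positions,
; ASSUMING offsets inside the cell match positions counted from cellStart.
CellMatchRanges(cellText, cellStart, pattern)
{
	regex := ComObjCreate("VBScript.RegExp")
	regex.Pattern := pattern
	regex.IgnoreCase := true
	regex.Global := true  ; every match, not just the first
	; drop the trailing end-of-cell marker so (.*?) cannot run into it
	marker := Chr(13) Chr(7)
	if (SubStr(cellText, StrLen(cellText) - 1) = marker)
		cellText := SubStr(cellText, 1, StrLen(cellText) - 2)
	ranges := []
	for match in regex.Execute(cellText)
	{
		if (match.Length = 0)  ; skip empty matches, nothing to highlight
			continue
		ranges.Push([cellStart + match.FirstIndex, cellStart + match.FirstIndex + match.Length])
	}
	return ranges
}

; Possible use with Word (untested here):
;	for tbl in oWord.ActiveDocument.Tables
;		for cell in tbl.Range.Cells
;			for i, r in CellMatchRanges(cell.Range.Text, cell.Range.Start, needle)
;				oWord.ActiveDocument.Range(r[1], r[2]).HighlightColorIndex := 4
```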

Re: VBScript.Regex and highlighting text (Topic is solved)

Posted: 05 Apr 2019, 15:13
by FanaticGuru
JustAnotherAHKUser wrote (05 Apr 2019, 09:49):
I'm trying to use vbscript.regex to highlight text in Word documents. It seems to work fine, though it doesn't highlight the text properly if there's a table in the file. Then the highlighting shifts.

Like sinkfaze said, you cannot simply take all the text of a Word document, put it in a string, run RegEx on the string, and then expect the positions found in the string to match up with the positions in the actual Word document. Word keeps a lot of formatting that is lost in a plain string, and pretty much any HTML-type formatting will throw the positions off. Tables are that type of formatting.

One way to do this is to use the RegEx to get the matches in the string, then pass the actual text of each match to Word's Find to modify those exact text strings in the document.

Here is an example.

Code:

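wdApp := ComObjActive("Word.Application")  ; attach to the running Word instance
wdRegEx := ComObjCreate("VBScript.RegExp")

wdRegEx.Pattern := "t.st"
wdRegEx.Global := true  ; collect every match, not just the first
wdRegEx_Matches := wdRegEx.Execute(wdApp.ActiveDocument.Range.Text)

wdApp.Options.DefaultHighlightColorIndex := 4 ; Bright Green, used by Replacement.Highlight
wdFind := wdApp.ActiveDocument.Content.Find
for wdRegEx_Match in wdRegEx_Matches
{
	wdFind.ClearFormatting
	wdFind.Replacement.ClearFormatting
	wdFind.Replacement.Highlight := true
	; positional arguments: FindText = match text, Wrap (8th) = 1 (wdFindContinue),
	; Replace (11th) = 2 (wdReplaceAll); the empty ReplaceWith keeps the text
	; and only applies the replacement formatting (the highlight)
	wdFind.Execute(wdRegEx_Match.Value,,,,,,,1,,,2)
}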
This finds every match of t.st in the Word document and highlights it, so it will catch things like "test", "tast", "tost", etc.

This code could be made more efficient by weeding out duplicate matches from the RegEx first, since ReplaceAll already highlights every occurrence of a given string in one go.
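A minimal sketch of that de-duplication, assuming the Find runs case-insensitively (MatchCase left off). AutoHotkey v1 object keys compare strings case-insensitively, so "Test" and "test" count as one value, which fits a case-insensitive ReplaceAll. `UniqueMatchValues` is a hypothetical helper name.

```autohotkey
; Hypothetical helper (not from this thread): keep each match text once,
; in first-seen order. AutoHotkey v1 object keys are case-insensitive, so
; "Test" and "test" are treated as the same value.
UniqueMatchValues(values)
{
	seen := {}
	unique := []
	for i, value in values
	{
		if (value = "" || seen.HasKey(value))  ; skip empty and repeated text
			continue
		seen[value] := true
		unique.Push(value)
	}
	return unique
}

; Possible use with the example above (untested here):
;	values := []
;	for wdRegEx_Match in wdRegEx_Matches
;		values.Push(wdRegEx_Match.Value)
;	for i, text in UniqueMatchValues(values)
;		wdFind.Execute(text,,,,,,,1,,,2)
```

The output array stores the original strings rather than reading them back from the object's keys, so the exact text returned by the RegExp is what gets passed to Find.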

FG

Re: VBScript.Regex and highlighting text

Posted: 10 Apr 2019, 09:15
by JustAnotherAHKUser
Hi guys,

Thank you very much for your comments and suggestions, I really appreciate it!
FanaticGuru wrote: One way this could be done is to use the RegEx to get the matches in the string, then use the actual text of the matches to use Find in Word to modify those exact text strings in the document.
@FanaticGuru, this is an excellent idea! I've been using Find.Execute for normal text searches, so it's all clear how to use it. Thanks again!