19 Jun 2018, 15:46

Not the best forum for this request but the other forums didn't seem right either so here goes...

I'm looking for a script that will find repeating words in text. I've seen word processors that that do do it but I did a quick search and I didn't find any scripts out there that do it. I may be using the wrong search terms.

Anyone know of or have a script to do this? If not, I will be forced to write my own.

Re: Script to Find Repeating Words

19 Jun 2018, 16:13

You mean like this?

#SingleInstance, Force

SampleText := "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit."

StrReplace(SampleText, "Lorem", "Lorem", Occurrence)

MsgBox, % """Lorem"" occurs exactly " Occurrence " times in this sample text."
Re: Script to Find Repeating Words

19 Jun 2018, 17:04

Thank you for your interest.

No. I'm sorry, I realize that I wasn't very specific. When I say repeating words I mean words that are repeated next to each other. "The the" is a common problem that occurs in many of the documents that I write. Connectives like "with" are also common. Although these are the common ones, any word can be inadvertently (and incorrectly) repeated and I need to find them.
Re: Script to Find Repeating Words

19 Jun 2018, 17:30

sampletext := "The the quick brown fox jumps with with excitement over the lazy dog."

RepeatedTextArray := {}

Loop, Parse, sampletext, %A_Space%
{
    ; "=" compares case-insensitively, so "The the" counts as a repeat
    if (A_LoopField = LastFoundWord)
        RepeatedTextArray[LastFoundWord] := LastFoundWord . A_Space . A_LoopField
    LastFoundWord := A_LoopField
}

for word, RepeatedWord in RepeatedTextArray
    sampletext := StrReplace(sampletext, RepeatedWord, word)

MsgBox % sampletext
Re: Script to Find Repeating Words

19 Jun 2018, 17:33

- Here's an attempt. It's essentially just a parsing loop using space as the delimiter. You then have to consider how to deal with any non-letters. I have considered commas and full stops in the example below.
- If you want to know the exact position of where the repeated match occurs, you could use my script, and then use RegExMatch on the original text to find the positions.
- RegExReplace could be used to remove the second occurrence of a repeated word, but some repeats are valid.

q:: ;list repeated words
vText := "abc abc, def ghi def def. ghi abc abc def, def"
;note: 'def, def' won't be considered as a repeated pair, since there is a punctuation mark in the middle (such behaviour could be changed by replacing commas with spaces via StrReplace and then replacing multiple spaces with single spaces via RegExReplace)
vText := StrReplace(vText, ",", " ")
vText := StrReplace(vText, ".", " ")
vTempLast := " ", vOutput := "" ;vTemp is compared to vTempLast each time, since we're parsing using space as the delimiter, the previous item will never be a space
Loop, Parse, vText, % " "
{
	vTemp := A_LoopField
	if !(vTemp = "") && (vTemp = vTempLast)
		vOutput .= vTempLast " " vTemp "`r`n"
	vTempLast := vTemp
}
Clipboard := vOutput
MsgBox, % vOutput
return
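
The note in the script above says that 'def, def' could be picked up by turning punctuation into spaces and then collapsing runs of spaces. A minimal sketch of that idea follows; `FindAdjacentRepeats` is a hypothetical helper name chosen for illustration, and only commas and full stops are handled, as in the script above.

```autohotkey
; Sketch only: FindAdjacentRepeats is a hypothetical helper, not a built-in function.
FindAdjacentRepeats(vText)
{
	vText := StrReplace(vText, ",", " ")        ; commas become spaces
	vText := StrReplace(vText, ".", " ")        ; full stops become spaces
	vText := RegExReplace(vText, " {2,}", " ")  ; collapse runs of spaces into one
	vText := Trim(vText)
	vTempLast := "", vOutput := ""
	Loop, Parse, vText, % " "
	{
		; "=" is case-insensitive by default, so "The the" also counts
		if (A_LoopField != "") && (A_LoopField = vTempLast)
			vOutput .= vTempLast " " A_LoopField "`r`n"
		vTempLast := A_LoopField
	}
	return vOutput
}

MsgBox, % FindAdjacentRepeats("abc abc, def ghi def def. ghi abc abc def, def")
```

With the spaces collapsed, the trailing 'def, def' is reported as a pair as well. Whether a repeat across a comma or full stop should count is a judgement call, since a sentence can legitimately end and begin with the same word.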
Re: Script to Find Repeating Words

19 Jun 2018, 17:58

I was hoping that that there was some some ready-to-go script out there but this is a good start. Thanks everybody! :thumbup:
Re: Script to Find Repeating Words

19 Jun 2018, 19:26

From the "just in case you care" department...

I found a good RegEx pattern to help with this requirement from this this web site. It is far (far) from being done but this is what I have so far.

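#SingleInstance Force

Text =
(
This is just a test        Test this
This this is is
just just a test Test of the
the emergency broadcast center center.
)

StartPos := 1 ; assumed: search from the start of the text

; \b(\w+)\s+\1\b matches a word followed by whitespace and the same word again; i) ignores case
FoundPos := RegExMatch(Text, "i)\b([\w]+)\s+\1\b", FoundString, StartPos)
if (FoundPos)
    MsgBox Found: %FoundString%
else
    MsgBox Nothing (else) found.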
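
One way the pattern above might be extended to walk through every repeat in the text, not just the first, is to restart the search just past each match. This is a rough sketch under that assumption; `FindRepeats` is a hypothetical helper name, and the object keys `Pos` and `Match` are made up for illustration.

```autohotkey
; Sketch only: FindRepeats is a hypothetical helper, not a built-in function.
FindRepeats(Text)
{
	Results := []
	StartPos := 1
	; RegExMatch returns 0 once nothing more is found, which ends the loop.
	while (FoundPos := RegExMatch(Text, "i)\b(\w+)\s+\1\b", FoundString, StartPos))
	{
		Results.Push({Pos: FoundPos, Match: FoundString})
		StartPos := FoundPos + StrLen(FoundString) ; continue after this match
	}
	return Results
}

Output := ""
for index, item in FindRepeats("This this is is just just a test.")
	Output .= item.Pos ": " item.Match "`n"
MsgBox, % (Output = "") ? "Nothing found." : Output
```

Because each search resumes after the previous match, a triple such as "the the the" is reported as a single pair. The stored position can be used to jump to the repeat in the original text.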


Re: Script to Find Repeating Words

20 Jun 2018, 06:07

jballi wrote: this this
we see what you did there... :D
Re: Script to Find Repeating Words

20 Jun 2018, 07:12

not the whole shebang but might give u a starting point

#SingleInstance Force
SetBatchLines -1

SampleText =
(LTrim
		Lorem ipsum dolor sit amet, consectetur adipiscing elit. Lorem ipsum dolor sit amet, consectetur
		adipiscing elit. Lorem ipsum dolor sit amet, consectetur adipiscing elit.
		Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
		incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud
		exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure
		dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
		Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt
		mollit anim id est laborum. Ut enim ad minim veniam, quis nostrud
		exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure
		dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
		Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt
		mollit anim id est laborum.
)

Result := findAllRepeatingWords(SampleText)

Gui Display: New, +AlwaysOnTop, Results
Gui Display: Margin, 4, 4
for word, timesRepeated in Result
{
	idx := A_Index
	res .= Format("{}: {} ({})`n", (idx < 10 ? "0" idx : idx), word, timesRepeated)
}

Gui Display: Add, Edit, w200, % res
Gui Display: Show, xCenter yCenter


findAllRepeatingWords(str) {
	str := RegExReplace(str, "\W", A_Space) ; replace non-word chars with space(get rid of punctuation). what about apostrophes?

	Words := StrSplit(str, A_Space)

	Reps := countRepetitions(Words)
	return pruneRepetitions(Reps)
}

; return words mapped to their occurrences
countRepetitions(Arr) {
	Result := {}
	for each, word in Arr
	{
		if (word = "")
			continue
		if (Result.HasKey(word))
			Result[word] += 1 ; seen before: bump the count
		else
			Result[word] := 1 ; first occurrence
	}

	return Result
}

; get rid of single occurrence words
pruneRepetitions(Arr) {
	Result := {}
	for word, timesRepeated in Arr
		if (timesRepeated != 1)
			Result[word] := timesRepeated

	return Result
}
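
On the "what about apostrophes?" question in the comment above: one possible approach, sketched here as an assumption and not as the author's method, is to exclude the apostrophe from the characters being replaced so that words like "don't" stay in one piece. The sample string is made up for illustration.

```autohotkey
; Sketch only: keep apostrophes inside words when stripping punctuation.
str := "Don't stop, don't stop: it's the dog's bowl."
; [^\w']+ matches runs of characters that are neither word characters nor apostrophes
clean := RegExReplace(str, "[^\w']+", A_Space)
MsgBox, % clean
```

The trade-off is that apostrophes used as single quotes around a word would also be kept, so quoted and unquoted forms of the same word would be counted separately.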
Re: Script to Find Repeating Words

20 Jun 2018, 14:41

swagfag: Definitely not what I was looking for but your script provides information about a document that might be useful in the future. Thanks for sharing. :)

