Detecting specific word repetition over several words

LeFunk
Posts: 86
Joined: 29 Aug 2016, 03:12

Post by LeFunk » 02 Dec 2022, 07:00

Hello!

I'm trying to create a script that detects if a word has been repeated.
So that "and, or. And or. And... or?" would be fine,
but: "and, or. And and. Or... or?" would be marked as erroneous.

What makes things harder is that there may be other words between the two words being checked:
"And a bear. Or rabbits and one racoon and a deer"
But the script should only check that the words "and" and "or" alternate.

Can it be done?
I have so far found

Code: Select all

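FoundPos := RegExMatch(Haystack, "\b(\w+)\s+\1\b") ; Haystack holds the text to check (variable name assumed)
But it flags the repetition of any word, and only when the two copies are right next to each other.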

Thanks in advance!

mikeyww
Posts: 26885
Joined: 09 Sep 2014, 18:38

Re: Detecting specific word repetition over several words

Post by mikeyww » 02 Dec 2022, 07:16

Some tips for your adjustment could be to write a detailed plain-language description of what to find: "A word followed by anything that is not a _____ followed by the same word....." or whatever. You can then translate your description into AHK. Check your description against all of your examples.
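As a minimal sketch of that approach, the description "one of the two words, followed by anything that contains neither of them, followed by the same word again" could be translated roughly as below. The variable name needle and the choice of the i (case-insensitive) and s (dot matches newlines) options are assumptions for illustration, not something from the original posts.

```autohotkey
; Assumed needle: a tracked word, then any run of characters that does not
; start another tracked word, then the same word again (\1).
needle := "is)\b(and|or)\b(?:(?!\b(?:and|or)\b).)*\b\1\b"

MsgBox, % RegExMatch("and, or. And or. And... or?", needle)
; => 0 (the two words always alternate)

MsgBox, % RegExMatch("and, or. And and. Or... or?", needle)
; => 10 (position of "And and")

MsgBox, % RegExMatch("And a bear. Or rabbits and one racoon and a deer", needle)
; => 24 (position of "and one racoon and")
```

Other words between the two copies are skipped, so only the order of the tracked words matters. The sketch looks at the whole text at once and does not stop at sentence boundaries.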

Chunjee
Posts: 1419
Joined: 18 Apr 2014, 19:05

Re: Detecting specific word repetition over several words

Post by Chunjee » 02 Dec 2022, 13:06

LeFunk wrote:
02 Dec 2022, 07:00
I'm trying to create a script that detects if a word has been repeated.
What word?



https://biga-ahk.github.io/biga.ahk/#/?id=countby can work like a frequency counter if you feed it an array of words, like so:

Code: Select all

A := new biga() ; requires https://github.com/biga-ahk/biga.ahk


inputText := "He felt that his whole life was some kind of dream and he sometimes wondered whose it was and whether they were enjoying it."
array := A.countby(A.words(inputText))
; => {"and": 2, "dream": 1, "enjoying": 1, "felt": 1, "He": 2, "his": 1, "it": 2, "kind": 1, "life": 1, "of": 1, "some": 1, "sometimes": 1, "that": 1, "they": 1, "was": 2, "were": 1, "whether": 1, "whole": 1, "whose": 1, "wondered": 1}


Then just filter for any word with a count higher than one, or for whichever word you are interested in.

Code: Select all

for key, value in array {
	if (value > 1) {
		msgbox, % key
		; => "and"
		; => "He"
		; => "it"
		; => "was"
	}
}

LeFunk
Posts: 86
Joined: 29 Aug 2016, 03:12

Re: Detecting specific word repetition over several words

Post by LeFunk » 03 Dec 2022, 06:31

It's a bit more complicated.

Basically, in my native language it is considered bad style to use the same conjunction repeatedly. For example, "this thing AND that thing AND more things" is not good; it has to be "this thing AND that thing OR more things".

I am trying to write a rule that would detect this error.

mikeyww
Posts: 26885
Joined: 09 Sep 2014, 18:38

Re: Detecting specific word repetition over several words

Post by mikeyww » 03 Dec 2022, 08:49

It looks like you want to find these things only within one sentence or clause. A sentence can be tricky to define, so you would probably want to tackle that first. Best of luck!

boiler
Posts: 16926
Joined: 21 Dec 2014, 02:44

Re: Detecting specific word repetition over several words

Post by boiler » 03 Dec 2022, 08:50

This doesn’t have anything to do with solving your problem, and I believe you have a true need here, but your example makes me curious about the language and the specific words. As you know, the words “and” and “or” in English have different meanings and are not interchangeable. For example, I’ll repeat that last sentence following your rule: The words “and” and “or” have different meanings or are not interchangeable. Another example: “Lions and tigers or bears! Oh my!”

mikeyww
Posts: 26885
Joined: 09 Sep 2014, 18:38

Re: Detecting specific word repetition over several words

Post by mikeyww » 03 Dec 2022, 09:01

I agree fully, and since English has no specified order of precedence, the meaning is also unknown or ambiguous, or both. :)

Punctuation may help and usually helps, or may not help.

I like lions, or tigers and bears.
I like lions or tigers, and bears.

LeFunk
Posts: 86
Joined: 29 Aug 2016, 03:12

Re: Detecting specific word repetition over several words

Post by LeFunk » 03 Dec 2022, 10:03

boiler wrote:
03 Dec 2022, 08:50
your example makes me curious about the language and the specific words.
It's Estonian and there are two words that both mean "and" (exactly, 100% the same meaning): they are "ja" and "ning".

But because there are two words that both mean "and", it is considered bad style to use just one of them repeatedly, because it makes the text sound repetitive and tedious: "this AND that AND more". It is preferred to write "this AND that (ALTERNATIVE AND) more".
I used the AND/OR examples to make it more understandable for other language speakers, but in hindsight I realize it was probably misleading.

I started to build a rule for this with RegExReplace and regex101, but so far it's all been a bit over my head.
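For the RegExReplace route, one possible rule, sketched under assumptions, is to mark every "ja" or "ning" whose next conjunction is the same word. The >> << markers, the variable names, and the sample sentence are made up for illustration.

```autohotkey
; Assumed needle: "ja" or "ning", but only if a lookahead finds the same word (\1)
; again before any other "ja"/"ning" appears.
needle := "is)\b(ja|ning)\b(?=(?:(?!\b(?:ja|ning)\b).)*\b\1\b)"

text := "this ja that ja more ning rest"
MsgBox, % RegExReplace(text, needle, ">>$0<<")
; => this >>ja<< that ja more ning rest
```

Because the repeat is checked in a lookahead, the second copy is not consumed, so a run of three or more identical conjunctions gets a marker on every copy except the last.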

boiler
Posts: 16926
Joined: 21 Dec 2014, 02:44

Re: Detecting specific word repetition over several words

Post by boiler » 03 Dec 2022, 10:28

Interesting. Thanks for the explanation. :thumbup:

Chunjee
Posts: 1419
Joined: 18 Apr 2014, 19:05

Re: Detecting specific word repetition over several words

Post by Chunjee » 03 Dec 2022, 13:57

Here is my attempt at a function that checks if a value is being repeated {x} times consecutively.

Code: Select all

fn_checkConsecutiveValues(param_arr, param_maxInARow:=1) {
	consecutiveCounter := 0
	for key, value in param_arr {
		; skip on first element
		if (A_Index == 1) {
			continue
		}
		; perform main idea
		if (param_arr[A_Index - 1] = value) {
			consecutiveCounter++
		} else {
			consecutiveCounter := 0
		}
		; check if too many consecutives
		if (consecutiveCounter >= param_maxInARow) {
			return false
		}
	}
	; param_maxInARow was never exceeded
	return true
}
For flexibility, param_maxInARow can be set to 1 (no consecutive repeats allowed) or to a higher limit as desired.
It returns true when no value occurs more than param_maxInARow times in a row, else false.



For example ["ja", "ja", "ja"] will return false with a "param_maxInARow" setting of two or lower; while ["ja", "ning", "ja", "ning", "ja"] will always return true because no values repeat in a row.


All that's left is to filter all other words.

Code: Select all

A := new biga() ; requires https://github.com/biga-ahk/biga.ahk

userInput := "Ma armastan kooke ja kooke ja kõige rohkem magustoite; Ja ärge unustage pudingut."
filterWordsArr := A.filter(A.words(userInput), func("fn_filterJaNing"))
; => ["ja", "ja", "Ja"]
msgbox, % fn_checkConsecutiveValues(filterWordsArr, 1)
; => 0 (false)


userInput := "Ma armastan kooke ja kooke ning kõige rohkem magustoite; Ja ärge unustage pudingut."
filterWordsArr := A.filter(A.words(userInput), func("fn_filterJaNing"))
; => ["ja", "ning", "Ja"]
msgbox, % fn_checkConsecutiveValues(filterWordsArr, 1)
; => 1 (true)



; functions
fn_filterJaNing(param_input) {
	; keep only the two conjunctions; "=" compares case-insensitively
	if (param_input = "ja" || param_input = "ning") {
		return true
	}
	return false
}

This does not solve the 'while typing' or 'for each sentence' challenges, but I hope it helps.
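For the 'for each sentence' part, a rough sketch without the biga dependency might look like the following. The function name fn_firstRepeatingSentence is hypothetical, and treating ".", "!", "?" and ";" as sentence boundaries is an assumption; abbreviations and ellipses are not handled specially.

```autohotkey
MsgBox, % fn_firstRepeatingSentence("Ma armastan kooke ja kooke ja pudingut.")
; => 1 (the first sentence repeats "ja")

MsgBox, % fn_firstRepeatingSentence("Ma armastan kooke ja pudingut. Ja ärge unustage kooke.")
; => 0 (each sentence is checked on its own)

; Hypothetical helper: returns the 1-based index of the first sentence in which
; "ja" or "ning" is followed, as the next conjunction, by the same word.
; Returns 0 if no sentence contains such a repeat.
fn_firstRepeatingSentence(param_text) {
	; crude sentence split on the assumed boundary characters
	for index, sentence in StrSplit(param_text, [".", "!", "?", ";"]) {
		previous := ""
		pos := 1
		; visit every "ja"/"ning" in this sentence from left to right
		while (pos := RegExMatch(sentence, "i)\b(ja|ning)\b", match, pos)) {
			; "=" compares case-insensitively, so "Ja" counts as "ja"
			if (previous != "" && match1 = previous) {
				return index
			}
			previous := match1
			pos += StrLen(match)
		}
	}
	return 0
}
```

Resetting previous at the start of every sentence is what keeps a "ja" at the end of one sentence from clashing with a "Ja" at the start of the next.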
