create a word frequency counter Topic is solved

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
crypter
Posts: 90
Joined: 15 Dec 2020, 09:57

create a word frequency counter

20 Jul 2021, 18:24

I have this code that lets you select a file and match a regex against it, and it is supposed to write out the words it finds.

This is the regex, but it could be flawed:

Code: Select all

RegExMatch(%WordCount%, "(*UCP)(?is)(\b[[:alpha:]]{2,}\b)(?!.*\b\1\b)", 3)
I want it to open a text file with a lot of text, then write to another file a list of the most frequent words, sorted from highest to lowest frequency.

Code: Select all

FileSelectFile, SourceFile, 3,, Pick a text file to analyze.
if (SourceFile = "")
    return

SplitPath, SourceFile,, SourceFilePath,, SourceFileNoExt
DestFile := SourceFilePath "\" SourceFileNoExt " frequency.txt"

if FileExist(DestFile)
{
    MsgBox, 4,, Overwrite the existing file? Press No to append to it.`n`nFILE: %DestFile%
    IfMsgBox, Yes
        FileDelete, %DestFile%
}

WordCount := 0
Loop, read, %SourceFile%, %DestFile%
{
    SearchString := A_LoopReadLine
    Gosub, Search
}
MsgBox %WordCount% words were found and written to "%DestFile%".
return


Search:
Start1 := RegExMatch(%WordCount%, "(*UCP)(?is)(\b[[:alpha:]]{2,}\b)(?!.*\b\1\b)", 3)


Start := Start1
Loop
{

    ArrayElement := Start%A_Index%
    if (ArrayElement = "")
        break
    if (ArrayElement = 0)
        continue
    if (Start = 0)
        Start := ArrayElement
    else 
    {
        if (ArrayElement != 0)
            if (ArrayElement < Start)
                Start := ArrayElement
    }
}

if (Start = 0)  
    return


URL := SubStr(SearchString, Start) 
Loop, parse, WRD, %A_Tab%%A_Space%<>
{
    WRD := A_LoopField
    break 
}

StringReplace, Cleansed, WRD, ",, All
FileAppend, %Cleansed%`n
WordCount += 1


CharactersToOmit := StrLen(WRD)
CharactersToOmit += Start
SearchString := SubStr(SearchString, CharactersToOmit)
Gosub, Search
return
It has to count how many times each word occurs in the text file and compile a list.
crypter
Posts: 90
Joined: 15 Dec 2020, 09:57

Re: create a word frequency counter

20 Jul 2021, 21:05

I found a script that does this, so it's solved. Here it is:

Code: Select all

FileRead, H, % A_ScriptDir "\tempfile.txt"
;FileDelete,  % A_ScriptDir "\tempfile.txt"

words := []
while pos := RegExMatch(H, "(\b[[:alpha:]]{2,}\b)(?!.*\b\1\b)", m, A_Index=1?1:pos+StrLen(m))
	words[m] := words[m] ? words[m] + 1 : 1
for word, count in words
	list .= count "`t" word "`r`n"
Sort, list, RN
loop, parse, list, `n, `r
{
	result .= A_LoopField "`r`n"
	if A_Index = %word%
		break
}
;MsgBox % "Freq`tWord`n" result
FileAppend, % "Freq`tWord`n" result, C:\Users\lapto\OneDrive\Desktop\New folder (7)\New folder (3)\Test.txt
return
AHKStudent
Posts: 1472
Joined: 05 May 2018, 12:23

Re: create a word frequency counter

20 Jul 2021, 21:37

Code: Select all

h := "i want it to open a text file with a want text lot of text then write in another file the top frequency words list like from high to low words list"
words := []
while pos := RegExMatch(H, "(\b[[:alpha:]]{2,}\b)(?!.*\b\1\b)", m, A_Index=1?1:pos+StrLen(m))
	words[m] := words[m] ? words[m] + 1 : 1
for word, count in words
	list .= count "`t" word "`r`n"
Sort, list, RN
loop, parse, list, `n, `r
{
	result .= A_LoopField "`r`n"
	if A_Index = %word%
		break
}
MsgBox % "Freq`tWord`n" result
Shouldn't words like want have more than 1?
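
A likely explanation, offered as a reading of the pattern rather than something stated in the thread: the lookahead `(?!.*\b\1\b)` rejects a word whenever the same word appears again later in the string, so only the last occurrence of each word is ever matched and every count ends up as 1. A minimal sketch, assuming the lookahead is simply dropped and the search resumes right after each match:

```autohotkey
; Sketch only: same sample idea as above, with the (?!...) lookahead removed.
h := "i want it to open a text file with a want text lot of text"
words := {}
pos := 1
while (pos := RegExMatch(h, "\b[[:alpha:]]{2,}\b", m, pos))
{
    ; ObjHasKey() is used instead of words.HasKey() so that a word in the text
    ; that happens to match a method name cannot interfere with the lookup.
    words[m] := ObjHasKey(words, m) ? words[m] + 1 : 1
    pos += StrLen(m)                     ; continue searching after this match
}
list := ""
for word, count in words
    list .= count "`t" word "`n"
Sort, list, N R                          ; numeric sort, highest count first
MsgBox % "Freq`tWord`n" list
```

With this sample, want should come out as 2 and text as 3. Words of a single letter are still skipped because of the `{2,}` quantifier.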
Chunjee
Posts: 1408
Joined: 18 Apr 2014, 19:05

Re: create a word frequency counter

21 Jul 2021, 16:04

I had fun figuring this one out :D

Code: Select all

A := new biga() ; requires https://www.npmjs.com/package/biga.ahk

sentence := "If I remain careful, I can repeat what I am compelled to do, over and over again. She switched it on and asked Howie to repeat what he'd said about the buildings he recognized."
; find the total number of words in the sentence
totalWords := A.size(A.words(sentence))
; => 34

; find how many times each word appears in the sentence
wordOccurances := A.countBy(A.words(sentence), A.toLower)
; => {"about": 1, "again": 1, "am": 1, "and": 2, "asked": 1, "buildings": 1, "can": 1, "careful": 1, "compelled": 1, "do": 1, "he": 1, "he'd": 1, "howie": 1, "i": 3, "if": 1, "it": 1, "on": 1, "over": 2, "recognized": 1, "remain": 1, "repeat": 2, "said": 1, "she": 1, "switched": 1, "the": 1, "to": 2, "what": 2}
crypter
Posts: 90
Joined: 15 Dec 2020, 09:57

Re: create a word frequency counter

28 Sep 2021, 09:25

I have this script that creates a list of the most frequent words, one word per result:

Code: Select all

FileRead, H, % A_ScriptDir "\original.txt"
;FileDelete,  % A_ScriptDir "\tempfile.txt"

words := []
while pos := RegExMatch(H, "(\b[[:alpha:]]{2,}\b)(?!.*\b\1\b)", m, A_Index=1?1:pos+StrLen(m))
	words[m] := words[m] ? words[m] + 1 : 1
for word, count in words
	list .= count "`t" word "`r`n"
Sort, list, RN
loop, parse, list, `n, `r
{
	result .= A_LoopField "`r`n"
	if A_Index = %word%
		break
}
;MsgBox % "Freq`tWord`n" result
FileAppend, % "Freq`tWord`n" result, C:\Users\lapto\OneDrive\Desktop\original_list.txt
return
How can I make it list the most frequent groups of 2 or 3 words instead of single words?

That is, list the most frequent groups of 3 words instead of 1 word per match.
Chunjee
Posts: 1408
Joined: 18 Apr 2014, 19:05
Contact:

Re: create a word frequency counter

28 Sep 2021, 15:54

crypter wrote:
28 Sep 2021, 09:25
list most frequent 3 words instead of 1 word per match

Code: Select all

A := new biga() ; requires https://www.npmjs.com/package/biga.ahk

sentence := "If I remain careful, I can repeat what I am compelled to do, over and over again. She switched it on and asked Howie to repeat what he'd said about the buildings he recognized."
; find how many times each word appears in the sentence
wordOccurances := A.countBy(A.words(sentence), A.toLower)
; => {"about": 1, "again": 1, "am": 1, "and": 2, "asked": 1, "buildings": 1, "can": 1, "careful": 1, "compelled": 1, "do": 1, "he": 1, "he'd": 1, "howie": 1, "i": 3, "if": 1, "it": 1, "on": 1, "over": 2, "recognized": 1, "remain": 1, "repeat": 2, "said": 1, "she": 1, "switched": 1, "the": 1, "to": 2, "what": 2}

; move each key and value into its own pair ({"about": 1} => ["about", 1]) so that we may sort without losing the string
wordOccurances := A.toPairs(wordOccurances)
; sort by occurrence and reverse so the highest occurrences are first
sortedOccurences := A.reverse(A.sortBy(wordOccurances, 2))
; take the top three occurrences
topThreeWords := A.take(sortedOccurences, 3)
; => [["i", 3], ["to", 2], ["repeat", 2]]

; map first array values only
topThreeWords := A.map(topThreeWords, A.first)
; => ["i", "to", "repeat"]
flyingDman
Posts: 2798
Joined: 29 Sep 2013, 19:01

Re: create a word frequency counter

28 Sep 2021, 17:13

Try:

Code: Select all

sentence = 
(
If I remain careful, I can repeat what I am compelled to do, 
over and over again!. (Would I?) [What?]. She switched it on 
and asked Howie to repeat what he'd said about the buildings 
he recognized.
)

arr := {}, lst := []
for x,y in strsplit(regexreplace(sentence, "[,\.\?!;\(\)\[\]\{\}\v]")," ")
	if (y and !ObjHasKey(arr, y))
		arr[y] := 1, lst.push(y)
	else
		arr[y]++

for k,l in lst
	result .= arr[l] "`t" l "`n" 
sort, result, NR
msgbox % substr(result,1,instr(result,"`n",,1,3))

14.3 & 1.3.7
crypter
Posts: 90
Joined: 15 Dec 2020, 09:57

Re: create a word frequency counter

29 Sep 2021, 12:19

Is it possible to just add it to the script I posted?
flyingDman
Posts: 2798
Joined: 29 Sep 2013, 19:01

Re: create a word frequency counter

29 Sep 2021, 12:53

Use FileRead to load the content of your file into a variable (instead of sentence = ...) and FileAppend to save the result to a file. Other than that, you should be set.
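
A rough sketch of that change, under a few assumptions: the input name original.txt is taken from the script posted earlier, the output name is a placeholder, and line breaks are turned into spaces (rather than deleted via `\v`) so that the last word of one line does not fuse with the first word of the next when the text comes from a file.

```autohotkey
; Sketch only: dstFile is a placeholder name, adjust both paths as needed.
srcFile := A_ScriptDir "\original.txt"
dstFile := A_ScriptDir "\original_list.txt"

FileRead, text, %srcFile%
if ErrorLevel
{
    MsgBox, Could not read %srcFile%
    return
}

text := RegExReplace(text, "[,\.\?!;\(\)\[\]\{\}]")   ; strip the same punctuation as above
text := RegExReplace(text, "\s+", " ")                 ; line breaks and space runs -> one space

arr := {}
for i, w in StrSplit(Trim(text), " ")
    if (w != "")
        arr[w] := ObjHasKey(arr, w) ? arr[w] + 1 : 1

result := ""
for w, n in arr
    result .= n "`t" w "`n"
Sort, result, N R                                      ; highest count first

FileDelete, %dstFile%                                  ; FileAppend would otherwise add to an old list
FileAppend, % "Freq`tWord`n" result, %dstFile%
return
```

Drop the FileDelete line if appending to an existing list is what you want.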
crypter
Posts: 90
Joined: 15 Dec 2020, 09:57

Re: create a word frequency counter

30 Sep 2021, 09:37

It doesn't count groups of three words, only single words.

Here is an example of before and after:
[Attachment: Screenshot_11.png]
flyingDman
Posts: 2798
Joined: 29 Sep 2013, 19:01

Re: create a word frequency counter

30 Sep 2021, 09:49

I misunderstood your request. You meant the frequency of phrases of 3 words rather than the top most frequent words. I believe what you are asking for is much more complicated.
Chunjee
Posts: 1408
Joined: 18 Apr 2014, 19:05

Re: create a word frequency counter

30 Sep 2021, 13:41

Is it possible to post tempfile.txt or whatever file is being tested with?

I came up with this:

Code: Select all

A := new biga() ; requires https://www.npmjs.com/package/biga.ahk

nmbr := 5
grams := 3
arr := []
var =
(join`r`n
It was the best of times, it was the worst of times, it was the age
of wisdom, it was the age of foolishness, it was the epoch of
belief, it was the epoch of incredulity, it was the season of
Light, it was the season of Darkness, it was the spring of hope,
it was the winter of despair, we had everything before us,
we had nothing before us, we were all going direct to Heaven, we
were all going direct the other way-in short, the period was so
far like the present period, that some of its noisiest authorities
insisted on its being received, for good or for evil, in the
superlative degree of comparison only.
)

; count the n-grams and take the most common ones
topSequences := A.take(A.reverse(A.sortBy(A.toPairs(A.countBy(fn_generatengrams(A.words(var), grams), A.toLower)), 2)), nmbr)
; => [["it was the", 10], ["of times it", 2], ["before us we", 2], ["were all going", 2], ["we were all", 2]]

; FUNCTIONS
fn_generatengrams(param_array, param_groupsize) {
	if (biga.isString(param_array)) {
		param_array := biga.words(param_array) ; := is needed here; a plain = would assign the literal text
	}
	array := param_array
	ngrams := []
	loop, % array.count() {
		ngrams.push(biga.join(biga.slice(param_array, A_Index, A_Index + param_groupsize - 1), " "))
	}
	return ngrams
}
Last edited by Chunjee on 05 Oct 2021, 14:48, edited 1 time in total.
flyingDman
Posts: 2798
Joined: 29 Sep 2013, 19:01

Re: create a word frequency counter

30 Sep 2021, 16:40

try this:

Code: Select all

sentence = 
(
This is a long paragraph with many repeating words. This is intended, so that the groups of repeating words can be counted. So that the script can identifiy the many repeating words, I have written this long paragraph with many repeating words.
)
arr := {}, lst := [], arr2 := []

for a,b in arr1:=strsplit(regexreplace(sentence, "[,\.\?!;\(\)\[\]\{\}\v]")," ")
	{
	arr2.push(arr1[a] " " arr1[a+1] " " arr1[a+2])
	arr2.push(arr1[a] " " arr1[a+1])
	}
for x,y in arr2
	if (y and !ObjHasKey(arr, y))
		arr[y] := 1, lst.push(y)
	else
		arr[y]++

for k,l in lst
	result .= arr[l] "`t" l "`n" 
sort, result, NR
msgbox % substr(result,1,instr(result,"`n",,1,10))
It counts groups of both 2 and 3 words, and can be changed to count just 2 or just 3. Someone else might have a better regex to eliminate the punctuation.
crypter
Posts: 90
Joined: 15 Dec 2020, 09:57

Re: create a word frequency counter

30 Sep 2021, 19:10

It works for 2.

How do I make it write groups of 3?
flyingDman
Posts: 2798
Joined: 29 Sep 2013, 19:01

Re: create a word frequency counter  Topic is solved

30 Sep 2021, 19:33

[Attachment: 20210930_172347.jpg]
I see several groups of 3...

For only groups of 3:

Code: Select all

for a,b in arr1:=strsplit(regexreplace(sentence, "[,\.\?!;\(\)\[\]\{\}\v]")," ")
	{
	arr2.push(arr1[a] " " arr1[a+1] " " arr1[a+2])
	;~ arr2.push(arr1[a] " " arr1[a+1])
	}
For only groups of 2:

Code: Select all

for a,b in arr1:=strsplit(regexreplace(sentence, "[,\.\?!;\(\)\[\]\{\}\v]")," ")
	{
	;~ arr2.push(arr1[a] " " arr1[a+1] " " arr1[a+2])
	arr2.push(arr1[a] " " arr1[a+1])
	}
flyingDman
Posts: 2798
Joined: 29 Sep 2013, 19:01

Re: create a word frequency counter

03 Oct 2021, 16:13

The code I posted above is not very efficient. So let's reiterate the goal: generate a list of top x (x = nmbr) commonly used strings of words in any text. I think this is a better solution:

Code: Select all

nmbr := 5						; max number of results to return	
var = 
(join`r`n
It was the best of times, it was the worst of times, it was the age 
of wisdom, it was the age of foolishness, it was the epoch of
belief, it was the epoch of incredulity, it was the season of 
Light, it was the season of Darkness, it was the spring of hope,
it was the winter of despair, we had everything before us, 
we had nothing before us, we were all going direct to Heaven, we 
were all going direct the other way-in short, the period was so 
far like the present period, that some of its noisiest authorities
insisted on its being received, for good or for evil, in the 
superlative degree of comparison only.
)
arr:={}
Var := RegExReplace(var,"\s*\W+\s*"," ")
for x,y in z := strsplit(var," ")
	arr[tmp := z[x] " " z[x+1] " " z[x+2]] := !ObjHasKey(arr, tmp) ? 1 : arr[tmp] + 1       ; for groups of 3 words
;	arr[tmp := z[x] " " z[x+1]] := !ObjHasKey(arr, tmp) ? 1 : arr[tmp] + 1                  ; for groups of 2 words
for x,y in arr
	rslt .= y "`t" x "`n"
sort, rslt, NR
nmbr := nmbr > arr.count() ? arr.count() : nmbr
msgbox % substr(rslt,1,instr(rslt,"`n",,1,nmbr))
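
The same idea can be wrapped in a function so the group size becomes a parameter instead of a commented-out line. This is only a sketch: TopPhrases and its parameter names are hypothetical, not from the thread, and it stops the loop early so that the incomplete groups at the very end of the text are not counted. Like the script above, it treats anything matched by `\W` as a separator, so "he'd" becomes "he d".

```autohotkey
MsgBox % TopPhrases("one two three one two three one two", 2, 1)   ; shows: 3<tab>one two

; Hypothetical helper: returns the maxResults most frequent groups of groupSize words.
TopPhrases(text, groupSize := 3, maxResults := 5)
{
    ; punctuation and line breaks become single spaces, then split into words
    z := StrSplit(Trim(RegExReplace(text, "\W+", " ")), " ")
    if (z.Length() < groupSize)
        return ""
    counts := {}
    Loop % z.Length() - groupSize + 1        ; last full group starts here
    {
        start := A_Index, phrase := z[start]
        Loop % groupSize - 1                 ; append the remaining words of the group
            phrase .= " " z[start + A_Index]
        counts[phrase] := ObjHasKey(counts, phrase) ? counts[phrase] + 1 : 1
    }
    list := ""
    for phrase, n in counts
        list .= n "`t" phrase "`n"
    Sort, list, N R                          ; numeric sort, highest count first
    out := ""
    Loop, Parse, list, `n
    {
        if (A_Index > maxResults or A_LoopField = "")
            break
        out .= A_LoopField "`n"
    }
    return out
}
```

Calling TopPhrases(var, 3, nmbr) on the text above would be the equivalent of the 3-word version of the script. The order of phrases that share the same count is not guaranteed.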