AutoHotkey Community

Posted by TomXIII on February 25th, 2010, 7:15 pm
Hi everybody!
For a work in progress, I wrote this function to get word frequencies from a file (*.txt only for now) or from the clipboard.

Function:
Code:
/*
Function : GetWordsFrequencies()
Argument : File > GetWordsFrequencies(File)  or   Clipboard > GetWordsFrequencies()
Author: TomXIII
*/

If 0 = 1 ;If a file is directly passed
{
   File = %1%
   GetWordsFrequencies(File)
}

GetWordsFrequencies(InFile="") ;Main function
{
   Text := GetText(InFile)
   Text := ListWords(Text)
   WordsList := SetWordsList(Text)
   UseWordsList(WordsList)
}

;=============================================

GetText(InFile)
{
   If FileExist(InFile)
   {
      SplitPath, InFile,,, FileExt
      If (FileExt<>"txt") ;Poor convert method, doesn't work often!
      {
         TempFile = %A_ScriptDir%\temp.txt
         FileRead, TempText, %InFile%
         FileDelete, %TempFile%
         FileAppend, %TempText%, %TempFile%
         FileRead, TempText, %TempFile%
      }
      Else
         FileRead, TempText, %InFile%
   }
   Else
      TempText := Clipboard
   Return, %TempText%
}

ListWords(TempText) ;Extract all words and separate them with "-"
{
   TempText := RegexReplace(TempText, "(`r|`n|`r`n)", " ")
   TempText := RegexReplace(TempText, "i)[^[:alpha:]éÉèÈàÀùÙâÂêÊîÎôÔûÛïÏëËüÜçÇ'æÆœŒ ]?", "") ; Special chars for French words
   TempText := RegexReplace(TempText, "[ ]+", "-")
   Return, %TempText%
}

SetWordsList(TempText)
{
   StringSplit, TempWord, TempText, -
   TempText = -%TempText%-
   Loop, %TempWord0%
   {
      CurrentTempWord := TempWord%A_Index%
      RegEx = i)-%CurrentTempWord%-
      TempText := RegexReplace(TempText, RegEx, "-", RplCount)
      If (RplCount and CurrentTempWord)
      {
         StrLen := StrLen(RplCount)
         Loop,% 3-StrLen
            RplCount := "0" . RplCount
         CurrentLine := RplCount . "`t   |`t" . CurrentTempWord
         TempList := TempList ? TempList . "`n" . CurrentLine : "Frequency`tWord`n____________________`n" . CurrentLine
      }
   }
   Sort, TempList, D`n R
   Return, %TempList%
}

UseWordsList(TempWordsList)
{
   OutFile = %A_ScriptDir%\Words_Frequencies.txt
   FileDelete, %OutFile%
   FileAppend, %TempWordsList%, %OutFile%
   Run, %OutFile%,,UseErrorLevel
   If ErrorLevel
      MsgBox, Error occurred!
}


Sample:
Code:
#Include %A_ScriptDir%\WordFrequencyCounter.ahk
File = %A_ScriptFullPath%
GetWordsFrequencies(File) ; pass File; a call with no argument reads the clipboard instead


Last edited by TomXIII on February 26th, 2010, 11:28 pm, edited 3 times in total.

Posted by Tuncay on February 25th, 2010, 9:25 pm
Hi, a while back I wrote a similar function for counting words.
See: wc - counts words in a string
(The archive contains an example script.)

I compared the two functions. For the word "TempText" in your script, your function counts 12 instances, whereas my script counts 18.
But when I count all instances of the word by hand, I come up with 19.
Mine is closer. :P

(My script returns a list delimited by double colons and semicolons instead of showing a Gui.)

_________________
{1:"ahkstdlib", 2:"my libs", 3:"my apps", 4:"my license"}
--> Don't feed the troll! <--


Posted by TomXIII on February 26th, 2010, 12:12 am
Hi Tuncay!

Originally, I wrote my script for French words.
Maybe the main issue is in the regex, which is not good enough at recognizing what counts as a "word".
I wrote this script quickly, so it needs several improvements.

PS: I wrote this function for a project about hotstrings. I have already written many scripts using the "Loop Input" method, and I don't like it. The built-in hotstrings are more efficient but lack features. I want to make a great program, because I use my computer all day to type up my lessons and everything else at university.


Posted by Tuncay on February 26th, 2010, 2:09 am
Hey TomXIII!

I looked a bit further and saw that your script counted "RegexReplaceTempText". There is no such word in the script, but there are several occurrences of "RegexReplace(TempText", so the script must be replacing special characters with nothing, which merges the two words.

This line must be the culprit:
Quote:
Code:
TempText := RegexReplace(TempText, "i)[^[:alpha:]éÉèÈàÀùÙâÂêÊîÎôÔûÛïÏëËüÜçÇ'æÆœŒ ]?", "") ; Special chars for French words


Maybe you could add a parameter that sets the minimum length a word must have to be counted. For example, my script does not count words with 2 or fewer characters.

Hope you do see this as positive criticism.
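
The merging effect described above is easy to reproduce outside AutoHotkey. The following is a minimal Python sketch, not code from either script: the function names are made up for illustration, and the character class is reduced to ASCII letters and the apostrophe (the original pattern also lists French accented letters). It contrasts deleting non-word characters with replacing them by a separator.

```python
import re

def words_by_deleting(text):
    """Delete every character that is not a letter, apostrophe, or space,
    then split on whitespace. Punctuation between two words vanishes,
    so the words on either side fuse together."""
    cleaned = re.sub(r"[^A-Za-z' ]", "", text)
    return cleaned.split()

def words_by_separating(text):
    """Replace each run of non-word characters with a single space,
    then split. Punctuation now acts as a word boundary."""
    cleaned = re.sub(r"[^A-Za-z']+", " ", text)
    return cleaned.split()

line = 'TempText := RegexReplace(TempText, "[ ]+", "-")'
print(words_by_deleting(line))    # ['TempText', 'RegexReplaceTempText']
print(words_by_separating(line))  # ['TempText', 'RegexReplace', 'TempText']
```

With deletion, the second "TempText" disappears into "RegexReplaceTempText", which would explain at least part of the undercount.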

Posted by TomXIII on February 26th, 2010, 11:28 am
Hello Tuncay!

This script was a first draft. I posted it on the forum to get comments or improvements from others, but then I stopped coding (I went to the cinema to watch 'Shutter Island'). I will have more free time this afternoon, so I will try some new solutions.


Posted by TomXIII on February 26th, 2010, 7:15 pm
UPDATED!
Code:
File = %A_ScriptFullPath%
FileRead, TempText, %File%
;~ TempText := RegexReplace( TempText, "(`r|`n|`r`n)", "-")
TempText := RegexReplace( TempText, "i)[^[:alpha:]éÉèÈàÀùÙâÂêÊîÎôÔûÛïÏëËüÜçÇ'æÆœŒ]+", ">`n<") ; éÉèÈàÀùÙâÂêÊîÎôÔûÛïÏëËüÜçÇ'æÆœŒ  Special chars for French words
TempText := RegexReplace( TempText, "[ ]+", "`n")
TempText := RegExReplace( TempText, "(^>`n*)|(`n<*$)", "")
LastText := TempText
Sort, TempText, D`n U
StringSplit, Word, TempText, `n
Loop, %Word0%
{
   CurrentWord := Word%A_Index%
   RegEx = i)%CurrentWord%
   RplCount := 0
   LastText := RegExReplace( LastText, RegEx, "", RplCount)
   StrLen := StrLen(RplCount)
   Loop,% 3-StrLen
      RplCount := "0" . RplCount
   CurrentWord := RegExReplace( CurrentWord, "(^<*)|(>*$)", "")
   CurrentLine := RplCount . "`t   |`t" . CurrentWord
   TempList := TempList ? TempList . "`n" . CurrentLine : CurrentLine
}
Sort, TempList, D`n R
TempList := "Frequency`tWord`n____________________`n" . TempList
OutFile = %A_ScriptDir%\Words_Frequencies.txt
FileDelete, %OutFile%
FileAppend, %TempList%, %OutFile%
Run, %OutFile%,,UseErrorLevel
If ErrorLevel
   MsgBox, Error occurred!
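
One detail of the listing above is that each count is left-padded with zeros to three digits before the list is sorted in reverse. Assuming the sort compares lines as plain text, the padding is what makes the order come out numerically. A small Python sketch of the idea follows; `format_line` and its `width` parameter are hypothetical names, not part of the script.

```python
def format_line(count, word, width=3):
    """Build a 'count | word' line with the count zero-padded to `width` digits."""
    return f"{str(count).zfill(width)}\t   |\t{word}"

lines = [format_line(9, "file"), format_line(12, "text"), format_line(100, "word")]

# Descending text sort: with padding, '100' > '012' > '009', as intended.
for entry in sorted(lines, reverse=True):
    print(entry)

# Without padding, text comparison goes character by character,
# so '9' ranks above '12', which ranks above '100'.
unpadded = ["9", "12", "100"]
print(sorted(unpadded, reverse=True))

# The padding only helps while every count fits in `width` digits:
# a four-digit count sorts below a three-digit one.
print(sorted(["999", "1000"], reverse=True))
```

If counts above 999 are expected, a wider padding or a numeric sort would avoid the last case.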


Posted by TomXIII on February 26th, 2010, 11:24 pm
I added two filters:
- MinLenght
- MinFrequency

Code:
File = %A_ScriptFullPath%

GetWordsFrequencies(File)

GetWordsFrequencies(File="", MinLenght=0, MinFrequency=0)
{
   If File
      FileRead, TempText, %File%
   Else
      TempText := Clipboard
   TempText := RegexReplace( TempText, "i)[^[:alpha:]éÉèÈàÀùÙâÂêÊîÎôÔûÛïÏëËüÜçÇ'æÆœŒ]+", ">`n<")
   TempText := RegexReplace( TempText, "[ ]+", "`n")
   TempText := RegExReplace( TempText, "(^>`n*)|(`n<*$)", "")
   TempList := CreateWordsList(TempText, MinLenght, MinFrequency)
   UseWordsList(TempList)
}

CreateWordsList(TempText, MinLenght, MinFrequency)
{
   LastText := TempText
   Sort, TempText, D`n U
   StringSplit, Word, TempText, `n
   Loop, %Word0%
   {
      CurrentWord := Word%A_Index%
      RegEx = i)%CurrentWord%
      RplCount := 0
      LastText := RegExReplace( LastText, RegEx, "", RplCount)
      StrLen := StrLen(RplCount)
      Loop,% 3-StrLen
         RplCount := "0" . RplCount
      CurrentWord := RegExReplace( CurrentWord, "(^<*)|(>*$)", "")
      CurrentLine := RplCount . "`t   |`t" . CurrentWord
      Lenght := StrLen(CurrentWord)
      If (Lenght>=MinLenght and RplCount>=MinFrequency)
         TempList := TempList ? TempList . "`n" . CurrentLine : CurrentLine
   }
   Sort, TempList, D`n R
   TempList := "Frequency`tWord`n____________________`n" . TempList
   Return, %TempList%
}

UseWordsList(TempList)
{
   OutFile = %A_ScriptDir%\Words_Frequencies.txt
   FileDelete, %OutFile%
   FileAppend, %TempList%, %OutFile%
   Run, %OutFile%,,UseErrorLevel
   If ErrorLevel
      MsgBox, Error occurred!
}
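
For comparison, the two filters can be sketched in a few lines of Python. This is not a translation of the function above: `word_frequencies`, `min_length` and `min_frequency` are names chosen for illustration, counting is done with a dictionary-style counter instead of repeated regex replacement, and an apostrophe is only treated as part of a word when it sits between letters.

```python
import re
from collections import Counter

# A word is one or more letters (accented letters included), optionally
# joined by apostrophes, e.g. "l'heure". [^\W\d_] means "any letter".
WORD_PATTERN = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")

def word_frequencies(text, min_length=0, min_frequency=0):
    """Return (word, count) pairs, most frequent first.

    Words shorter than min_length or seen fewer than min_frequency
    times are dropped. Matching is case-insensitive.
    """
    counts = Counter(WORD_PATTERN.findall(text.lower()))
    kept = [(word, n) for word, n in counts.items()
            if len(word) >= min_length and n >= min_frequency]
    # Highest count first; ties are broken alphabetically.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))

sample = "Le chat et le chien. Le chat dort."
print(word_frequencies(sample))
print(word_frequencies(sample, min_length=3, min_frequency=2))
```

The second call keeps only "chat": "le" is frequent but too short, and the other words appear only once.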


Quote:
Hope you do see this as positive criticism.

1) You leave posts
2) You read my code
3) You report problems to me
So yes, I see this as positive criticism! I have no problem with this kind of post!

