hugov wrote:
Rather than hashing each word individually, character by character, why not hash the entire wordlist in one go? Here is some code to get you started
and take it from there
Really, the only reason I convert to ASCII codes is that some characters cannot be part of a variable name. With your code, I would have to search the string for every possible character that could be used in a word.
Also, the performance on that is questionable (I guess it depends on how efficient ConvertWordToAscii is). StringReplace would (I assume) use a linear search to find all matches. There are 52 characters in your list, which means 52 separate linear searches.
Given a 100,000-word list with every word cropped to 3 characters ahead of time, that's 52 calls to ConvertWordToAscii and 52 linear searches over 400,000 characters each (or 500,000, depending on whether it reads in CR/LF or keeps only the LF). Once we add Unicode support, that number 52 would get huge. And actually, numbers can be part of the words too if they are read in from the wordlist, which would cause an infinite loop >.> (at least without any special modifications).
My current algorithm calls ConvertWordToAscii 300,000 times (for a 100,000 word list), independent of the number of possible values each character can have.
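As a rough illustration of the per-word bucket key described above, here is a minimal sketch in AutoHotkey v1 syntax. `PrefixKey` is a made-up name used only for this example; it mirrors what ConvertWordToAscii does with Caps set to 1, applied to the first 3 characters of a word.

```autohotkey
; Sketch only, AutoHotkey v1 syntax. PrefixKey is a hypothetical helper,
; not a function from the script.
PrefixKey(Word, Len = 3)
{
    Prefix := SubStr(Word, 1, Len)     ; crop to the bucket length first
    StringUpper, Prefix, Prefix        ; same bucket for "the" and "The"
    Key =
    Loop, Parse, Prefix
        Key .= SubStr("00" . Asc(A_LoopField), -2)  ; v1: -2 keeps the last 3 characters
    Return Key
}

MsgBox % PrefixKey("hello")   ; shows 072069076 (H=072, E=069, L=076)
```

The cost per word depends only on the prefix length, not on how many different characters can appear in a word, which is the point made above.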
Thanks for the suggestion and I'll continue to think about it....
I did some more testing, and I got the file read time down to less than 10 seconds when reading in the wordlist. I made a few small changes, but what really sped it up is that I disabled the duplicate check when actually reading the wordlist file (it is still enabled when adding new words to the file while typing).
So now it takes under 10 seconds to start the script and under 15 seconds to exit... I consider this fairly acceptable for a 100,000-word list. Faster is always better, though.
Code:
; Intellitype: typing aid
; Press 1 to 0 keys to autocomplete the word upon suggestion
; (0 will match suggestion 10)
; - Jordi S
; Heavily modified by:
; Maniac
;___________________________________________
; CONFIGURATIONS
#NoEnv
SetBatchLines, 20ms
ListLines Off
OnExit, SaveScript
; Editor Window Recognition
; (make it blank to make the script seek all windows)
ETitle =
;Minimum word length to make a guess
WLen = 3
keyagain=
key=
clearword=1
;Gosub,clearallvars ; clean vars from start
; Press 1 to 0 keys to autocomplete the word upon suggestion
; (0 will match suggestion 10)
;_______________________________________
CoordMode, ToolTip, Relative
AutoTrim, Off
WordListDone = 0
;reads list of words from file
FileRead, ParseWords, %A_ScriptDir%\Wordlist.txt
Loop, Parse, ParseWords, `n, `r
{
AddWordToList(A_LoopField)
}
ParseWords =
SetTimer, Winchanged, 100
GoSub, ReverseWordNums
WordlistDone = 1
Loop
{
;Editor window check
WinGetActiveTitle, ATitle
WinGet, A_id, ID, %ATitle%
IfNotInString, ATitle, %ETitle%
{
ToolTip
Setenv, Word,
WinWaitActive, %ETitle%
Continue
}
;Get one key at a time
Input, chr, L1 V,{enter}{space}.;`,:¿?¡!'"()]{}{}}{bs}{{}{esc}{tab}{Home}{End}{PgUp}{PgDn}{Up}{Down}{Left}{Right}
EndKey = %errorlevel%
; If active window has different window ID from before the input, blank word
; (well, assign the number pressed to the word)
WinGetActiveTitle, ATitle
WinGet, A_id2, ID, %ATitle%
IfNotEqual, A_id, %A_id2%
{
Gosub,clearallvars
Setenv, Word, %chr%
Continue
}
ifequal, OldCaretY,
OldCaretY = %A_CaretY%
ifnotequal, OldCaretY, %A_CaretY%
{
; add the word if switching lines
AddWordToList(Word)
Gosub,clearallvars
Setenv, Word, %chr%
Continue
}
OldCaretY=%A_CaretY%
;Backspace clears last letter
ifequal, EndKey, Endkey:BackSpace
{
StringLen, len, Word
IfNotEqual, len, 0
{
ifequal, len, 1
{
Gosub,clearallvars
} else {
StringTrimRight, Word, Word, 1
}
}
} else ifequal, EndKey, Max
{
Setenv, Word, %word%%chr%
} else {
;addword = %Word%
;Gosub, addwordtolist
AddWordToList(Word)
Gosub, clearallvars
}
;Wait till minimum letters
IF ( StrLen(Word) < wlen )
{
ToolTip,
Continue
}
;Match part-word with command
Num =
Match =
singlematch = 0
number = 0
StringLeft, baseword, Word, %wlen%
baseword := ConvertWordToAscii(baseword,1)
Loop
{
IfEqual, zword%baseword%%a_index%,, Break
IfEqual, number, 10
Break
if ( SubStr(zword%baseword%%a_index%, 1, StrLen(Word)) = Word )
{
number ++
singlematch := zword%baseword%%a_index%
match .= Mod(number,10) . ". " . singlematch . "`n"
singlematch%number% = %singlematch%
Continue
}
}
;If no match then clear Tip
IfEqual, Match,
{
clearword=0
Gosub,clearallvars
Continue
}
;Show matched command
StringTrimRight, match, match, 1 ; Get rid of the last linefeed
WinGetActiveTitle, ATitle
WinGetPos, , PosY, , SizeY, %ATitle%
ToolTipSizeY := (number * 12)
ToolTipPosY := A_CaretY+14
if ((ToolTipSizeY + ToolTipPosY) > (PosY + SizeY))
ToolTipPosY := (A_CaretY - 14 - ToolTipSizeY)
IfNotEqual, Word,,ToolTip, %match%, %A_CaretX%, %ToolTipPosY%
; +14 Move tooltip down a little so as not to hide the caret.
}
; Timed function to detect change of focus (and remove tooltip when changing active window)
Winchanged:
WinGetActiveTitle, ATitle
WinGet, A_id3, ID, %ATitle%
IfNotEqual, A_id, %A_id3%
{
ToolTip ,
} else {
; If we are in the correct window, and OldCaretY is set, clear the tooltip if not in the same line
IfInString, ATitle, %ETitle%
{
IfNotEqual, OldCaretY,
{
IfNotEqual, OldCaretY, %A_CaretY%
{
ToolTip,
}
}
}
}
Return
; Key definitions for autocomplete (0 to 9)
#MaxThreadsPerHotkey 1
$1::
$2::
$3::
$4::
$5::
$6::
$7::
$8::
$9::
$0::
CheckWord(A_ThisHotkey)
Return
; If hotkey was pressed, check whether there's a match going on and send it, otherwise send the number(s) typed
CheckWord(Key)
{
global
Local ATitle
Local A_id2
Local WordIndex
StringRight, Key, Key, 1 ;Grab just the number pushed, trim off the "$"
IfEqual, Key, 0
{
WordIndex = 10
} else {
WordIndex = %Key%
}
clearword=1
; If active window has different window ID from before the input, blank word
; (well, assign the number pressed to the word)
WinGetActiveTitle, ATitle
WinGet, A_id2, ID, %ATitle%
IfNotEqual, A_id, %A_id2%
{
SendInput,%key%
Gosub,clearallvars
Return
}
IfNotEqual, OldCaretY, %A_CaretY% ;Make sure we are still on the same line
{
SendInput,%key%
Gosub,clearallvars
Return
}
ifequal, Word, ; only continue if word is not empty
{
SendInput,%key%
Setenv, Word, %key%
clearword=0
Gosub,clearallvars
Return
}
ifequal, singlematch%WordIndex%, ; only continue if singlematch is not empty
{
SendInput,%key%
Setenv, Word, %word%%key%
clearword=0
Gosub,clearallvars
Return
}
Local sending
Local len
Local ClipboardSave
; SEND THE WORD!
sending := singlematch%WordIndex%
StringLen, len, Word
; Update Typed Count
UpdateWordCount(sending)
SendPlay, {BS %len%}{Raw}%sending% ; First do the backspaces, Then send word (Raw because we want the string exactly as in wordlist.txt)
; below works but uses clipboard
;ClipboardSave:=ClipboardAll
;Clipboard = %sending%
;SendPlay, {BS %len%}^v ; First do the backspaces, Then send word (Raw because we want the string exactly as in wordlist.txt)
;Clipboard = %ClipboardSave%
Gosub,clearallvars
Return
}
; This is to blank all vars related to matches, tooltip and (optionally) word
clearallvars:
Ifequal,clearword,1
{
Setenv,word,
OldCaretY=
}
ToolTip
; Clear all singlematches
Loop, 10
{
singlematch%a_index% =
}
sending =
key=
match=
clearword=1
Return
AddWordToList(AddWord)
{
global
Local CharTerminateList
Local Base
Local AddWordInList
Local CountWord
Local pos
Ifequal, Addword, ;If we have no word to add, skip out.
Return
if ( Substr(addword,1,1) = ";" ) ;If first char is ";", clear word and skip out.
{
IfEqual, wordlistdone, 0 ;If we are still reading the wordlist file and we come across ;LEARNEDWORDS; set the LearnedWordsCount flag
{
IfEqual, AddWord, `;LEARNEDWORDS`;
LearnedWordsCount=0
}
addword =
Return
}
ifequal, wordlistdone, 1
{
if addword contains 1,2,3,4,5,6,7,8,9,0
Return
}
IF ( StrLen(addword) <= wlen ) ; don't add the word if it's not longer than the minimum length
{
addword =
Return
}
Base := ConvertWordToAscii(SubStr(addword,1,wlen),1)
IfEqual, WordListDone, 0 ;if this is read from the wordlist
{
IfNotEqual,LearnedWordsCount, ;if this is a stored learned word
{
CountWord := ConvertWordToAscii(addword,0)
IfEqual, LearnedWords, ;if we haven't learned any words yet, set the LearnedWords list to the new word
{
LearnedWords = %addword%
} else { ;otherwise append the learned word to the list
LearnedWords .= "," . addword
}
zCount%CountWord% := LearnedWordsCount++ ;increment the count and store the Weight of the LearnedWord in reverse order (will be inverted later)
}
IncrementCounterAndAddWord(Base,AddWord)
} else { ; If this is an on-the-fly learned word
AddWordInList =
Loop ;Check to see if the word is already in the list, case sensitive
{
IfEqual, zword%base%%a_index%,, Break
if ( zword%base%%a_index% == AddWord )
{
AddWordInList = 1
Break
}
Continue
}
IfEqual, AddWordInList, ; if the word is not in the list
{
CountWord := ConvertWordToAscii(addWord,0)
zCount%CountWord% = 1 ;set the count to one as it's the first time we typed it
IfEqual, LearnedWords, ;if we haven't learned any words yet, set the LearnedWords list to the new word
{
LearnedWords = %addword%
} else { ;otherwise append the learned word to the list
LearnedWords .= "," . addword
}
IncrementCounterAndAddWord(Base,AddWord)
} else {
UpdateWordCount(addword) ;Increment the word count if it's already in the list
}
}
Return
}
IncrementCounterAndAddWord(Base,AddWord)
{
global
local pos
; Increment the counter for each hash
zbasenum%Base%++
pos := zbasenum%Base%
; Set the hashed value to the word
zword%Base%%pos% = %addword%
}
; This sub will reverse the read numbers since now we know the total number of words
ReverseWordNums:
LearnedWordsCount+=4
Loop,parse,LearnedWords, `,
{
AsciiWord := ConvertWordToAscii(A_LoopField,0)
zCount%AsciiWord% := LearnedWordsCount - zCount%AsciiWord%
}
AsciiWord =
LearnedWordsCount =
Return
UpdateWordCount(word)
{
; If the Count for the word already exists - ie if it's a learned word, increment it, else don't.
local CountWord := ConvertWordToAscii(word,0)
IfNotEqual, zCount%CountWord%,
{
zCount%CountWord%++
local WordBase
StringLeft, WordBase, word, %wlen% ;find the pseudohash for the word
WordBase := ConvertWordToAscii(WordBase,1)
Local ConvertWord =
Local LowIndex =
Local WordList =
Loop
{
ifequal, zword%WordBase%%A_Index%, ;Break the loop if no more words to read for the hash
Break
CountWord := zword%WordBase%%A_Index% ;Set CountWord to the current Word position
ConvertWord := ConvertWordToAscii(CountWord,0) ; Find the Ascii equivalent of the word
IfNotEqual, zCount%ConvertWord%, ;If there's no count for this word do nothing
{
IfEqual, LowIndex,
LowIndex = %A_Index% ;If this is the first word we've found with a count set this as our starting position
WordList .= "," . zCount%ConvertWord% . "z" . CountWord ;prefix all words with (zCount"z")
}
}
ifnotequal, Wordlist, ;If we have no words to process, don't
{
StringTrimLeft, WordList, WordList, 1
Sort, WordList, N R D, ;Sort the wordlist numerically in reverse order (highest count first)
LowIndex-- ;A_Index starts at 1 so this value needs to be decremented
Local IndexPos =
Loop, Parse, WordList, `,
{
IndexPos := LowIndex + A_Index ;Set the current word we are processing to the starting pos plus word position
StringTrimLeft, CountWord, A_LoopField, InStr(A_LoopField,"z") ;Strip (Number,"z") from beginning
zword%WordBase%%IndexPos% = %CountWord% ; update the word in the list
}
}
}
Return
}
ConvertWordToAscii(Base,Caps)
{
; Return the word in Ascii numbers padded to length 3 per character
; Capitalize the string if Caps is set to 1
IfEqual, Caps, 1
StringUpper, Base, Base
Loop, Parse, Base
{
New .= SubStr("00" . Asc(A_LoopField),-2)
}
Return New
}
SaveScript:
; Add all the standard words to the tempwordlist
FileRead, ParseWords, %A_ScriptDir%\Wordlist.txt
LearnedwordsPos := InStr(ParseWords, "`;LEARNEDWORDS`;",true,1) ;Check for Learned Words
IfNotEqual, LearnedwordsPos, 0
{
TempWordList := SubStr(ParseWords, 1, LearnedwordsPos - 1) ;Grab all non-learned words out of list
} else {
TempWordList := ParseWords
}
ParseWords =
; Parse the learned words and store them in a new list by count if their total count is at least 5.
; Prefix the word with the count and "z" for sorting
Loop, Parse, LearnedWords, `,
{
SortWord := ConvertWordToAscii(A_LoopField,0)
IfGreaterOrEqual, zCount%SortWord%, 5
{
SortWordList .= "," . zCount%SortWord% . "z" . A_LoopField
}
}
StringTrimLeft, SortWordList, SortWordList, 1 ;remove extra starting comma
Sort, SortWordList, N R D, ; Sort numerically, comma delimiter
IfNotEqual, SortWordList, ; If SortWordList exists write to the file, otherwise don't.
{
TempWordList .= "`;LEARNEDWORDS`;`r`n"
Loop, Parse, SortWordList, `,
{
StringTrimLeft, AppendWord, A_LoopField, InStr(A_LoopField,"z") ;Strip (Number,"z") from beginning
TempWordList .= AppendWord . "`r`n"
}
StringTrimRight, TempWordList, TempWordList, 2
FileDelete, %A_ScriptDir%\Temp_Wordlist.txt
FileAppend, %TempWordList%, %A_ScriptDir%\Temp_Wordlist.txt ;Only update the file if we have learned words
FileCopy, %A_ScriptDir%\Temp_Wordlist.txt, %A_ScriptDir%\Wordlist.txt, 1
FileDelete, %A_ScriptDir%\Temp_Wordlist.txt
}
ExitApp
KakaruKeys, at this point I think we need to look into merging my script and TypingAid. We'll need some parameters to disable/enable word learning (for those who want to use long phrases rather than just words) and to control which characters trigger a new word. We might want to consider a preferences file and possibly even a preferences GUI (a file is easy but I'm not sure I care to write a GUI atm).
Then the next step will be proper Unicode support.
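On the preferences file idea, something along these lines might be enough to start with. This is only a sketch in AutoHotkey v1 syntax: the file name Preferences.ini, the [Settings] section and the key names are all assumptions for illustration, and neither script reads such a file yet.

```autohotkey
; Sketch only: Preferences.ini, the [Settings] section and the key names
; below are hypothetical, not part of either script yet.
PrefsFile = %A_ScriptDir%\Preferences.ini

; IniRead, OutputVar, Filename, Section, Key, Default
IniRead, WLen, %PrefsFile%, Settings, MinWordLength, 3        ; minimum length before guessing
IniRead, LearnWords, %PrefsFile%, Settings, LearnWords, 1     ; 1 = learn new words, 0 = don't
IniRead, ETitle, %PrefsFile%, Settings, EditorTitle, %A_Space% ; blank = watch all windows
```

If the file or a key is missing, IniRead falls back to the default given in the last parameter, so the script would still start without a Preferences.ini. The list of word-terminating characters is left out here because storing punctuation in an INI value would need some care.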