Most efficient way to do an array lookup

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
JJohnston2
Posts: 204
Joined: 24 Jun 2015, 23:38

Most efficient way to do an array lookup

30 Jun 2017, 13:53

For...
  • A search string I'd like to find in a text file
  • The string will be an exact match of one full line in the text file, and one line only
  • i.e., there are no duplicates in the file
  • And if there were, only the first result is needed
  • The files have potential to be thousands of lines long and maybe 150 chars per line, so not very large in the general scheme of things
  • Function needs to return what line number of the file the search string can be found on
Currently there is working code to do this by using Loop, Read, then doing a compare If (A_LoopReadLine=SearchStr). When a match is found, the loop A_Index value is saved, the loop breaks and the function returns the saved index (indicating the line # found).

This works, but it's really a terrible way to do an array lookup. Even worse, the current code runs multiple times (reading the file from disk each time) instead of loading it once and doing the lookup in memory afterwards.

Questions
  1. Is there an easy way to map a text file into an array of lines without using a loop to read them in? Loop implementation would be easy, just wondering if there's a different way
  2. Once the text lines are in an array, is there a built-in command or RegExMatch method that could be used without having to loop and manually compare every line in the array?
  3. Or another idea: Would it be possible just to read the file contents into a single variable and be able to figure out which line of the file a string is on by finding a match and then counting how many line endings are prior to that? (Or something of that nature?)
  4. Maybe an associative array might work for one-time lookup if the indices were stored and the search string was used as the key?
Just looking for some more efficient ways to code this type of thing.
FanaticGuru
Posts: 1906
Joined: 30 Sep 2013, 22:25

Re: Most efficient way to do an array lookup

30 Jun 2017, 14:13

Code: Select all

FileRead, FileText, % A_Desktop "\Test\List.txt"
Data := {}
Loop, Parse, FileText, `n, `r
	Data[A_LoopField] := A_Index
SearchLine := "kasdklfjasjkf asjdf  sjdf ksd sjf14525j kj534k67 8**( klsjdklfjsd f 14 15kjsk kl 1j4k1k 1kj 1 "

MsgBox % Data[SearchLine]
The basic premise is to get all the file, put each line in an array with the line text as the "key" and the line number as the "value".

One important point is that an array "key" is not case sensitive, so this search will also not be case sensitive. If case sensitivity is required, there is a more complicated "object" that can be used called "Scripting.Dictionary".

Also I am not sure if there is any text that would cause a problem as a key for an array.
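For the case-sensitive variant, a minimal sketch using Scripting.Dictionary could look like the following. BuildLineDict and DictLookup are hypothetical helper names, and the text is passed in as a string (in practice it would come from FileRead). The dictionary's default compare mode is binary, which is what makes the lookup case sensitive.

```autohotkey
; Sketch only: case-sensitive line lookup via the Scripting.Dictionary COM object.
; BuildLineDict and DictLookup are illustrative names, not built-in functions.
BuildLineDict(Text)
{
	Dict := ComObjCreate("Scripting.Dictionary")  ; default CompareMode is binary (case sensitive)
	Loop, Parse, Text, `n, `r
	{
		Key := "" . A_LoopField       ; concatenate to make sure the key is passed as a string
		if !Dict.Exists(Key)          ; keep only the first occurrence of a duplicated line
			Dict.Add(Key, A_Index)
	}
	return Dict
}

DictLookup(Dict, Line)
{
	Key := "" . Line
	; Returns the stored line number, or 0 when the line is not in the dictionary.
	return Dict.Exists(Key) ? Dict.Item(Key) : 0
}
```

Usage would be along the lines of Dict := BuildLineDict(FileText) once, followed by as many DictLookup(Dict, SearchLine) calls as needed.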

FG
Hotkey Help - Help Dialog for Currently Running AHK Scripts
AHK Startup - Consolidate Multiply AHK Scripts with one Tray Icon
Hotstring Manager - Create and Manage Hotstrings
[Class] WinHook - Create Window Shell Hooks and Window Event Hooks
JJohnston2
Posts: 204
Joined: 24 Jun 2015, 23:38

Re: Most efficient way to do an array lookup

30 Jun 2017, 14:53

Thanks, this is very close to what I was considering doing but it would have taken me a while to get to what you are showing here.

Also, if anyone wanders along later, it looks like the StringSplit command (or the StrSplit() function, which returns an array) can be used regarding my original Question #1.

This solution however needs the loop to assign the index, so StringSplit doesn't do any good in this case and the above solution is more elegant anyway.

Thanks.
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02
Contact:

Re: Most efficient way to do an array lookup

30 Jun 2017, 15:23

Instr(allLines "`n", line "`n"), typed on the phone, I hope it's ok.
Cheers.
jeeswg
Posts: 6902
Joined: 19 Dec 2016, 01:58
Location: UK

Re: Most efficient way to do an array lookup

30 Jun 2017, 15:30

Code: Select all

q::
vText := "a,b,c,d,e"
vPos := InStr(vText, ",c,")
vText2 := SubStr(vText, 1, vPos)
StrReplace(vText2, ",", "", vCount), vCount += 1
MsgBox, % vCount
return
Of course, you can add some variations to handle the first/last items.

If you notice, I find the position, and then copy the beginning text out to another variable to count the delimiters. I had wondered if there was a way to count the delimiters, without having to create a new variable first. Either via AHK or via dll function, but no luck so far. Anyhow it might involve adding temporary null characters to a string. A similar issue would be to perform InStr, to search only between characters A and B of a string, not the entire string.

==================================================

I might modify FanaticGuru's code to use double quotes to treat everything as strings, when referring to key names:

Code: Select all

FileRead, FileText, % A_Desktop "\Test\List.txt"
Data := {}
Loop, Parse, FileText, `n, `r
	Data["" A_LoopField] := A_Index
SearchLine := "kasdklfjasjkf asjdf  sjdf ksd sjf14525j kj534k67 8**( klsjdklfjsd f 14 15kjsk kl 1j4k1k 1kj 1 "

MsgBox % Data["" SearchLine]
Re. potential problem key names:

obj.HasKey() fails for all keys if a key called HasKey exists - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?f=5&t=31682
It appears that a workaround is to use ObjHasKey(obj, key) instead of obj.HasKey(key).
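Putting the string-key idea and the ObjHasKey workaround together, a small sketch might look like this. BuildLineIndex and LookupLine are hypothetical names, and the "`n" prefix is just one possible choice of a character that cannot occur inside a line.

```autohotkey
; Sketch only: line-text -> line-number map that stays safe for awkward keys.
; BuildLineIndex and LookupLine are illustrative names, not built-in functions.
BuildLineIndex(Text)
{
	Index := {}
	Loop, Parse, Text, `n, `r
	{
		; The "`n" prefix keeps every key a string and away from names such as HasKey.
		Key := "`n" . A_LoopField
		if !ObjHasKey(Index, Key)     ; function form, so a key cannot shadow the method
			Index[Key] := A_Index     ; only the first occurrence is stored
	}
	return Index
}

LookupLine(Index, Line)
{
	Key := "`n" . Line
	; Object keys are not case sensitive, so "ABC" also finds a line "abc".
	return ObjHasKey(Index, Key) ? Index[Key] : 0
}
```

The index is built once and can then be queried repeatedly without re-reading the file.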
homepage | tutorials | wish list | fun threads | donate
WARNING: copy your posts/messages before hitting Submit as you may lose them due to CAPTCHA
JJohnston2
Posts: 204
Joined: 24 Jun 2015, 23:38

Re: Most efficient way to do an array lookup

30 Jun 2017, 16:38

@Helgef: InStr() will yield the search string location in bytes (characters), not in line numbers.
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02
Contact:

Re: Most efficient way to do an array lookup

30 Jun 2017, 16:48

Yes, see jeeswg's comment.
Edit:

Code: Select all

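allLines := "a`nb`nc"
line := "b"
; Pad both strings with `n so only whole lines match; pos is 0 if the line is not found.
pos := InStr("`n" . allLines . "`n", "`n" . line . "`n")
; Count the line feeds before the match; StrReplace stores the count in lineNumber.
StrReplace(SubStr(allLines, 1, pos - 1), "`n", "`n", lineNumber)
; Test pos rather than the count, so a match on the first line gives 1, not 0.
MsgBox, % pos ? lineNumber + 1 : 0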
HotKeyIt
Posts: 2364
Joined: 29 Sep 2013, 18:35
Contact:

Re: Most efficient way to do an array lookup

30 Jun 2017, 17:14

1. Simply prefix your key with a special character so any key can be used.
2. Check if a duplicated line already exists so you will always find the first occurrence of the line.
3. You can also use Loop, Read.

Code: Select all

Data := {}
Loop, Read, % A_Desktop "\Test\List.txt"
  If !Data.HasKey("`n" A_LoopReadLine)
	Data["`n" A_LoopReadLine] := A_Index
SearchLine := "kasdklfjasjkf asjdf  sjdf ksd sjf14525j kj534k67 8**( klsjdklfjsd f 14 15kjsk kl 1j4k1k 1kj 1 "

MsgBox % Data["`n" SearchLine]
EDIT.
You can also create a Class to make it a bit cleaner:

Code: Select all

Data := new FileGetLine(A_ScriptFullPath)
MsgBox % Data.GetLineNumber("	GetLineNumber(text){")

Class FileGetLine {
	__New(file){
		Loop, Read, %file%
		  If !this.HasKey("`n" A_LoopReadLine)
			this["`n" A_LoopReadLine] := A_Index
	}
	GetLineNumber(text){
		return this["`n" text]
	}
}
jeeswg
Posts: 6902
Joined: 19 Dec 2016, 01:58
Location: UK

Re: Most efficient way to do an array lookup

30 Jun 2017, 17:24

One possibility is to store every position for each line.

Cheers HotKeyIt for the reminders re. using a prefix character and doing an initial HasKey when adding keys (otherwise the value will be the last and not the first occurrence, although that may be preferable sometimes).

Code: Select all

q::
FileText := "a,B,C,D,e,a,B,C,D,e"
FileText := StrReplace(FileText, ",", "`r`n")
Data := {}
Loop, Parse, FileText, `n, `r
	Data["`n" A_LoopField] := LTrim(Data["`n" A_LoopField] "," A_Index, ",")
SearchLine := "a"
MsgBox % Data["`n" SearchLine]
vOutput := ""
for vKey, vValue in Data
	vOutput .= SubStr(vKey, 2) " " vValue "`r`n"
MsgBox, % vOutput
return
If you only want to check if an item is present, and aren't worried about the position (e.g. for a spellchecker):
convert list to simple array - AutoHotkey Community
https://autohotkey.com/boards/viewtopic ... 67#p156967
FanaticGuru
Posts: 1906
Joined: 30 Sep 2013, 22:25

Re: Most efficient way to do an array lookup

30 Jun 2017, 18:15

Here is a string manipulation version:

Code: Select all

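FileRead, FileText, % A_Desktop "\Test\List.txt"
SearchLine := "stuff"
; m = multiline anchors, `a = accept any newline type; \Q...\E quotes the needle literally.
Pos := RegExMatch(FileText, "m`a)^\Q" SearchLine "\E$")
; Count the line feeds before the match.
StrReplace(SubStr(FileText,1,Pos), "`n",, Line)
; Report 0 rather than 1 when nothing matched.
Line := Pos ? Line + 1 : 0
MsgBox % Line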
This version is case sensitive and is probably a better way of doing it. Super long random keys make me a little uncomfortable while this version should be able to handle any length or configuration of search.

It also allows for easy modification of the needle in the RegExMatch to get more complex matches.

If you like your code compact and obfuscated, you can do StrReplace(SubStr(Haystack,1,RegExMatch(Haystack, "m`a)^\Q" Needle "\E$")), "`n",, Line), Line++. Note that, written this way, Line ends up as 1 rather than 0 when there is no match.

FG
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02
Contact:

Re: Most efficient way to do an array lookup

01 Jul 2017, 02:09

If you want to use regex, you might need to consider the case where Needle contains \E. InStr has a case sensitivity option if desired.
FanaticGuru
Posts: 1906
Joined: 30 Sep 2013, 22:25

Re: Most efficient way to do an array lookup

03 Jul 2017, 13:47

Helgef wrote: If you want to use regex, you might need to consider the case where Needle contains \E. InStr has a case sensitivity option if desired.
Yea, wanting to search for something with \E in it would be a problem. You can go through the string and replace any \E with some obscure symbol then at the end change that obscure symbol back to \E. Of course then your string cannot contain that obscure symbol or you again have a problem.
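A sketch of an alternative that avoids the placeholder symbol altogether: wherever the needle contains \E, close the \Q...\E section, match that backslash and E explicitly, and reopen the quote. RegExQuote and FindLineRegEx are hypothetical names, not built-in functions. RegExReplace is used for the escaping here because it is case sensitive by default, so a lowercase \e in the needle is left alone.

```autohotkey
; Sketch only: make a needle literal for RegExMatch even when it contains \E.
; RegExQuote and FindLineRegEx are illustrative names, not built-in functions.
RegExQuote(Needle)
{
	; A literal \E would end the quoted section early, so each one is rewritten as:
	; \E (close the quote), \\E (a literal backslash followed by E), \Q (reopen the quote).
	return "\Q" . RegExReplace(Needle, "\\E", "\E\\E\Q") . "\E"
}

FindLineRegEx(Haystack, Needle)
{
	; m = multiline anchors, `a = accept `r, `n or `r`n as the newline.
	Pos := RegExMatch(Haystack, "m`a)^" . RegExQuote(Needle) . "$")
	if !Pos
		return 0                      ; no line matches exactly
	; Count the line feeds in the text before the match.
	StrReplace(SubStr(Haystack, 1, Pos - 1), "`n", "", Count)
	return Count + 1
}
```

With this, a needle such as a path containing a folder that starts with E should be matched literally, and the search stays case sensitive like the original RegExMatch version.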

Using InStr, it is hard to anchor the needle to the beginning and end of the line if you are not sure which end-of-line code is being used (i.e. `n or `r or `r`n etc.).

This is a coding problem in general. That last little bit of bullet proofing and idiot proofing can take more time than the original code.

FG
jeeswg
Posts: 6902
Joined: 19 Dec 2016, 01:58
Location: UK

Re: Most efficient way to do an array lookup

03 Jul 2017, 13:58

I was trying to address this problem here:
simplest way to make a RegEx needle literal? - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?f=5&t=30420

I thought that '\E' was quite unlikely to appear in a string (although I would still make my code safe). However, paths in which a folder or file name starts with E are actually reasonably likely.
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02
Contact:

Re: Most efficient way to do an array lookup

03 Jul 2017, 17:06

Hello FG. I agree on the problems of bulletproofing in general.
For the line endings I'd look at the first line and assume it's the same for all lines.
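As a rough sketch of that idea (FindLineInStr is a made-up name, and the code assumes the file uses one line ending consistently): guess the line ending from the first line, pad both strings with it so that only whole lines match, and count the line feeds before the hit. The third parameter of InStr gives the case sensitivity option.

```autohotkey
; Sketch only: InStr-based line lookup that detects the line ending from the first line.
; FindLineInStr is an illustrative name, not a built-in function.
FindLineInStr(Haystack, Needle, CaseSensitive := true)
{
	; If the first line feed is preceded by a carriage return, assume `r`n throughout.
	FirstLF := InStr(Haystack, "`n")
	EOL := (FirstLF > 1 && SubStr(Haystack, FirstLF - 1, 1) = "`r") ? "`r`n" : "`n"
	; Pad both sides so that the needle can only match a complete line.
	Pos := InStr(EOL . Haystack . EOL, EOL . Needle . EOL, CaseSensitive)
	if !Pos
		return 0                      ; not found
	; Pos marks the line ending in front of the match, so the first Pos - 1
	; characters of Haystack are exactly the lines before it.
	StrReplace(SubStr(Haystack, 1, Pos - 1), "`n", "", Count)
	return Count + 1
}
```

A file that mixes line endings would need normalizing first, for example by removing every `r before searching.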

@jeeswg, good topic. I think \Q \E is kind of a poor choice actually.

cheers.
hasantr
Posts: 933
Joined: 05 Apr 2016, 14:18
Location: İstanbul

Re: Most efficient way to do an array lookup

24 Sep 2019, 02:24

HotKeyIt wrote:
30 Jun 2017, 17:14
1. Simply prefix your key with a special character so any key can be used.
2. Check if a duplicated line already exists so you will always find the first occurrence of the line.
3. You can also use Loop, Read.

Code: Select all

Data := {}
Loop, Read, % A_Desktop "\Test\List.txt"
  If !Data.HasKey("`n" A_LoopReadLine)
	Data["`n" A_LoopReadLine] := A_Index
SearchLine := "kasdklfjasjkf asjdf  sjdf ksd sjf14525j kj534k67 8**( klsjdklfjsd f 14 15kjsk kl 1j4k1k 1kj 1 "

MsgBox % Data["`n" SearchLine]
EDIT.
You can also create a Class to make it a bit cleaner:

Code: Select all

Data := new FileGetLine(A_ScriptFullPath)
MsgBox % Data.GetLineNumber("	GetLineNumber(text){")

Class FileGetLine {
	__New(file){
		Loop, Read, %file%
		  If !this.HasKey("`n" A_LoopReadLine)
			this["`n" A_LoopReadLine] := A_Index
	}
	GetLineNumber(text){
		return this["`n" text]
	}
}
How does this work?
divanebaba
Posts: 804
Joined: 20 Dec 2016, 03:53
Location: Diaspora

Re: Most efficient way to do an array lookup

24 Sep 2019, 05:40

Hi.
For this requirement:
The string will be an exact match of one full line in the text file, and one line only
As far as I know from reading topics and the help file, the fastest way should be to avoid arrays and instead use the built-in variable A_LoopField, which already holds the whole line inside the parsing loop.
Building an array from A_LoopField is an additional, time-consuming step that you can leave out in your case.
You can check this by comparing timings as below.

Code: Select all

SearchText := "blabla, bla bla"
timeStart := A_TickCount
FileRead, FileText, % A_Desktop "\Test\List.txt"
Loop, Parse, FileText, `n, `r
{
	if (A_LoopField = SearchText)
		{
			row := A_index
			break
		}
}
timeEnd := A_TickCount
time := timeEnd - timeStart
MsgBox % row "`n`n" time " milliseconds."
return
Chunjee
Posts: 1400
Joined: 18 Apr 2014, 19:05
Contact:

Re: Most efficient way to do an array lookup

27 Sep 2019, 22:54

I thought the backwards key/value lookup was very interesting. But in my own test with .indexOf, even an array of 99000 elements only took 62 milliseconds to find an exact match at the very end of the array. So for all the setup (and risk) of an unconventional array setup, I find it hard to imagine the payoff would be very high.

My test:

Code: Select all

SetBatchLines, -1
#Include %A_ScriptDir%\node_modules\
#Include biga.ahk\export.ahk
#NoTrayIcon
#SingleInstance, force

A := new biga()

;; Setup the data
arr := []
loop, 99000
{
    Random, randomNumber , 1, 9
    vVal := A.repeat(randomNumber,150)
    arr.push(vVal)
}
arr.push("last value that we wanna find")


;; Start the test
Start := A_TickCount
foundIndex := A.indexOf(arr, "last value that we wanna find")
msgbox, % "match found at " foundIndex "`n`n`nSpeed: " (A_TickCount - Start) " milliseconds"
ExitApp
Last edited by Chunjee on 28 Sep 2019, 09:02, edited 1 time in total.
Chunjee
Posts: 1400
Joined: 18 Apr 2014, 19:05
Contact:

Re: Most efficient way to do an array lookup

27 Sep 2019, 23:07

JJohnston2 wrote:
30 Jun 2017, 13:53
Is there an easy way to map a text file into an array of lines without using a loop to read them in? Loop implementation would be easy, just wondering if there's a different way
Yes. This was probably already answered but I'll add a little spin on it.

Code: Select all

FileRead, vFileMemory, % A_ScriptDir "\inputFile.txt"
memoryArray := StrSplit(vFileMemory, "`n")

A := new biga()
; re-map the array AND
; remove any whitespace at the beginning and end of each line
newArray := A.map(memoryArray, A.trim)
Proof of .map and .trim working like this can be found at https://biga-ahk.github.io/biga.ahk/#/?id=trim
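For comparison, a library-free sketch of the same two steps (split into an array, then search it) might look like this. LinesToArray and IndexOfLine are hypothetical names. The third parameter of StrSplit strips the given characters from both ends of each element, which here removes the `r left over from `r`n line endings.

```autohotkey
; Sketch only: split text into an array of lines and do a plain linear search.
; LinesToArray and IndexOfLine are illustrative names, not built-in functions.
LinesToArray(Text)
{
	; Split on `n and strip `r from the ends of each element (handles `r`n files).
	return StrSplit(Text, "`n", "`r")
}

IndexOfLine(Lines, Needle)
{
	for Index, Line in Lines
	{
		if (Line == Needle)           ; == compares strings case sensitively
			return Index              ; first match wins
	}
	return 0                          ; not found
}
```

Calling LinesToArray once after FileRead and then IndexOfLine for each search keeps the file in memory instead of re-reading it from disk on every lookup.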
