If accessing dynamic variables dominates the creation and deletion operations, it is OK.
Since currently, variables are never deleted (not even local variables, see below) this might be the case.
This assumes that mostly local variables are deleted in blocks, and the variables of a terminated block are not accessed any more.
Hash tables are of fixed size, and we have to scan them fully to remove local variables, which is slow. Accordingly, it is better to allocate a small (8-bit) hash table to each instance of a function or block. If it becomes overcrowded, we replace it with a larger one. When the function exits, the whole local hash table is deleted. It does not take time at all, but more memory is consumed.
It means, we cannot assume that local variables are allocated in contiguous blocks, so, when the heap is exhausted we have to go through our live tables to find still needed data. Traversing sorted lists or (dense) balanced trees is faster than traversing mostly empty hash tables (but we can scan fast through Null pointers).
The current approach is that local variables (and formal parameters, which are basically locals themselves) are never deleted, though their contents might be. This is done primarily so that at load-time, any line in the script that references such a variable can be given its permanent address. Thus at runtime, such a script never searches for variables (except dynamics) because all the addresses have already been resolved.
I can imagine a kind of *lazy* approach: keep the last few created variables in a separate list. If it grows beyond a certain limit, sort and merge this one to the main list, what can be done in a single sweep.
Very interesting. Are you saying that the lazy list, when it reaches the size limit, should be sorted into the main list via q-sort or some other fast sort? Perhaps the lazy list itself would be a sorted list (binary search capable), and whose size limit is fairly large, say 1000 variables.
With 16 bit hash the table is only 256KB.
The idea of performing an extra indirection in exchange for halving the memory required is great. Hashing is more attractive now.
I have two questions: 1) Would even a memory restricted hash, that isn’t that sparse and thus has a lot of continuation entries, be generally superior to the binary search approach in both inserts and searches?; 2) Which hash function is generally considered the best way to transform a variable name into a hash index? I was naively thinking that each letter in the variable name could be multiplied by an order of magnitude, so that something like abc would be ascii-of-a*100 + ascii-of-b*10 + ascii-of-c*1.
Currently a block (subroutine) cannot be entered until a possible previous call has not terminated.
There is no such restriction. A subroutine can have any number of instances on the call stack (either through recursion or an interrupting thread). You might be thinking of hotkeys, which default to one instance each. This default can be changed on a per-hotkey basis with #MaxThreadsPerHotkey or the Hotkey command.
One way to handle local variables is to prepend some distinguishing string to their names, like `BlockID` (with the quotes forbidden otherwise in names). Each new block (function, procedure) receives a unique ID number at creation. At sorted list-, and balanced tree - dynamic variable management the pointers to local variables will be next to each other, and so a Clear is fast.
Wonderful. I can see how this would allow merging of all local and global variables into a single hash table, which would definitely save memory and probably reduce complexity. However, I’m still stuck with the need to backup and restore blocks of local variables, for which such a large hash table might be ill-suited. To clarify, here are the details of the current plan:
When a function is called and there are no other instances of it on the call stack (which is probably true 90% of the time), no backup and restore is necessary and everything below is skipped.
But when a function is called recursively, or called by a new thread that suspended the previous thread, we backup the list of local variables of the instance that lies immediately beneath us on the call stack. The primary thing that needs to be backed up is the variable’s pointer-to-contents (previously allocated). In other words, the address of each local variable
never changes. But the address of each local variable’s contents does change, either through recursion or an interrupting thread.
Because recursed layers and underlying threads are suspended while the current layer is running, they do not need their local variables until they're resumed. Thus, when the topmost instance of a function completes, all it has to do is free any memory that its layer of variables allocated and restore each variable's pointer-to-contents prior to returning to its caller.
Perhaps this approach is rather hair-brained and should be abandoned completely. If so, please be honest, but keep in mind that it was adopted so that the address of all non-dynamic variables, including local variables, could be resolved once at load-time to accelerate runtime performance.
Thanks for everything.