My frustration with the design of AutoHotkey has reached a peak recently. The interface of the new BoundFunc type is inconsistent with the Func type. The addition of this type requires me to change Plaster. The inconsistency makes it impossible for me to add support for the BoundFunc type to Plaster's HasMethod, because HasMethod depends on Func-like types having MinParams, MaxParams, and IsVariadic properties.
I hope that this post can encourage positive change in the development of AutoHotkey.
When considering changes to a programming language, it is important to keep its design goals in mind, to make sure that the changes help achieve, or at least do not harm achieving, the design goals.
As far as I know there is no official statement of the design goals, but Chris' and Lexikos' posts on the forums cause me to believe this is close:
- make it easy to automate Windows and Windows programs that were not designed with automation in mind
- make it easy for novice programmers to learn
AutoHotkey does achieve the goal of making it easy to automate Windows and Windows programs that were not designed with automation in mind. I have never had a problem with this aspect of AutoHotkey, nor have I encountered another programming language that does it as well, much less better.
AutoHotkey does not achieve the goal of being easy for novice programmers to learn.
Traits that make a programming language easy for a novice to learn are:
- good error handling
- familiarity
- consistency
- distinguishing unrelated concepts
- elegance
Familiarity means less to learn. Most novice programmers know some English and high school algebra. Programming languages like Python are easy for novices to learn, because they avoid requiring much beyond this. AutoHotkey's heavy use of string interpolation (the % signs) for code causes difficulty for new programmers, because it is not something they are already familiar with (among other reasons).
Consistency means less to learn, remember, and write code to abstract over. Each inconsistency is one more thing to remember, that is not relevant to the problem you are trying to solve, while trying to solve your problem. Inconsistency requires writing more code to abstract over the differences that should not exist. AutoHotkey has many inconsistencies. For example, there appears to be no rhyme or reason as to where % signs are required.
Distinguishing unrelated concepts prevents undesirable behavior. If inconsistency is creating differences where there should be none, conflation is creating similarities where there should be none. AutoHotkey conflates many concepts. For example, storing a key named “HasKey” will break the method with the same name, due to conflating interface and contents.
It may be surprising to associate elegance with novices' needs. It is more often associated with experts' desires. However, elegance means not needing to learn a lot of different things, to do a lot of different things. AutoHotkey is not very elegant. For example, it has more control flow constructs than most programming languages, but they are less generally useful.
Experts benefit from the same traits that benefit novices. They do have a higher tolerance for unfamiliarity, which allows them to make use of unfamiliar but useful syntax and semantics.
Is the design of AutoHotkey so bad that fixing it is worth destroying all code written in it?
Breaking backwards compatibility in a programming language is rarely done, and when it is, it rarely ends well.
The transition from Perl 5 to 6 has resulted in most Perl programmers abandoning Perl, and most of what is left remains on 5.
The transition from Python 2 to 3 has resulted in most Python programmers remaining on 2.
While I agree that, in AutoHotkey's case, it really is that bad, I am of the opinion that v2 should not be released until it fixes most of the problems in this post. In its current state I do not find it ‘better enough’ to be worth the loss.
The Type System:
The changes to the type system do not fit neatly into categories for areas of improvement. They affect multiple categories. Discussing them also provides an overview of the programming language. Therefore it seems to be a natural place to start.
The type hierarchy should look similar to this:
Null
Object
├─Array
├─ComObj
├─Dict
├─Enum
├─Exception
│ ├─Defect
│ └─System
├─File
├─Float
├─Func
│ └─BoundFunc
├─Int
│ └─Bool
├─RegExMatch
└─Str
All types must obey the Liskov Substitution Principle. In short, each subtype must have the interface (members; i.e. properties and methods) of the supertype (but may have more members), must not require more than the supertype (but may require less), and must guarantee everything the supertype does (but may guarantee more). Without this, object-oriented programming does not work, because polymorphism does not work. That matters, because not following these rules is error-prone, and requires more code to abstract over differences that should not exist.
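As a rough illustration of what substitutability buys, here is a minimal sketch in Python (not AutoHotkey). The classes Func and BoundFunc below are hypothetical stand-ins that merely borrow the post's type names, and min_params, max_params, and is_variadic mirror the MinParams, MaxParams, and IsVariadic properties mentioned at the top of the post. The point is only that a helper written against the supertype's interface keeps working when handed the subtype.

```python
import inspect


class Func:
    """Hypothetical wrapper exposing a Func-like interface around a Python callable."""

    def __init__(self, fn):
        self._fn = fn
        params = list(inspect.signature(fn).parameters.values())
        positional = [
            p for p in params
            if p.kind in (inspect.Parameter.POSITIONAL_ONLY,
                          inspect.Parameter.POSITIONAL_OR_KEYWORD)
        ]
        # Required parameters are the positional ones without a default value.
        self.min_params = sum(1 for p in positional
                              if p.default is inspect.Parameter.empty)
        self.max_params = len(positional)
        self.is_variadic = any(p.kind is inspect.Parameter.VAR_POSITIONAL
                               for p in params)

    def __call__(self, *args):
        return self._fn(*args)


class BoundFunc(Func):
    """Subtype that keeps every member of Func, adjusted for the bound arguments."""

    def __init__(self, func, *bound):
        self._fn = func
        self._bound = bound
        self.min_params = max(func.min_params - len(bound), 0)
        self.max_params = max(func.max_params - len(bound), 0)
        self.is_variadic = func.is_variadic

    def __call__(self, *args):
        return self._fn(*self._bound, *args)


def accepts(func, count):
    """Generic helper that relies only on the Func interface."""
    return func.min_params <= count and (func.is_variadic or count <= func.max_params)


def clamp(value, low, high=100):
    return max(low, min(value, high))


plain = Func(clamp)            # 2 required parameters, 3 at most
bound = BoundFunc(plain, 250)  # value is pre-bound: 1 required, 2 at most
```

Because BoundFunc offers the same members as Func, accepts() needs no special case for it, which is the property the post asks AutoHotkey's own BoundFunc to have.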
The development team seems to have a love-hate relationship with object-oriented programming, but supporting it is the only way to allow user-defined types to be consistent with built-in types. Even Haskell, which is normally decidedly non-object-oriented, uses typeclasses for this purpose, which are almost identical to Java interfaces.
AutoHotkey does not exist in a vacuum. Since its purpose is automation, it will often be used with Automation (a.k.a. OLE Automation; informally COM) APIs. AutoHotkey is already dependent on some Automation interfaces (e.g. the for loop depends on _NewEnum() from the collection interface and Next() from the IEnumVARIANT interface).
╔══════════════════════════════════════════════════════════════════════════════╗
║                             Collection Interface                             ║
╠════════════════════════╤═════════════════════════════════════════════════════╣
║ Member                 │ Description                                         ║
╠════════════════════════╪═════════════════════════════════════════════════════╣
║ Add(IndexOrKey, Value) │ a method used to insert an element                  ║
╟────────────────────────┼─────────────────────────────────────────────────────╢
║ Count                  │ a property containing the number of elements        ║
╟────────────────────────┼─────────────────────────────────────────────────────╢
║ Item[IndexOrKey]       │ a parameterized property used to look up an element ║
╟────────────────────────┼─────────────────────────────────────────────────────╢
║ Remove(IndexOrKey)     │ a method used to remove an element                  ║
╟────────────────────────┼─────────────────────────────────────────────────────╢
║ _NewEnum()             │ a method that returns an enumerator over the        ║
║                        │ elements                                            ║
╚════════════════════════╧═════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════╗
║                            IEnumVARIANT Interface                            ║
╠═════════════╤════════════════════════════════════════════════════════════════╣
║ Member      │ Description                                                    ║
╠═════════════╪════════════════════════════════════════════════════════════════╣
║ Clone()     │ a method that returns a copy of the enumerator                 ║
╟─────────────┼────────────────────────────────────────────────────────────────╢
║ Next(Count) │ a method that returns the next Count items                     ║
╟─────────────┼────────────────────────────────────────────────────────────────╢
║ Reset()     │ a method that resets the enumeration sequence to the beginning ║
╟─────────────┼────────────────────────────────────────────────────────────────╢
║ Skip(Count) │ a method that attempts to skip the next Count items in the     ║
║             │ enumeration sequence                                           ║
╚═════════════╧════════════════════════════════════════════════════════════════╝
Of course we should not adopt these interfaces without question.
Microsoft suggests Insert as an alternate name for Add in Guidelines for Creating Collection Classes. This is what AutoHotkey uses, and I approve. Add is a bad name for a method that performs anything other than addition.
AutoHotkey has no need for an Item parameterized property, but the terminology might be adopted for consistency with Automation.
Then there is the matter of what gets implemented versus what the standards require. Of the collection interface, the only members you can rely on are Count, Item[IndexOrKey], and _NewEnum(). Of the IEnumVARIANT interface, the only members you can rely on are Next(Count), and perhaps Skip(Count). With the exception of Skip(Count), Visual Basic will fail upon trying to use the collection if any of these members are absent. Lack of standards compliance is not just a third party issue. Excel does not support the Reset() method on its enumerators, for example.
It would be best if AutoHotkey did not rely on any additional members of these interfaces, except for Count, which is guaranteed to be present on Automation collections. It should, however, support their use.
Count should be present on anything with a notion of size in AutoHotkey (Array, Dict, File, RegExMatch, and Str). It should contain 0 when a collection is empty, not "". It should be a read-only property, not a method (as was being considered for v2). It should not be parameterized. This assures no additional code is needed to abstract over a difference between AutoHotkey and Automation objects that should not exist.
Clone() should be present on most mutable compound types (Array, Dict, and Enum). It would be confusing on certain types (e.g. File), because it would be unclear what is getting copied.
There is also the question of where integer indices should begin (for Array and Str, and miscellaneous uses like Func parameter indices).
It has been known for a long time (Why Numbering Should Start at Zero from 1982) that 0-based indexing is best. Half-closed intervals compose more easily than closed intervals. Imagine that you want to use an array as a circular buffer. With 0-based indexing this is easy to achieve, by initializing your index to 0, incrementing it, and using modulo to limit it to the length of the array. With any other base this requires adjustment.
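The circular buffer argument can be made concrete with a small Python sketch (Python rather than AutoHotkey, and both function names are made up for illustration). With 0-based indices the result of the modulo is already a valid index; with 1-based indices the same step needs a correction term.

```python
def ring_indices_zero_based(length, steps):
    """Visit slots 0..length-1 repeatedly; the modulo result is already a valid index."""
    index = 0
    visited = []
    for _ in range(steps):
        visited.append(index)
        index = (index + 1) % length
    return visited


def ring_indices_one_based(length, steps):
    """Visit slots 1..length repeatedly; the modulo result needs a +1 adjustment."""
    index = 1
    visited = []
    for _ in range(steps):
        visited.append(index)
        index = index % length + 1
    return visited
```

The adjustment in the second function is small, but it is exactly the kind of detail that is easy to get wrong by one.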
There is no standard for the indexing of Automation collections. It appears that, originally, Microsoft intended 1-based indexing. The 54 Commandments of COM Object Model Design might lead one to believe this is still the case. However, Microsoft's own IE, WMI, Access, ADO, and DAO use 0-based indexing, while the rest of Microsoft Office and Visual Studio uses 1-based indexing. Third party Automation APIs supposedly usually use 0-based indexing. That means whatever AutoHotkey chooses, it will not match Automation APIs universally.
AutoHotkey also supports DLL calls, and C and C++ use 0-based indexing.
It would be preferable if AutoHotkey switched to 0-based indexing in v2, for ease of use and interoperability. There should be a warning in the documentation that the indexing of Automation collections varies.
Interfaces should be provided for operator overloading and customizable hashing. It should be an error (detected before the program starts) to override hashing but not overload equality. That is necessary for proper dictionary lookup. Python and Lua are good sources of inspiration. They use a mechanism similar to AutoHotkey's meta-functions for this purpose. Like the other standard interfaces, this allows user-defined types to be consistent with built-in types.
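Since Python is named as a source of inspiration, here is a minimal sketch of how its equivalents of these meta-functions fit together. Point and EqOnly are hypothetical example types. Note that Python guards the mirror image of the rule proposed above: a class that overloads equality without defining a hash becomes unhashable, and that is detected when the object is used as a key, not before the program starts.

```python
class Point:
    """Value type whose equality and hash are derived from the same fields."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal points must hash equally, or dictionary lookup breaks.
        return hash((self.x, self.y))

    def __add__(self, other):
        # Operator overloading: Point + Point.
        return Point(self.x + other.x, self.y + other.y)


class EqOnly:
    """Overloads equality but not hashing; Python then makes instances unhashable."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, EqOnly) and self.value == other.value


labels = {Point(1, 2): "corner"}
found = labels[Point(0, 1) + Point(1, 1)]  # a different but equal object finds the entry

try:
    {EqOnly(1): "x"}
    eq_only_hashable = True
except TypeError:
    eq_only_hashable = False
```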
Good Error Handling:
The best way to handle errors is to change the programming language to make them impossible, without limiting the programming language's power.
By “power” I refer to what the programming language can express. Turing completeness is an example. In practice, most programming languages are limited by their I/O and linking facilities, not their computational facilities.
Unfortunately, making an error impossible without limiting power is also the hardest way to handle errors. Still, there are some obvious opportunities in AutoHotkey's case…
Allocation and initialization should always be combined, and initialization and mutation should always be separated. This makes it impossible to read uninitialized data at run time, and prevents typos from creating new variables. It is still possible to typo a variable name, but with this change that can (and should) be detected as an error before the program starts, by looking for variables that are read or written but not defined within the scope.
Example syntax:
def x 0 ; allocation and initialization
x := 1 ; mutation
AutoHotkey should switch to tracing garbage collection. I expect integrating the Boehm garbage collector to be the lowest-effort way to achieve this. That would eliminate memory leaks due to cyclical references.
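The cyclical reference problem can be demonstrated in CPython, which combines reference counting with a separate cycle-detecting collector. This is only an analogy, assuming CPython's behavior; it is not the Boehm collector and says nothing about how AutoHotkey would integrate one. Node and make_cycle are hypothetical names.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def make_cycle():
    """Create two objects that reference each other; return a weak reference to one."""
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)


gc.disable()                     # leave only reference counting active
probe = make_cycle()
leaked = probe() is not None     # the cycle keeps both objects alive

gc.collect()                     # a tracing pass finds the unreachable cycle
reclaimed = probe() is None
gc.enable()
```

With reference counting alone the two nodes would never be freed, which is the leak the post wants a tracing collector to eliminate.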
Most other errors will need to be detected and reported at run time…
AutoHotkey's current policy of ignoring errors and continuing execution is obviously wrong. When an error occurs at run time, data has become inaccessible or corrupt. Continuing execution causes corruption to spread through the program's state like a plague, maximizing damage (potentially to files). When the program crashes due to operating system enforced error handling, or exits on its own, the programmer is left with no clues to help them understand why their program misbehaved.
Of course, sometimes there are clues. The programmer can check ErrorLevel and A_LastError about every other line and hope they get lucky. Not every error sets ErrorLevel or A_LastError. They can also obsessively check for empty strings where they would not be expected. Since this requires an unbearable amount of effort, and obscures the intent of the code, it is rarely if ever done in practice.
Run time error handling should:
- be consistent
- require no effort to detect errors
- halt execution, unless the program has been written to handle the error
- if the program does not handle the error, display a message that reports:
  - where the error was encountered (file and line)
  - the relevant values
  - what expectation was violated
While it is the programming language's responsibility to detect most errors, and report them in a helpful way, programmers occasionally need to report their own types of errors. To assure these reports are helpful, the programming language needs good introspection support. AutoHotkey falls flat here as well.
There is the problem of defining and catching a new type of error. While you can use classes to define new types, and you can throw anything, this is useless because objects have no string representation (more on that in a moment). Further, even if you did throw something other than a string, you could not catch it selectively.
Most programming languages have a type hierarchy of exceptions, and a catch statement that matches one or more types. Ideally there would be a supertype of all exceptions named something like Exception, with two subtypes named something like Defect and System. Defect exceptions should not normally be caught, because they are caused by the programmer, such as division by zero. System exceptions should normally be caught, and usually involve I/O, like trying to open a locked file. All other exceptions, both built-in and user-defined, would be defined under Defect or System. In the rare case where it is not bad design to catch all exceptions, like when implementing an interpreter, the common supertype Exception makes this possible.
Example syntax:
try {
    DangerousOperation(Arg)
} catch ExampleError, DifferentError as Problem {
    ; Handle ExampleError and DifferentError here.
}

try {
    DangerousOperation(Arg)
} catch ExampleError {
    ; Handle ExampleError here.
} catch DifferentError as Issue {
    ; Handle DifferentError here.
}
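For comparison, the Defect and System split described above could look roughly like this in Python, where the built-in Exception class plays the role of the common supertype. FileLocked, BadArgument, open_report, and read_or_default are hypothetical names invented for this sketch.

```python
class Defect(Exception):
    """Programmer errors; normally left uncaught so they surface during development."""


class System(Exception):
    """Environmental failures, usually I/O; normally caught and handled."""


class FileLocked(System):
    pass


class BadArgument(Defect):
    pass


def open_report(path, locked):
    """Pretend to open a file; 'locked' simulates another process holding it."""
    if not isinstance(path, str):
        raise BadArgument(f"path must be a string, got {path!r}")
    if locked:
        raise FileLocked(f"{path!r} is locked by another process")
    return f"contents of {path}"


def read_or_default(path, locked):
    """Catch only System errors; a Defect propagates to the caller."""
    try:
        return open_report(path, locked)
    except System as problem:
        return f"unavailable: {problem}"
```

Calling read_or_default(42, False) still raises BadArgument, because the handler matches by type and a Defect is not a System.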
There is also the problem of determining if an error has occurred. AutoHotkey, like most programming languages, has no problem checking values or relationships between values. What AutoHotkey does have problems with is checking types or interfaces. AutoHotkey v1 is incapable of type checking, and v2's support is broken. Both v1 and v2 have broken support for checking interfaces.
AutoHotkey v2's "is" operator seems to work acceptably. This can be used to determine if the type of a value is what you expect.
It is sometimes possible to detect the presence of a property or method in AutoHotkey (v1 or v2) by using ObjHasKey, due to the conflation of dictionary and user-defined types. This does not work for built-in types or Automation objects. When it does ‘work’, it is unreliable, due to the conflation of interface and contents. Dedicated HasProperty and HasMethod functions should be provided that work on both AutoHotkey (built-in and user-defined types) and Automation objects. They should be functions, not methods, to avoid potentially conflicting with methods with the same name. You can use this to determine if an interface is supported by a value, which is often preferable to checking the type, because multiple types may implement the same interface without having any supertypes in common.
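A minimal Python sketch of such free functions follows. The names has_method and has_property are chosen to echo the proposal and are not part of any library; the sketch looks methods up on the type, so it deliberately ignores callables stored on an individual instance.

```python
def has_method(value, name):
    """True if the value's type (not its contents) provides a callable member."""
    return callable(getattr(type(value), name, None))


def has_property(value, name):
    """True if the value exposes a non-callable member with this name."""
    missing = object()
    member = getattr(value, name, missing)
    return member is not missing and not callable(member)


class Stack:
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)


settings = {"push": 1}   # a key named "push" is contents, not interface
```

Because the checks are functions rather than methods, a value cannot shadow them by defining a member with the same name.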
Once you have determined that an error has occurred, there is the problem of reporting it in a helpful way. The runtime can be expected to handle reporting the file and line number. Reporting the relevant values and what expectation was violated must be left to the programmer.
Only numbers (including Booleans) and strings convert to strings in AutoHotkey. This makes it impossible to implement good error reporting. A meta-function similar to Python's __repr__() should be adopted. It should not be named something like “Str” or “String”, because the representation of a string may contain escape sequences and always begins and ends with double quotes. Converting a string to a string returns the same value. Converting a string to its representation does not return the same value. All built-in types must implement this meta-function. This will be required if AutoHotkey is ever to have a REPL anyway. The documentation should explain that this meta-function ideally returns source code that would produce the same value, but where that is impossible (e.g. for Automation objects) it should return a string in angle brackets that contains as much helpful information as possible (e.g. Plaster returns things like <ComObj IDictionary at 0x0000000001234567>). The angle bracket notation for unrepresentable values seems to be universal.
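Here is how the convention looks in Python itself, as a small sketch. Vector, Handle, and check_positive are hypothetical names; the address is the illustrative one from the Plaster example above. One difference from the description: Python's repr() of a string uses single quotes by default rather than double quotes.

```python
class Vector:
    """Representable: repr() returns source code that rebuilds an equal value."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return f"Vector({self.x!r}, {self.y!r})"


class Handle:
    """Not representable as source code, so repr() falls back to angle brackets."""

    def __init__(self, kind, address):
        self.kind, self.address = kind, address

    def __repr__(self):
        return f"<Handle {self.kind} at 0x{self.address:016X}>"


def check_positive(value):
    """Report the offending value through repr() so its type stays visible."""
    if not (isinstance(value, (int, float)) and value > 0):
        raise ValueError(f"expected a positive number, got {value!r}")
    return value
```

Passing the string "5" produces a message ending in got '5', which tells the reader at a glance that a string arrived where a number was expected.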
true and false being glorified integer constants makes error reports harder to read than necessary. It is important to be able to convert between Boolean and integer values, for working with integers containing flags, but it would be better if they were a subtype of integers, with true and false as their representation. That would make it considerably easier to read error reports that contain both integer and Boolean values.
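Python happens to take this approach: bool is a subtype of int, so flag arithmetic keeps working while reports stay readable. The flag names below are hypothetical.

```python
FLAG_READ = 0b01
FLAG_WRITE = 0b10

can_write = bool(FLAG_WRITE & 0b11)    # integer flag test converted to a Boolean
flags = FLAG_READ | (can_write << 1)   # the Boolean used as an integer again

report = f"can_write={can_write!r}, flags={flags!r}"
```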
Error reporting would benefit from the ability to query the path and file name of a File object.
AutoHotkey v2's Type function seems to be missing in the latest alpha I tested (a063), but it is still in the documentation. When it ‘worked’, it reported different types for AutoHotkey and Automation enumerations (one of which had double-semicolons in the name), even though they have the same interface, requirements, and guarantees, and reported Object for user-defined types, instead of their actual type. Type would normally be used to report what the type is when it was not what you expected. The type names should be based on the class (for user-defined types) or the factory function (for most built-in types) that produces them. The type names I listed in my suggested type hierarchy follow these rules. I also tried to keep them short, but familiar.
Familiarity:
Other than the pervasive peppering of %, someone who can read English and is familiar with high school algebra should not find AutoHotkey much less familiar than most programming languages.
The use of string interpolation for code should be eliminated for this and other reasons.
Consistency:
Changing the type names to be related to the class or function that constructs them as already mentioned will improve the consistency of AutoHotkey to ease learning and remembering.
Several changes already mentioned will improve the consistency of AutoHotkey to eliminate the need to write code to abstract over differences that should not exist:
- requiring types to follow the Liskov Substitution Principle (e.g. Func and BoundFunc)
- the standard collection interface (e.g. Count)
- the cloneable interface
- operator overloading
- customizable hashing
- exceptions for all error handling (e.g. removing ErrorLevel and A_LastError, and actually handling errors)
- the representation interface
Examples where this principle is violated:
- user-defined types cannot “extend” built-in types
- built-in types cannot be monkey patched
- built-in methods do not support the Func interface
- built-in methods cannot be stored in variables or data structures
Example syntax:
MyFunc(x) ; call a function directly, through its default function reference
MyFuncRef(x) ; call a function through a function reference
There should be no observable difference between iterating over an AutoHotkey Array and a SAFEARRAY, or an AutoHotkey Dict and a Scripting.Dictionary. Currently SAFEARRAY values end up in the Key variable, and the VARIANT type constant value ends up in the Value variable. Key should contain the index, and Value should contain the value. Scripting.Dictionary keys are handled correctly, but the VARIANT type constant value ends up in the Value variable. Key should contain the key, and Value should contain the value. If the VARIANT type constant is of concern, AutoHotkey v2's Type function should be extended to handle it, and it should return the VARIANT type constant name, not value, since that is much more readable.
A common complaint is being unable to remember where " and % are required. % is particularly problematic, because it is used both as brackets, and alone. The bracketed form is especially hard to read, because unlike ‘normal’ brackets, % is not directional. This requires nested uses to add parentheses. Still, these are all merely symptoms of the real problem, which is that where expressions are allowed seems to be random. Expressions should be allowed anywhere they might be useful. If the programmer wants to use a literal, they can write it the way they usually would. Strings would always be enclosed in double quotes, and the need for % would be eliminated.
Bitwise-not's behavior changes based on the range of values passed to it, making it difficult to use reliably. ~N should always be equivalent to -1 - N, as it is in Mathematica, and most other programming languages with bignum support (e.g. Python). Programming languages with bignum support are relevant, because, like AutoHotkey's integers which could be 32-bit or 64-bit, the bit-width of bignums varies.
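The identity is easy to check in Python, whose integers are arbitrary precision. The helper name bitwise_not is made up for this sketch.

```python
def bitwise_not(n):
    """Bitwise complement defined arithmetically, independent of bit width."""
    return -1 - n


samples = [0, 1, -1, 255, 2**31, 2**63, 2**100]
matches_builtin = all(bitwise_not(n) == ~n for n in samples)
round_trips = all(bitwise_not(bitwise_not(n)) == n for n in samples)
```

The definition never mentions a width, so it behaves the same for values that fit in 32 bits, 64 bits, or neither.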
Hotkey notation is inconsistent with Send notation. Send notation is longer, but significantly more readable. Send notation should be used everywhere.
It should be possible to add and remove hotstrings dynamically, and have them call function objects, like hotkeys.
Distinguishing Unrelated Concepts:
The conflation of arrays, dictionaries, exceptions, and user-defined types causes several problems:
- it is impossible to distinguish between arrays, dictionaries, and exceptions for type or interface checking
- it is impossible to distinguish between user-defined types that are collections and ones that are not for interface checking
- dictionary keys are case folded
- arrays are unnecessarily space and time inefficient
- dictionaries are unnecessarily time inefficient
Sometimes the case of dictionary keys matters. For example, I have written a keyboard layout optimization program (in another programming language) that made heavy use of dictionaries. Each character of a large corpus was stored along with its frequency in a dictionary, which was used for various purposes. Uppercase letters indicate Shift being pressed along with the letter, which is part of determining how frequently Shift is used. If I could not tell the difference between uppercase and lowercase letters, I could not accurately detect how often Shift is used. Many other text processing programs would need this functionality (e.g. implementing an interpreter for a case-sensitive programming language, and natural language processing).
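A toy version of that measurement in Python, assuming for simplicity that only uppercase letters require Shift (shifted symbols are ignored) and using a made-up sample string. The function name shift_usage is hypothetical.

```python
from collections import Counter

SAMPLE = "Hello World, hello AutoHotkey"


def shift_usage(corpus):
    """Fraction of letters that need Shift, counted with case-sensitive keys."""
    counts = Counter(ch for ch in corpus if ch.isalpha())
    shifted = sum(n for ch, n in counts.items() if ch.isupper())
    total = sum(counts.values())
    return counts, (shifted / total if total else 0.0)


counts, ratio = shift_usage(SAMPLE)

# What a case-folding dictionary would see: "H" and "h" collapse into one key.
folded = Counter(ch.lower() for ch in SAMPLE if ch.isalpha())
```

With case-sensitive keys, "H" and "h" are counted separately; once they are folded together the Shift frequency can no longer be recovered.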
Although efficiency should be the least important concern for a programming language like AutoHotkey, there is no reason to spend space and time if it does not make the programming language easier to use.
C++, which is what AutoHotkey is implemented in, comes with a generic dynamic array type in the Standard Template Library. If AutoHotkey's arrays were implemented as dynamic arrays of tagged unions, or references to objects, the space wasted by the singly linked lists used to implement a dictionary, and the time wasted by hashing indices and chasing pointers in the linked lists, could be saved.
C++ also comes with a generic dictionary type in the Standard Template Library. C++'s unordered map (a hash table) will not return items in an easily predictable order, hence the name. The contents of hash tables have to be sorted if they are to be abused as arrays, which wastes time. Stopping the abuse would save time. Data structures which maintain order (like red-black trees), to avoid the need for sorting, are less time efficient than hash tables.
Arrays, dictionaries, exceptions, and user-defined types should be different types. The dictionaries AutoHotkey programmers use should not case fold their keys. If dictionaries are going to be used to implement call stack frames and user-defined types (and they almost certainly are), this should not be revealed in the interface.
Missing elements are another nuisance caused by conflating arrays and dictionaries. Tolerating missing elements makes it impossible to predict the length of the array that will be produced by many transformations (e.g. take the first “n” elements, take every “n”th element, reverse the order, etc.). AutoHotkey uses missing elements to indicate default values should be used in variadic calls, so they cannot be removed without changing that. Using dynamic arrays for AutoHotkey arrays would eliminate the possibility of having missing elements. I suggest only allowing trailing unspecified arguments in variadic calls, like most programming languages (e.g. Lisp and Python). Another alternative is to use null to indicate the corresponding default value should be used. Sparse arrays are almost never useful, but if you want them you could always use a dictionary and sort the keys, or implement a red-black tree.
Even if null is not used to allow leading and internal unspecified arguments in variadic calls, it could be a useful addition to AutoHotkey. It can be used to tell the difference between nothingness and an empty string.
null should:
- be a unique type (not "" or 0)
- be impossible to subtype
- only have a single instance
- only support being assigned to variables and data structures, and checking for equality and inequality
- only be equal to itself
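A rough Python sketch of a value with most of these properties is shown below. NullType and null are hypothetical names. The sketch does not restrict the supported operations as tightly as the list demands (Python objects always support things like repr() and truth testing), so it illustrates the uniqueness, no-subtype, and equality rules only.

```python
class NullType:
    """Hypothetical null: one instance, no subtypes, equal only to itself."""

    _instance = None

    def __new__(cls):
        # Always hand back the same object, so there is a single instance.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init_subclass__(cls, **kwargs):
        # Runs when someone tries to derive from NullType; refuse.
        raise TypeError("NullType cannot be subtyped")

    def __eq__(self, other):
        return other is self

    def __hash__(self):
        return 0

    def __repr__(self):
        return "null"


null = NullType()
```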
The conflation of interface and contents causes some problems:
- storing a key with the same name as a property or method in a dictionary will break that property or method
- the interface changes based on the contents, which is nonsensical
These problems are not specific to dictionaries, or dictionaries masquerading as other types. For example, the RegExMatch type has these problems.
The . operator should refer to the interface of an object, while the [] operator should refer to its contents. A change in an object's contents should never change its interface. This still allows monkey patching by using the . operator.
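Python's dictionaries already separate the two, which makes for a short illustration (the variable names are arbitrary):

```python
inventory = {}
inventory["keys"] = 3        # contents: an entry that happens to be named "keys"
inventory["get"] = "lamp"    # contents: another entry named like a method

names = sorted(inventory.keys())   # interface: the keys() method still works
lamp = inventory.get("get")        # interface and contents side by side
```

Storing entries named after methods changes what the dictionary contains, but not what it can do.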
The rest of the problems in this section are arguably problems with consistency, not with distinguishing unrelated concepts, but they can only be corrected after correcting the conflations mentioned so far…
AutoHotkey currently uses two different notions of object-hood. One is based on what IsObject returns true for. The other is based on instances of Object. IsObject returns true for more than instances of Object. There cannot be two different types with identical names, so this should be corrected. With the improvements I suggest, IsObject will no longer be needed, since everything will be an object except for null, and null can be tested for with equality. The functionality of Object will be broken out into several appropriate types.
Floating point numbers used as dictionary keys are indexed by their string representation, not their value, unlike integers. AutoHotkey does not have a dictionary type as such, but when that is corrected, this should be too. It should be easy and efficient to reinterpret_cast the floating point number to an integer for use as its hash code. Some canonicalization will need to be performed (e.g. to assure -0 and 0 have the same hash code), but that should not be too difficult. Correcting this will keep dictionary lookup from breaking if the floating point string representation is changed.
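The reinterpretation can be sketched in Python with the struct module standing in for reinterpret_cast. The function name float_key is hypothetical, and the only canonicalization shown is the -0 case mentioned above; a real implementation would likely have more cases to consider.

```python
import struct


def float_key(value):
    """Reinterpret a float's 64 bits as a signed integer, after folding -0.0 into 0.0."""
    if value == 0.0:
        value = 0.0   # -0.0 == 0.0 is true, so both zeros take this branch
    (bits,) = struct.unpack("<q", struct.pack("<d", value))
    return bits
```

The key depends only on the bit pattern of the value, so changing how floats are formatted as strings would not affect lookup.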
Elegance:
Elegance in programming language design is a result of the following characteristics:
- simplicity – few primitive constructs
- generality – the primitive constructs can be used for many different purposes
- composability – constructs can be combined to produce more complex constructs
- brevity – achieving the programmer's goal requires little code
It may seem like brevity would naturally result from simplicity, generality, and composability. While this is usually true, pathological counterexamples can be constructed. One-instruction set computers are examples of such Turing tarpits.
It may seem like pursuing elegance alone would be sufficient for good programming language design. While this is also usually true, pathological counterexamples can also be constructed for it. APL is an example of an elegant programming language that is markedly unfamiliar.
Elegant programming language designs exhibit certain tendencies that may act as signposts to indicate you are on the right path, but they are not always present:
- relevance – code requires few constructs irrelevant to the programmer's goal
- symmetry – constructs often have symmetrical relationships (e.g. inverse functions)
Sometimes there is no known efficient way to implement symmetrical constructs.
Elegance benefits those that implement the programming language, not just those that use it. Elegance usually results in less code to write, test, document, and maintain.
Occasionally elegance does create work for the implementers (e.g. garbage collection). The needs of the users should be put before those of the implementers, because there are more of them. Besides, the implementers are likely to be users as well.
AutoHotkey can be transformed into an elegant programming language by:
- eliminating redundant constructs
- minimizing non-composable constructs
- generalizing constructs
Some built-in types have redundant members:
- File – Seek(Distance [, Origin = 0]), Tell(), Position, Pos; Length, AtEOF
- RegExMatch – Value(N), [N]
In RegExMatch's case, it is probably best to retain [N], and remove the rest. [N] refers to its contents, as is proper.
The Call meta-function is redundant and should be removed. The Func interface (which has __Call) is a superset of its functionality.
AutoHotkey v2 has function versions of most ‘commands’ except for control flow statements. The function version can be composed (nested), while the command version cannot. The function version can be used via function references, while the command version cannot. The command versions are inferior and should be eliminated. That will have the pleasant side effects of drastically reducing global namespace pollution by eliminating all the constants and keywords used by those commands, and eliminating most uses of %.
AutoHotkey makes heavy use of global mutable state, and until recently, required the use of unstructured control flow and hard-coded event handlers, which makes it very hard to write code that can be composed. This is an example of the fractal of bad design that can result from elegance being neglected, where bad design decisions at the programming language level force, or at least strongly encourage, bad design decisions in code written in it.
Experienced programmers may recoil upon first reading the screenfuls of global variables used in AutoHotkey. However, most of these are constants, or read-only, making them relatively innocuous.
The real culprit is the extreme configurability of AutoHotkey. Code that works under one combination of settings may malfunction under another. These settings are not scoped, so combining code that requires different combinations of settings is difficult or impossible. Most of these settings should have been function or method parameters (e.g. SetFormat). Programming languages should not be configurable.
One setting, StringCaseSense, is worth singling out. String comparisons should always be case-sensitive. If case-insensitive comparison is desired, it is trivial to lower- or upper-case the string beforehand. AutoHotkey comes with functions for that (StringLower and StringUpper). This will eliminate the inconsistency of = and == obeying different rules than the rest of the comparison operators. It will also allow = to be removed. = is a bad choice for a comparison operator. In mathematics it defines a permanent equality relationship. In most (C-syntaxed) programming languages it performs assignment. In AutoHotkey it is neither of those. Most programmers, upon seeing it in an expression, will assume it is an error and == was intended.
AutoHotkey would be better off without ++ and --. Consciously or not, most programmers expect expressions to be free of side effects (i.e. they expect expressions to compose). ++ and -- are normally used in expressions, and they perform assignment. Most mistakes occur when ++ or -- appear more than once in an expression. Few programmers can correctly predict the order the side effects will occur in. This kind of unnecessary and unhelpful complexity has little place in a programming language designed to appeal to novices. Lua and Python get by just fine without these operators. Python forbids any form of assignment in expressions, due to its confusing nature, which is a stance AutoHotkey should adopt.
Now that unstructured control flow is no longer required to handle events, gosub and goto should be removed. Functions make gosub redundant. gosub is similar to a function call, only it cannot accept arguments, and labeled code blocks can overlap. goto is rarely included in modern programming languages, because there are composable replacements (various branch and loop statements, and functions), code that uses it is hard to understand and change, and it makes optimization difficult. Java, JavaScript, Python, and functional programming languages get by just fine without these constructs.
A limited number of non-composable constructs will have to be tolerated. At least one global mutable reference or variable must exist to pass state between event handlers. Various forms of goto that cannot bypass initialization should also be allowed. This includes break, continue, return, and exception handling. Labels should be retained due to break and continue making good use of them. Even in Haskell programmers end up reinventing these with monads, primarily for error handling. On rare occasions they also greatly improve time efficiency. I/O is also not composable, but without I/O a computer is just a bad space heater.
The interfaces of the Str and File types should be generalized.
Str should be similar to an Array of characters. Specifically, it should be possible to index a Str with [], and iterate over it with for. The reason Str should not be a subtype of Array is that Str should be immutable, so it is safe to use strings as Dict keys. Since Array supports mutation, its subtypes are required to as well. These changes would make it possible to write procedures that operate on both Array and Str without requiring code to abstract over differences that should not exist. Lexers frequently need to iterate over strings, character by character.
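Python already treats strings this way, so a minimal sketch in Python can show what the proposal buys. The function name count_runs is made up for illustration; nothing here is an existing AutoHotkey interface.

```python
def count_runs(seq):
    """Count maximal runs of equal adjacent items in any iterable sequence."""
    runs = 0
    previous = object()  # sentinel that compares unequal to every item
    for item in seq:
        if item != previous:
            runs += 1
        previous = item
    return runs


# The same procedure works on a string and on a list, with no adapter code.
print(count_runs("aaabcc"))         # → 3
print(count_runs([1, 1, 2, 2, 3]))  # → 3

# Indexing works the same way on both types.
print("lexer"[0], ["l", "e", "x"][0])

# Immutability is what makes strings safe as dictionary keys;
# a mutable list is rejected as a key.
table = {"key": 1}
try:
    table[["k", "e", "y"]] = 2
    list_key_allowed = True
except TypeError:
    list_key_allowed = False
print(list_key_allowed)  # → False
```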
File's vast number of (Raw)Read(Line|Num) and (Raw)Write(Line|Num) methods should be reduced to one of each, with parameters dictating the desired behavior. It should be possible to iterate over lines of text in a file using for. This is briefer than using a while loop to achieve the same effect.
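For comparison, this is roughly what iterating over a file with for looks like in Python, where file objects are enumerable line by line. It is only a sketch of the style being asked for; count_nonblank_lines is a hypothetical helper, and the temporary file exists only to make the example self-contained.

```python
import os
import tempfile


def count_nonblank_lines(path):
    """Count lines that contain something other than whitespace."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:  # no explicit read call, no end-of-file test
            if line.strip():
                count += 1
    return count


# Build a small throwaway file to run the function against.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("first\n\nsecond\nthird\n")
    demo_path = tmp.name

print(count_nonblank_lines(demo_path))  # → 3
os.remove(demo_path)
```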
for should be generalized, and loop removed. The Str changes just mentioned provide most “parse a string” loop functionality, and the StrSplit function covers the rest. The File changes just mentioned eliminate the need for the “read file contents” loop. Objects with enumerators should provide for with “files & folders” and “registry” loop functionality. Python's os.listdir and os.walk can serve as examples of handling the file system this way. The registry would be handled similarly, due to its hierarchical nature. loop should be omitted from the beginning of until loops, as in most programming languages.
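Here is a short Python sketch of the os.listdir and os.walk style mentioned above, to show how an ordinary for loop over an enumerable object replaces a dedicated “files & folders” loop. The helper list_files and the directory layout are invented for the example.

```python
import os
import tempfile


def list_files(root):
    """Return the paths of all files under root, relative to root, sorted."""
    found = []
    # os.walk yields one (folder, subfolders, files) triple per directory.
    for folder, subfolders, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            found.append(os.path.relpath(full_path, root))
    return sorted(found)


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for relative in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(root, relative), "w") as handle:
            handle.write("x")

    print(os.listdir(root))  # immediate children only, in arbitrary order
    print(list_files(root))  # every file, found by walking the tree
```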
Sort should be ‘generalized’. Generalized is in scare quotes because what I really propose is changing the type that it works on from Str to Array. However, that is more generally useful, and it is easy to get the previous behavior with this design.
Str is a bad choice for input to Sort, because putting data into a string loses all its structure and type information. It is also difficult to ensure that the character combination used to split the string is not unexpectedly contained within the data somewhere. Further, internally, the existing Sort implementation must convert the string into an array. Converting data into a string, only to have it converted into an array, then converted back into a string, which will probably have to be converted back into usefully structured data, introduces a lot of unnecessary complexity and is very time inefficient.
Sort should be a function with these signatures:
Sort(Arr)
Sort(LTFunc, Arr)
Functions with multiple signatures can be implemented by making them variadic and throwing exceptions if incorrect numbers of arguments, or arguments with incorrect types or interfaces, are passed to them. This parameter order was chosen because it is the most useful with BoundFunc or currying. You are more likely to want to use the same comparison function with multiple arrays, than use the same array with multiple comparison functions.
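The two signatures and the reasoning behind the parameter order can be sketched in Python. This is an illustration of the proposed behavior, not an implementation for AutoHotkey: sort, longer_first, and sort_by_length_desc are names chosen for the example, and functools.partial stands in for BoundFunc.

```python
from functools import cmp_to_key, partial


def sort(*args):
    """Sort(Arr) or Sort(LTFunc, Arr): return a new, sorted list."""
    if len(args) == 1:
        (arr,) = args
        less_than = lambda a, b: a < b  # default comparison
    elif len(args) == 2:
        less_than, arr = args
        if not callable(less_than):
            raise TypeError("Sort(LTFunc, Arr): LTFunc must be callable")
    else:
        raise TypeError("Sort takes 1 or 2 arguments, got %d" % len(args))

    # Derive a three-way comparison from the less-than function alone.
    def compare(a, b):
        if less_than(a, b):
            return -1
        if less_than(b, a):
            return 1
        return 0

    return sorted(arr, key=cmp_to_key(compare))


def longer_first(a, b):
    return len(a) > len(b)


# Binding the comparison first yields a reusable sorter for many arrays.
sort_by_length_desc = partial(sort, longer_first)

print(sort([3, 1, 2]))                          # → [1, 2, 3]
print(sort_by_length_desc(["a", "ccc", "bb"]))  # → ['ccc', 'bb', 'a']
print(sort_by_length_desc(["y", "xx"]))         # → ['xx', 'y']
```

Had the array come first, the bound function would be tied to one array, which is the less common need.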
Sort should work on anything with the same interface as Array (e.g. SAFEARRAYs). We want it to be generalized.
The sorting algorithm should be stable; probably merge sort or some variation (like Timsort). Stable sorts can be composed to sort ‘within’ each other.
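To make “sort within each other” concrete, here is a small Python example. Python's built-in sorted is documented as stable, so it serves as a stand-in for the proposed Sort; the records are made-up data.

```python
records = [("pear", 2), ("apple", 2), ("fig", 1), ("apple", 1)]

# First pass: order by the secondary key (name).
by_name = sorted(records, key=lambda record: record[0])

# Second pass: order by the primary key (count). Because the sort is
# stable, records with equal counts keep their by-name order.
by_count_then_name = sorted(by_name, key=lambda record: record[1])

print(by_count_then_name)
# → [('apple', 1), ('fig', 1), ('apple', 2), ('pear', 2)]
```

With an unstable algorithm the second pass would be free to scramble the name order inside each count, and the two sorts could not be composed this way.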
Sort should default to using the < operator for comparison, but it should be possible to pass a reference to a function or function object for custom comparison. While other comparisons can be made to work, < is the one that is conventionally used for higher-order sort functions. Having the ability to customize sorting is important. It allows new types to be sorted, and existing types to be sorted new ways.
Sort should be referentially transparent (i.e. it should return a new array, instead of changing an existing one). My experience with both forms existing in Python is that the destructive version surprises novice programmers, and as with most side effects, tends to cause even experienced programmers to make occasional mistakes. If you want to use a referentially transparent sort as a destructive sort, it is as simple as assigning the return value to the original variable (e.g. MyArray := Sort(MyArray)).
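The two Python forms referred to above are sorted (returns a new list) and list.sort (changes the list in place and returns None). A short demonstration of the surprise, using throwaway variable names:

```python
numbers = [3, 1, 2]

# Referentially transparent form: a new list comes back.
ordered = sorted(numbers)
unchanged_after_sorted = numbers == [3, 1, 2]
print(ordered, unchanged_after_sorted)  # → [1, 2, 3] True

# Destructive form: every name that refers to the list sees the change,
# and the call itself returns None.
shared = numbers
outcome = numbers.sort()
print(outcome, shared)  # → None [1, 2, 3]
```

A classic novice mistake is writing numbers = numbers.sort(), which replaces the list with None.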
As promised, in the unlikely event that you want to destructively sort the contents of a string, it is easy to do with this design.
Example:
Result := ""
for Index, Value in Sort(StrSplit(MyString, "`r`n")) {
    Result .= Value . "`r`n"
}
MyString := Result
Example:
MyString := StrJoin(Sort(StrSplit(MyString, "`r`n")), "`r`n")
Additional (referentially transparent) functions could be added to provide the remaining functionality that has been conflated with Sort:
- Reverse – reverse the order of array elements
- Shuffle – randomly rearrange an array
- Uniq – remove duplicates from a sorted array
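A rough Python sketch of these three helpers, each returning a new list and leaving its argument alone. The names mirror the list above; the optional rng parameter on shuffle is my own addition so that the randomness can be controlled in tests.

```python
import random


def reverse(arr):
    """Return a new list with the elements in reverse order."""
    return list(arr)[::-1]


def shuffle(arr, rng=random):
    """Return a new list with the elements in random order."""
    copy = list(arr)
    rng.shuffle(copy)  # shuffles the copy, never the caller's list
    return copy


def uniq(sorted_arr):
    """Return a new list with adjacent duplicates removed."""
    result = []
    for item in sorted_arr:
        if not result or result[-1] != item:
            result.append(item)
    return result


data = [1, 1, 2, 3, 3]
print(reverse(data))  # → [3, 3, 2, 1, 1]
print(uniq(data))     # → [1, 2, 3]
print(shuffle(data, random.Random(0)))  # some permutation of data
print(data)           # → [1, 1, 2, 3, 3]
```

Because uniq only compares neighbors, it composes naturally with a sort: sort first, then remove duplicates.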
The Standard Library:
The standard library is so large that I have probably overlooked some problems. Someone who is more familiar with it (i.e. the development team) should go through it carefully, looking for naming inconsistencies in functions, methods, and parameters, and parameter order inconsistencies. These problems are not just aesthetically offensive; they cause difficulties remembering function and method names, and defects resulting from passing arguments in the wrong order.
When choosing between different parameter orders keep optional parameters and BoundFunc (or currying) in mind. The more likely a parameter is to be omitted, the later it should appear in the parameter list. The more likely a parameter is to be reused, the earlier it should appear in the parameter list.
If a ‘real’ module system is not going to be introduced, the standard library should be broken down into ‘fake’ namespaces by abusing classes, similar to how Lua's standard library is organized by using tables. This would reduce global namespace pollution, and give more visual structure to the programming language. Problems with inconsistent naming and parameter order may become more apparent in the process.
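The “class as namespace” trick looks roughly like this when sketched in Python (which has real modules, so this is purely to show the shape). The class names Str and Arr and their members are hypothetical groupings, not a proposed final layout.

```python
class Str:
    """Used only as a namespace; never instantiated."""

    @staticmethod
    def split(text, delimiter):
        return text.split(delimiter)

    @staticmethod
    def join(parts, delimiter):
        return delimiter.join(parts)


class Arr:
    """Used only as a namespace; never instantiated."""

    @staticmethod
    def reverse(items):
        return list(items)[::-1]


# Only two names, Str and Arr, land in the global namespace.
parts = Str.split("c,b,a", ",")
print(Str.join(Arr.reverse(parts), ","))  # → a,b,c
```

Grouping functions this way also puts related names side by side, which tends to expose inconsistent naming and parameter order.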
The GUI API is difficult to understand and use because it uses the wrong paradigm. Windows and controls are objects. They are long-lived bundles of mutable state that you can perform certain limited, well-defined operations on. One might also reasonably argue they form push dataflow networks, but I do not suggest representing them that way, since that paradigm is unfamiliar to most novice programmers (though they are likely to have used spreadsheets), and it would be inconsistent with the evaluation of the rest of the programming language. They are definitely not procedures, however you look at them. Well-respected GUI APIs, like QtGUI, should be used for inspiration when fixing this. It should not be necessary to write this, but the improved API should, like everything else, use references to functions and function objects for event handling.
Additions:
I would prefer AutoHotkey v2 to primarily be about changing and removing constructs, without limiting the programming language's power, rather than adding them.
There are some additions that I believe would make AutoHotkey much more pleasant to use. They are presented in order from most to least important to me.
A grid layout manager would prevent AutoHotkey programs from having to micromanage controls. Other types of layout managers are often provided by GUI toolkits, but most needs can be met with only a grid layout manager. Inspiration should be taken from existing good designs like QGridLayout.
Eval can be useful for deserialization.
A REPL would make AutoHotkey much easier to use. This would appear second on my list if it did not require Eval to implement. It also requires the representation interface mentioned in the “Good Error Handling” section.
Once % is no longer used for string interpolation of code, having it return to its conventional use as the modulo operator would be very nice.
It should be possible to write and represent integers in binary notation (e.g. 0b101). This is nice for working with binary files, which often pack several values into one or more bytes.
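As an illustration of why the notation helps, here is a Python example (Python already supports 0b literals) that unpacks three fields from one byte. The field layout is invented for the example and does not come from any real file format.

```python
# One byte packing three fields: vvv c llll
#   vvv  = version (3 bits), c = checksum flag (1 bit), llll = length (4 bits)
packed = 0b10110101

version = (packed >> 5) & 0b111
has_checksum = (packed >> 4) & 0b1
length = packed & 0b1111

print(version, has_checksum, length)  # → 5 1 5

# Representing integers in binary is the other half of the request.
print(bin(packed))           # → 0b10110101
print(format(length, "04b")) # → 0101
```

The masks read directly off the layout comment, which is much harder to see when they are written as 7, 1, and 15.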
Efficiency:
I encountered this distressing line in the manual:
AutoHotkey Help: Variables and Expressions wrote:
In v1.0.48+, the comma operator is usually faster than writing separate expressions, especially when assigning one variable to another (e.g. x:=y, a:=b). Performance continues to improve as more and more expressions are combined into a single expression; for example, it may be 35% faster to combine five or ten simple expressions into a single expression.
There is no excuse for encouraging programmers to write all their code on one line. This could be interpreted as saying the code can be spread out over multiple lines, and commas can be used for continuation sections, but that still makes the code uglier. The implementation flaw that penalizes the time efficiency of properly written code should be fixed. It is an implementation flaw. I know of no other programming language with this problem.
I considered not posting this. I fully expect it to primarily, if not exclusively, receive dismissive responses and flames.
I decided to post it anyway, because not trying guarantees failure.
I spent several hours a day, for about two weeks, summarizing the problems I have encountered in the year I have been heavily using AutoHotkey. Hopefully some good will come of the effort.