Preview of changes: scope, function and variable references

Post by **lexikos** » 30 Jan 2021, 07:41

2021-03-07: This has been superseded by v2.0-a128.

I have uploaded an experimental build exploring some potential interrelated language changes.

https://www.autohotkey.com/download/2.0/AutoHotkey_2.0-a124-22-g0a70f190.zip
https://github.com/Lexikos/AutoHotkey_L/commits/x

# ByRef

ByRef has been replaced with the reference operator, using the symbol &. (v2.0-a111 replaced the address operator with separate functions, StrPtr and ObjPtr.)

&var produces a VarRef object, which can be used as follows:

- Dereference with %ref% to read or assign the target variable.
- Pass it to a ByRef parameter (see below).
- Pass it to an OutputVar parameter.
- Pass it to IsSet.
- Anything else one might do with a reference to an object, such as storing it in an array or binding it to a function parameter.
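
A minimal sketch of the first and last items in the list above, assuming the syntax described in this post. The variable names are made up for illustration.

```autohotkey
x := 1
ref := &x              ; ref now holds a VarRef that targets x
%ref% := %ref% + 4     ; read and assign x through the reference
MsgBox x               ; x has been updated through ref

refs := [&x]           ; a VarRef can be stored like any other value
r := refs[1]
%r% := 10              ; assigns x again, this time via the stored reference
```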

## Parameters

ByRef parameters are declared as &var instead of ByRef var, and always require a VarRef except when omitted (and optional). To permit a VarRef or any other value, declare a normal parameter and use explicit dereferencing where appropriate.
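
As a rough illustration of the two styles described above (Swap and StoreResult are hypothetical names, not part of the build):

```autohotkey
; ByRef parameters: the caller must pass a VarRef for each one.
Swap(&a, &b) {
    temp := a
    a := b
    b := temp
}

; Normal parameter: any value is accepted, and the function
; dereferences explicitly. This sketch assumes target holds a VarRef.
StoreResult(target, value) {
    %target% := value
}

x := 1, y := 2
Swap(&x, &y)           ; x is now 2 and y is now 1
StoreResult(&x, 42)    ; x is now 42
```

The second form leaves it to the function to decide what to do when something other than a VarRef is passed.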

IsByRef has been removed due to ambiguity in the implementation (an alias may be due to passing a reference in, taking a reference to the parameter itself, or referring to it in a closure).

Only built-in functions have OutputVar parameters. Due to current technical limitations, taking a reference to a built-in variable is not permitted. However, if &var is passed directly to an OutputVar parameter, var is permitted to be built-in (but not read-only).

## VarRef Object

Like ComObject, properties and methods cannot be defined for a VarRef, and it has no base object.

# Var Unset Errors

The #Warn UseUnset warning type has been removed. Referencing an unset variable (except as the target of an assignment or the reference operator) now raises an error.

# Global

Why change?

Originally, all undeclared variables in a function were local by default. This reduces the risk of unintended side-effects, but is a frequent cause of errors. Most of the time variables do not need to be declared, so declarations are easily forgotten.

#Warn LocalSameAsGlobal was added to point out potential errors where a global declaration is forgotten, but it requires the author to opt in, and it often gets in the way, warning about variables that were intended to be local. In effect, it would require function authors to meticulously declare local as well as global variables, to avoid possible warnings when the function is used in different scripts. This generally isn't done, since warnings only appear when the warning is enabled and there is a conflict (as opposed to whenever a local variable lacks declaration).

When a global variable is used by many functions, requiring a declaration for it in each function is inconvenient, adds more noise to the code, and increases maintenance cost (especially when the declaration and use are separate).

Using global varname outside any function to make the variable "super-global" mitigates some of these issues, but still requires the author to declare the variable. Sometimes the need for a declaration is not understood, or the declaration is forgotten. Declaring varname super-global also increases the risk of unintended side-effects, since varname is no longer local by default.

While super-global does allow some repeated global declarations to be avoided, it conversely may require authors to take even more care naming or declaring local variables. Code reuse and sharing is more error-prone, since it may be difficult or impossible to predict what class or super-global variable names will be used.

Force-local was added to allow an author to protect his function from such conflicts, but it requires the author to opt in, which probably only happens after the author has experience with conflicts. It also burdens the author with the need to declare every global (or super-global) variable used by the function; and again, declarations are easily forgotten.

Currently in v2-alpha, each class declaration creates a super-global constant. By making it read-only, errors where a class is unintentionally overwritten are avoided. However, variables intended to be local might instead cause a load-time error when they coincide with a class.

#Warn UseUnset is helpful for detecting forgotten declarations in functions which attempt to read (but not assign) global variables, but only if and when the variable reference is reached during evaluation. #Warn VarUnset detects some errors immediately upon launching the script, but does nothing for the issues inherent to super-global variables.

## Assign-local

Within a function, if a given variable name is not used as the target of an assignment or reference (&) operator inside the function, it may resolve to an existing global variable even without declaration.

Classes are no longer super-global. For instance, inside a function which contains object := {}, object refers only to a local variable, not to the Object class. New classes can be added without changing the behaviour of existing functions (except functions which had erroneous references to undefined variables, or which set them only dynamically).

Declarations for a given variable can be limited to just the functions that modify that global variable, making global potentially useful as an indicator of side-effects. Since fewer functions will need to declare the global variable, there is less reason to declare it super-global. With fewer super-global variables, there is less risk of a function having unintended side-effects (assigning to a global variable), and less need for force-local.
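
A small sketch of how this reads in practice, assuming the assign-local rule as described above. Counter, ShowCounter and Increment are illustrative names only.

```autohotkey
Counter := 0

ShowCounter() {
    MsgBox Counter     ; no assignment or &Counter here, so this resolves to the global
}

Increment() {
    global Counter     ; without this, the assignment below would create a local
    Counter += 1
}

Increment()
ShowCounter()
```

Only Increment needs the declaration, so the global line doubles as a marker of which function has the side-effect.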

### Known Issues

A dynamic assignment such as %'x'% := y (without assume-global) will create a local variable even if non-dynamic references to x were resolved to a global variable due to lacking any non-dynamic assignments. This local variable can only be referenced dynamically.

Resolving the dynamic assignment to a global variable would be inconsistent with non-dynamic assignments, would make unintended side-effects more likely, and would make identifying side-effects more difficult. It may be better to require assume-global or force-local when creating new variables at runtime, or disable the creation of new variables.

Assign-local resolves a non-dynamic reference in a function to a global variable only if it exists at load time. If it is not declared anywhere or referenced non-dynamically in global scope, non-dynamic references inside functions are resolved to (unassigned) local variables. These variables could be assigned dynamically.

# Merging of Variable and Function Namespaces

Function names are no longer kept separate from variable names. Instead, each function definition creates a "read-only variable" (constant). Virtually any sub-expression can be called by immediately following it with an open parenthesis (with no leading space). For instance, MsgBox("Hello") is the same as (MsgBox)("Hello"); in both cases, MsgBox is a constant referring to the MsgBox function (unless shadowed by a local variable).
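
For example, a sketch assuming the merged namespace described above (Greet is a hypothetical function):

```autohotkey
Greet(name) {
    return "Hello, " name
}

f := Greet                   ; the constant Greet holds the function object
MsgBox(f("world"))           ; call through an ordinary variable
MsgBox((Greet)("world"))     ; a parenthesised sub-expression as the call target
```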

If the target of a function call is a function constant, parameters are validated as before. If a function name is misspelled, the error is usually detected by #Warn VarUnset. Even if it is not detected at load time, a runtime error is raised (as usual) if the value cannot be called.

%x%() now performs a double-deref and then calls the result, so %'MsgBox'%() performs as before but %MyObj%() is invalid (the percent signs should be removed).

Func("name") has been removed since name is sufficient. Other built-in functions which accepted function names have been changed to only accept references. For example, SetTimer MyFunc instead of SetTimer "MyFunc". IsFunc("name") has also been removed, and scripts should generally deal with function objects directly, not function names (strings). If a name string must be resolved to a function reference, it would be done via a double-deref. Function objects can be validated as before.
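
A short sketch of both points, assuming the behaviour described above. Tick is a made-up function name.

```autohotkey
Tick() {
    ToolTip A_TickCount
}

SetTimer Tick, 1000    ; pass the function itself rather than the string "Tick"

name := "Tick"
fn := %name%           ; double-deref: resolve the name string to the function object
fn()                   ; then call it like any other callable value
```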

Due to the increased complexity and potential for accidents, the function library auto-include mechanism has been removed. (Another reason is that it might be superseded by module/namespace support.) #Include <Lib> still works as before.

Force-local now affects calls to global functions, since function calls are just as dependent on variable scope rules as naked variable references. For instance, global MsgBox is required before the global MsgBox function can be used in a force-local function, since MsgBox() is resolved the same way as f := MsgBox, f().

## Classes

ClassName.New() has been replaced with ClassName.Call(), since one can now simply write ClassName().

The Object(), Array() and Map() functions have been removed. Map now has a constructor which accepts "key, value" parameter pairs, and can be called as Map(key, value) since Map in this context is now the class rather than a function name. As the Object() constructor is inherited by derived classes (including all user-defined classes where the base class was unspecified), it (still) does not accept parameters, unlike the former Object() function.
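
Roughly, the new construction syntax looks like this (Point is a hypothetical class used only for illustration):

```autohotkey
class Point {
    __New(x, y) {
        this.x := x
        this.y := y
    }
}

p := Point(3, 4)                     ; replaces Point.New(3, 4)
m := Map("width", 3, "height", 4)    ; Map is the class; its constructor takes key, value pairs
o := Object()                        ; the Object constructor accepts no parameters
MsgBox p.x * m["height"]
```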

Methods and properties are merged once again, so the following were removed: ObjOwnMethods, DefineMethod, DeleteMethod, HasOwnMethod, OwnMethods.

Property descriptors accepted by DefineProp and returned by GetOwnPropDesc have an additional property Call, which specifies a function object to be called when the property is called. If not defined for a call such as x.y(), the y property's getter is called to retrieve a function object, which is then called.
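
A minimal sketch of a descriptor with a Call property, assuming it is invoked with the target object as its first parameter. Describe and the parameter names are illustrative.

```autohotkey
obj := {}

; The function stored under Call runs when obj.Describe(...) is called.
obj.DefineProp("Describe", {Call: (self, suffix) => Type(self) suffix})

MsgBox obj.Describe("!")
```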

Methods of classes (both built-in and user-defined) are defined as simple value properties.

## Closures

Each nested function defines a local static constant. If a nested function captures a non-static local variable/constant of the outer function, that nested function becomes a Closure and the corresponding constant becomes non-static. In other words, nested functions are closures only when they have to be; a single function can contain both Func and Closure nested functions.
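
The following sketch shows both kinds of nested function in one outer function, assuming the rules above. MakeCounter, Next and Label are hypothetical names, and the Type results in the comments are what the description implies rather than verified output from this build.

```autohotkey
MakeCounter() {
    count := 0
    Next() {
        return ++count       ; captures count, so Next should be a Closure
    }
    Label() {
        return "counter"     ; captures nothing, so Label should stay a plain Func
    }
    MsgBox Type(Next) " / " Type(Label)
    return Next
}

counter := MakeCounter()
counter()                    ; first call increments the captured count
MsgBox counter()             ; second call continues from the same captured variable
```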

Before, if a function had any downvars (that is, if any of its local variables were referenced by a nested function), all of its (immediate) nested functions became closures. This was because even a nested function without any such references might call a closure directly or dynamically, or pass its name to a built-in function (such as SetTimer, Hotkey or Func). In order to instantiate a closure by name, it had to be a closure.

Because they are tied to local constants, named closures are only instantiated once each time the outer function is called. Before, each call to Func("name") instantiated a new Closure referring to the same variables. Something like OnMessage(n, Func("f")), OnMessage(n, Func("f"), 0) would fail if f was a closure since two different objects were passed to OnMessage.

Closure references within these non-static local constants are not counted; instead, the closures are kept alive for as long as any closure in the group has a non-zero reference count (or the outer function is still running). This allows recursive or inter-dependent closures to exist without creating a circular reference which locks them in memory. It also allows a closure to safely refer to itself, such as to pass itself to SetTimer, OnMessage, etc. However, copying a closure reference into a "captured" local variable (or an object contained by one) still causes a problematic circular reference.

Post by **iseahound** » 30 Jan 2021, 20:33

Thanks, Lexikos. I think the merging of function and variable namespaces was probably inevitable.

&x syntax is similar to an out parameter. However, I don't think that functions should be required to declare if they accept byref parameters or not. Wouldn't it be simpler to use &x as a form of write permission instead? That makes the code clearer, and solves the problem of functions that accept a VarRef or any other value. One benefit would be making an equivalence between x := add3(x) and add3(&x) for return x + 3.

Post by **lexikos** » 31 Jan 2021, 04:05

iseahound, I rarely understand your posts on the first try, and even more rarely agree. I was frankly reluctant to open the topic after seeing that you had posted.

The new ByRef concept is a mixture of C#'s ref parameters and Perl's references. ref parameters and out parameters are obviously similar concepts; in C#, the latter is just a more restricted version of the former.

Clearly, functions are not required to declare that they accept a VarRef:

> To permit a VarRef or any other value, declare a normal parameter and use explicit dereferencing where appropriate.

No, imposing additional restrictions or checks on &x parameters would not simplify anything. And implementing both ref and out parameters would absolutely be more complicated than just implementing ref parameters. No, it would not make the code clearer. It would not be used with functions that accept "a VarRef or any other value", so I have no idea why you think it would solve any "problem" related to such. I see no problem in the first place; all functions that accept any value can accept a VarRef, because a VarRef is a value.

Changing &x to declare an output-only reference parameter would not make x := add3(x) and add3(&x) equivalent. Seeing the function call does not tell you how the function parameter is declared; it is just passing a variable reference to the function. One can infer that the function might assign a value through the reference, regardless of whether the function declares its parameter for input, input-output or just output.

Post by **lexikos** » 16 Feb 2021, 05:05

I am promoting this "experiment" to "preview".

I have uploaded a new build based on v2.0-a124.

As ClipboardAll, Float, InputHook, Integer and String are now classes, the functions have been replaced with class Call methods, which are called with the same syntax. The RegExMatch class was renamed to RegExMatchInfo.

Fixed an issue with calling methods on function references, such as SomeFunc.Bind().

Post by **kczx3** » 16 Feb 2021, 09:26

So to determine if something is callable we'd need to do:

```autohotkey
if (Type(myVar) ~= "Func|Closure" || HasMethod(myVar, "Call")) {
    myVar(true)
}
```

?

Post by **lexikos** » 17 Feb 2021, 04:19

In theory, you should just check if it has a Call method, same as before.

The correct way to check for a Func object is now (with v2.0-a124) myVar is Func. The other built-in function objects are derived from Func (and this is shown in new documentation).

But what made you think Func/Closure need to be checked for? Func.Prototype defines a Call method, so HasMethod will return true.

There are some cases where HasMethod won't or can't give you the right answer.

1. Any COM object may be callable if it responds to invocation with the ID DISPID_VALUE and flag DISPATCH_METHOD. The object may or may not provide type information, and even if it does, I am not certain that the DISPID_VALUE member would always be included. Regardless, the HasMethod function does not support COM objects. Keep in mind that if an AutoHotkey object is exposed to another script via COM, all limitations of the interface apply to it (but if you know it's an AutoHotkey object, you can call obj.HasMethod("Call") instead of HasMethod(obj, "Call"), which would fail).

2. If you delete the Call method from Func.Prototype, Func objects will no longer have a Call method (unless you define one). HasMethod will tell you there is no Call method and Fn.Call() will fail, but calling the object itself will still work. That includes the following: Fn() (or %Fn%() in the current alpha branch), when it is called by built-in functionality such as SetTimer, or when it is called as a method of some other object. This is due to the way that the call ultimately ends up executing the function, rather than looking up the Call method in an infinite loop.

3. HasMethod doesn't check whether the method object is callable. Like the other two points, this isn't new, but there's more ambiguity now that properties and methods are mixed again.

Post by **kczx3** » 17 Feb 2021, 08:19

I think this from your original post is what made me think that they needed to be checked for.

> lexikos wrote: IsFunc("name") has also been removed; Type can be used to check for "Func" or "Closure" instead (and class Func may be added in future to permit the use of is Func checks)

Ultimately, the goal is just to ensure that whatever is contained in myVar is actually callable.

Post by **lexikos** » 18 Feb 2021, 03:54

The closest direct equivalent for IsFunc would be to perform a double-deref to get the function object (b := %a%), then a type check to verify it is a function (b is Func). On second thought, maybe it is a mistake to even think of a replacement for IsFunc. One would generally use IsFunc(a) to verify that %a%() has a chance of succeeding. It was already inefficient in that the function name would be resolved once when you call IsFunc and again each time you call the value. If you have a function which works with function names and objects, you could use Func(a) to normalize the input, then just work with objects. Of course, you could just require the caller to pass an object in the first place, which avoids any problems related to the scope of nested functions.

Now %a%() performs a double-deref and call (two separate operations), but you probably don't ever want to do that if you are cautious. To validate, you need to perform the double-deref first (b := %a%). Once you've done that, there probably isn't a good reason to do it again; instead, keep the value and call it like any other callable value (b()).

The double-deref in b := %a% or %a%() will throw an exception if it does not resolve to a valid initialized variable. Guarding %a%() with try-catch isn't a good solution since an error (even the same error) could be thrown either by the call itself, or by the called function. Guarding just the double-deref (b := %a%) gives a clearer picture.
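
One way to guard only the double-deref, as a sketch. ResolveFunction is a hypothetical helper, and returning an empty string on failure is just one possible convention.

```autohotkey
; Returns the function object named by a string, or "" if there is none.
ResolveFunction(name) {
    try
        fn := %name%         ; only the double-deref is guarded
    catch
        return ""
    return (fn is Func) ? fn : ""
}

if fn := ResolveFunction("MsgBox")
    fn("Resolved by name")   ; errors thrown by the call itself are not swallowed
```

Because the call happens outside the try block, an error raised by the called function is not mistaken for a failed lookup.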

Before, passing function names was often more convenient, which might have outweighed the drawbacks. Now there's much less reason to pass around function names and call them, so less reason for something like IsFunc.

Post by **kczx3** » 18 Feb 2021, 09:14

> lexikos wrote: Of course, you could just require the caller to pass an object in the first place

That'd be nice if we could enforce that, wouldn't it

Currently, we can only provide documentation for what to pass, and then validate the argument inside the function/method. So if I am expecting a callback to be passed, I need a way to validate that it's callable. Forgive me, but I don't think I follow/understand what the double-deref does that actually validates it. Maybe this is above my level of comprehension.

Post by **lexikos** » 19 Feb 2021, 07:40

We can enforce it, the same way that built-in functions enforce it, which is just how you described. I think what you're really wishing for is a language feature to let the program enforce it for you. It would be convenient, but a similar level of convenience can be achieved by designing functions to perform validation, and having your functions pass their parameters through those instead of performing type checks and throwing directly.

My (quoted) point is that if you do not permit function names, you do not need to write any code to validate them; i.e. if !HasMethod(parameter, "Call"), the parameter is invalid regardless of whether it is a function name. My opinion is that you should not permit function names (or handle them in any way). Built-in functions such as SetTimer do not (in this branch).
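
A sketch of the "validation function" approach, rejecting anything that is not callable (function names included). RequireCallable and Repeat are hypothetical, and TypeError is assumed to be available as an error class; a build without it would need a different thrown value.

```autohotkey
; Shared validator: throws unless value can be called.
RequireCallable(value, paramName) {
    if !HasMethod(value, "Call")
        throw TypeError(paramName " must be callable")
    return value
}

Repeat(callback, times) {
    RequireCallable(callback, "callback")   ; validation lives in one place
    Loop times
        callback(A_Index)
}

Repeat((i) => ToolTip(i), 3)
```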

> I don't think I follow/understand what the double-deref does that actually validates it.

I'm not sure what you mean, but I think you have misinterpreted something. Maybe you are referring to this:

> To validate, you need to perform the double-deref first (b := %a%).

What I'm saying is that if a contains a function name and you would call it with %a%(), in order to validate it, you need to first evaluate %a% to get the (presumed) function object.

In short, you should avoid calling functions by name, or passing function names anywhere.

Post by **kczx3** » 19 Feb 2021, 08:17

Thank you! That definitely helps and I appreciate your time to step through that for me

A language construct would certainly make validation easier. I work in both PHP and JavaScript a lot so I do have a sense of both types of languages that do and don't have such constructs. I prefer to keep AHK as flexible as possible, while remaining as intuitive as possible.

Post by **sirksel** » 04 Mar 2021, 12:15

@lexikos, this is great news! I'm trying to test it now, so I understand it better. The following code, which I thought should fail for overwriting the function's constant var name, isn't failing like I thought it would. Can anyone help me understand what I'm doing wrong?

```autohotkey
m(x) => msgbox(x)
m := 25   ;shouldn't this fail?
m(m)
```

I used the zip at the top of the OP (reporting itself as a124-b53c1e78). Is that still the right version to be testing?

Post by **lexikos** » 04 Mar 2021, 22:04

That should fail, unless the function and assignment are in different scopes.

Post by **sirksel** » 05 Mar 2021, 00:45

Thanks for the quick reply. Yes, these are in the same scope and this is the entirety of the script. They're all globals, but it's giving me the message box containing 25. Are you all seeing the same? It's entirely possible that I'm just not using the right build or something.

Post by **lexikos** » 05 Mar 2021, 01:45

That can't possibly work with the 'x' branch. I pasted in the wrong link when I released a124 and simultaneously updated the test build. Actual alpha releases are "v2.0-a<number>-<commit hash>". Test releases are "<whatever the last tag was>-<number of commits since>-g<commit hash>".

I have made more changes locally, but haven't pushed another build up yet because I was considering just merging it into the alpha branch.

The link has been fixed.

Post by **sirksel** » 05 Mar 2021, 02:06

Thanks @lexikos. I'll wait then to see it in the alpha. That will make me feel (if only slightly) more secure as I start getting to work on modifying 10k+ lines of code. I know it's still alpha though. I'm just pretty jazzed about what these changes enable. A couple questions I haven't been able to test:

1. String/Integer/Float are now created. I used to create these to extend the prototypes and for membership testing. Will I still need to create Any/Primitive to extend and test membership, or have you created those too?

2. Method/Property merge. I'm a little confused on this one. Does that mean that obj.prop[param] to call reverts to obj.prop(param)? If so and all params are optional, can we still call as obj.prop without empty parens?

3. Since constants are now a bigger part of things, is there any chance we could have the ability to create them? Unintentionally changed quasi-constants is one of my most frequent debug issues.

What you've done here is really awesome! I read your spec three times to make sure I got it all straight. Seems like the biggest change since eliminating commands or adding fat arrow. Thanks so much for all the time you put into this. After years of thought and discussion, you always get us to an even better place than I imagined would be possible...

Post by **lexikos** » 05 Mar 2021, 03:44

1. Any and Primitive were added at the same time as String/Integer/Float. See https://lexikos.github.io/v2/docs/objects/

2. No. obj.prop[param] and obj.prop(param) are still semantically different. As implied by "once again", it's much the same as in v1 and earlier alphas which did not have methods and properties combined. It means that you can assign a property to create a method, and cannot have a property and a method with the same name. However, now a property can have a "call" access function in addition to "get" and "set" (but it isn't integrated into the class syntax), whereas to achieve something like that before you had to use meta-functions. Also, obj.prop can be an object with both __Item[] and Call().

3. Yes. I have been thinking about how that will work (semantics and implementation details, but you don't need to hear the latter). The value should be decided when execution reaches the initializer, which would be an arbitrary expression the same as with local/global. I'm not sure what should happen when a const initializer is encountered in a loop. It would probably be more useful than harmful to permit reassignment by the singular line which declares it, since we don't have block-scoped variables. Otherwise, it could throw an error or behave like static (being evaluated only once).

Note that if your quasi-constants are created in an outer scope (such as global), assigning to them inside a function would require declaring them.

```autohotkey
QUASI_CONSTANT := 42
fn1() {
    QUASI_CONSTANT := 1
    MsgBox QUASI_CONSTANT  ; 1
}
fn2() {
    MsgBox QUASI_CONSTANT  ; 42  (invalid in v2.0-a127)
}
fn1(), fn2()
MsgBox QUASI_CONSTANT  ; 42
```

Post by **sirksel** » 05 Mar 2021, 04:47

Thanks. It all sounds great. That sample code helps a lot. I know you said super-globals aren't as necessary, due to the following:

> Within a function, if a given variable name is not used as the target of an assignment or reference (&) operator inside the function, it may resolve to an existing global variable even without declaration.

Is the comment above only applicable to resolution of globals proper or to any vars in an enclosing scope? Seems like the latter would be more elegant, but I'm sure you considered it and have good reasons for whichever you chose. I also wasn't sure from your notes if super-globals are merely not as necessary, or if they're actually deprecated or removed?

Thanks for all the clarifications. They will make for a much smoother refactoring fiesta this weekend!

Post by **lexikos** » 05 Mar 2021, 06:55

Super-global variables are still present in the currently available build, but not in my latest build.

Local variables of outer functions were and are accessible to inner functions without declaration; there is still no way to declare that a variable should come from an outer scope (excluding global, which declares that a variable should come from global scope). Whether the inner function assigns to the variable still does not (or should not) affect whether it is linked to the outer variable.

However, there are some inconsistencies in the currently available build when the outer function reads but does not assign or declare a variable. Whether the variable is local and/or shared by inner functions may depend on whether a global exists, the position of the inner function relative to references in the outer, and whether the inner assigns to the variable. You can avoid the inconsistencies by including a declaration, assignment or &ref in the outer function; in future, without at least one of these, inner functions will not capture the variable.

Post by **sirksel** » 05 Mar 2021, 11:47

That helps a lot. So, on this point though...

> ...by including a declaration, assignment or &ref in the outer function; in future, without at least one of these, inner functions will not capture the variable

This doesn't mean you have to (or will have to in the foreseeable future) re-declare a global in either outer or successive levels of inner functions to have it available for reading by the innermost functions. In other words, once a global is declared, assuming no shadowing along the way, it's available at all levels for reading without any further declaration. Correct?

(This does make a difference to this weekend's refactoring fiesta, as I have 100+ global environment-like vars and/or quasi-constants read at all levels of my code without further declaration...)
