I have long-term plans to restructure the parser and give both the script itself and external applications a clean interface for parsing, loading and executing scripts as modules. However, that's a lot of work, and I think this idea could be implemented in an unofficial capacity much more easily, allowing for some very useful tools to be created much sooner. I'm also about done with v1, so if this idea depended on much work being done, it would never be implemented for v1. Yesterday I was looking for an interesting side-project, so I started brainstorming how to hack up an unofficial solution for this with minimum effort and maximum effect.
At first I considered the initial idea presented here, of AutoHotkey.exe dumping information to stdout. I didn't enjoy the thought of writing code to deal with JSON or XML, and that also led me back to other features that this idea could depend on, like built-in JSON support which I would instead feed objects into. I concluded that the output doesn't need to have much of a hierarchy for a script to parse it easily enough; and I suppose I could leave the parsing up to you... but then I considered the next idea:
maybe it could be a DLL, then it could be called with DLLCall and return the object directly?
To allow a script to introspect over itself (an idea for the future), an object-based API is ideal. Objects already support IDispatch, so can easily be exposed to external code. The program already has one object that provides information about an element of the script; specifically Func. Some additional objects would need to be implemented, but even if this is unofficial at the moment and not built upon an ideal foundation, some of the implementation could potentially be reused for official functionality later.
So the first step of making a DLL that you can use to parse scripts is to make AutoHotkey compile as a DLL. This is literally just a case of changing a few project and linker settings, but the result is a DLL that initializes all of the global static C++ objects and does little else.
To make it do something, the easiest next step is to export a function for calling the program's main entry point (_tWinMain). A few minor changes are needed to make the entry point work, since it normally takes command line parameters from the C runtime's __argc and __argv, which don't appear to be initialized sensibly in the DLL.
The result is a DLL that can be loaded by AutoHotkey and then called to execute another instance of AutoHotkey in the same process and thread. For instance, using just a few lines of script code, an AutoHotkey v2 process can load AutoHotkey v1, retrieve a function or class as a COM object and call it. This is currently much simpler than my attempt at using AutoHotkey.dll for that purpose and thqby's import_v1lib, and avoids some issues.
Unlike HotKeyIt's AutoHotkey.dll, this is almost entirely vanilla AutoHotkey source code, with very minimal changes. I am going to end up keeping a DLL with very basic functionality in the main branch, although likely as unofficial/unsupported and not part of the main download.
Initially I was thinking to add in a litter of preprocessing directives to strip out unused functionality (i.e. everything except parsing) to reduce the size of the DLL. I was quite happy to realize I've accidentally created something more useful than I intended, while simultaneously eliminating a bunch of tedious work.
(However, I may have fallen into a trap of scope creep.)
Anyway, that was fun, but I haven't actually gotten to the "meat" of your request yet. I am thinking that I will implement the following within the next few weeks and then upload something for you to test:
List of variables
list of functions and parameters
list of labels
list of objects
warnings
errors
Providing the host with a way to set a callback function for errors and warnings seemed to be the most trivial way to implement that, so I've done it. The callback can catch errors and warnings both at parse-time and at run-time. The line and .What associated with a load-time warning or error might be irrelevant in some cases, but that can be improved down the track.
I am still considering how I will implement the rest. For instance, a method with usage like funcs := [], hostedScript.EnumFuncs(f => funcs.Push(f)) would have a very simple implementation (EnumFuncs takes an IDispatch* and invokes it once for each function), but there are other approaches that could create more reusable components for other ideas.
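The callback-style enumeration described above can be sketched roughly as follows. It is written in Python only so that it can be run anywhere; HostedScript, FuncInfo and enum_funcs are hypothetical stand-ins for the unofficial API, not something the DLL actually exposes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FuncInfo:
    """Hypothetical stand-in for a Func object exposed by the hosted script."""
    name: str
    params: List[str]


class HostedScript:
    """Hypothetical host-side wrapper; not part of any real API."""

    def __init__(self, funcs: List[FuncInfo]):
        self._funcs = list(funcs)

    def enum_funcs(self, callback: Callable[[FuncInfo], None]) -> None:
        # Mirrors the described implementation: accept one callable
        # (an IDispatch* in the real DLL) and invoke it once per function.
        for func in self._funcs:
            callback(func)


def collect_funcs(script: HostedScript) -> List[FuncInfo]:
    # Equivalent of: funcs := [], hostedScript.EnumFuncs(f => funcs.Push(f))
    funcs: List[FuncInfo] = []
    script.enum_funcs(funcs.append)
    return funcs
```

The enumerating side stays trivial because it never builds a collection itself; the caller decides what to keep.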
toralf wrote: A) a hierarchical structure of all the files involved (include and lib files)
- with absolute or relative path to the main script
AutoHotkey keeps a flat list of files, but a minimal addition would allow the hierarchy to be reproduced on demand. All stored paths are absolute.
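As a rough illustration of how a flat file list could yield the hierarchy on demand, the sketch below assumes the "minimal addition" is that each entry also records the index of the file that included it. That choice, the function names and the example layout (main script at index 0) are assumptions made for the example only.

```python
from typing import Dict, List, Optional, Tuple

# Each entry: (absolute path, index of the including file, or None for the main script).
# The parent index is the assumed "minimal addition"; it is hypothetical.
FileEntry = Tuple[str, Optional[int]]


def build_include_tree(files: List[FileEntry]) -> Dict[int, List[int]]:
    """Map each file index to the indices of the files it includes, in list order."""
    children: Dict[int, List[int]] = {i: [] for i in range(len(files))}
    for index, (_path, parent) in enumerate(files):
        if parent is not None:
            children[parent].append(index)
    return children


def format_tree(files: List[FileEntry], children: Dict[int, List[int]],
                index: int = 0, depth: int = 0) -> List[str]:
    """Return one indented line per file, depth-first, starting at the given index."""
    lines = ["  " * depth + files[index][0]]
    for child in children[index]:
        lines.extend(format_tree(files, children, child, depth + 1))
    return lines
```

Because the stored paths are absolute, a tool wanting paths relative to the main script would have to compute them itself from the first entry.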
- for each line of each file the level of indentation based on code not on real/current indentation
The level of nesting can be calculated for each executable line, which may be composed of multiple physical lines due to line continuation. Executable lines basically include control flow statements, commands, expressions and braces (excluding the outer braces of a class definition). Only the starting line number of each virtual line is retained. Else IfWinExist xxx,, MsgBox would be represented as three virtual lines (or three Lines), all with the same line number.
-- the position at which a comment starts
Some information could be dumped as each line is parsed, but comments are stripped out as each line is read from file, and currently no information about them is retained. Because comments are processed at such an early stage, interpretation of ; can't be affected by quote marks in an expression (" ; " always contains a comment, although "; " does not, because there must be a space before the semicolon). Fortunately this makes it reasonably trivial for a script to reproduce the parsing of comments, although you would need to account for continuation sections, as they affect the interpretation of ; and /*.
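A minimal sketch of the semicolon rule described above, written in Python for illustration. It deliberately ignores quote marks, as the parser does. Treating a semicolon at the very start of a line as a comment, accepting a tab as well as a space before the semicolon, and skipping continuation sections, /* */ blocks and escape sequences entirely are all simplifying assumptions; find_comment_start is a made-up name.

```python
from typing import Optional


def find_comment_start(line: str) -> Optional[int]:
    """Return the index of the ';' that begins a comment, or None if there is none.

    A semicolon counts only when it is preceded by a space or tab, or (by
    assumption) when it is the first character of the line. Quote marks are
    not considered at all.
    """
    for i, ch in enumerate(line):
        if ch == ";" and (i == 0 or line[i - 1] in " \t"):
            return i
    return None
```

With this rule, the line x := " ; " has a comment starting inside the quoted text, while x := "; " has none, matching the two cases given above.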
-- an indication of what type the line is, e.g. pure comment, block comment, continuation line, func def, class def, etc.
As with comments, continuation is processed at a very early stage. The linefeed characters are usually stripped out or replaced for continuation lines (and in continuation sections with the Join option), and the original text and character positions are not retained. I'm not sure that it would be feasible (or trivial enough for me to implement as part of this project) to determine the character positions for all of these elements, even with additional information being recorded. Perhaps if we pretend continuation sections don't exist.
Class definitions are also not retained in a way that ties back to lines of code, except for executable lines such as variable initializers or lines in the body of a method.
Directives have an immediate effect when parsed (even if just to set a variable) and are not retained.
A possible solution may be for the host to provide callbacks for when these elements are identified during parsing. The host can then record the information if needed.
- for each warning and the one error ...
-- the position in the line
This information is not available at any point. Even for errors at early stages of parsing, the code being parsed has already been preprocessed to remove indentation/comments and combine continuation lines. Some errors are raised after escape sequences are translated, or after the line has been split into multiple "args" with whitespace trimmed (and if the line represents a command or control flow statement, its name has been replaced with a number).
- a list of all variables Names of all files (built in variables could be maintained within the tool)
It would be trivial to allow enumeration of built-in variables and functions, although there is no information about parameter names.
-- an indication if they are super global, global or local
-- an indication if they are a class
Aside from local/static variables being listed within the corresponding function, each variable also has the following scope attributes: global, local, function parameter, static, declared, super-global (v1). A variable can be normal, alias (at run-time), virtual (read-only or read-write, executing a function when accessed), or constant (v2). Function parameters also have the ByRef flag and potentially a default value.
A variable is not a class, but may contain one. Identifying whether a variable contains a class is simple.
Before the script executes, a variable contains a class only if it was assigned by the class definition. One can also compare __Class against the name of the variable, or in v2, whether the variable is a constant. Nested classes are a bit trickier (in v2).
- a list of functions with their parameters of all files (built in functions could be maintained within the tool)
It is easier to enumerate all functions per scope (global or nested within a given function) than to divide them by file. Since there is already a Func object, I am thinking to add unofficial properties/methods to retrieve the parameter names and default values, and the file and line of the function definition. (The line of the function definition isn't currently recorded, but it could be.) Built-in functions can be enumerated as well, but their parameters have no names. (Most v2 functions are now implemented using metadata that includes parameter names within macros in the source code, but the names are currently discarded by the C preprocessor.)
- a list of classes and their hierarchy of methods (with parameters) and properties
Each class can be retrieved from its variable. Methods and properties can be enumerated in the normal fashion, specific to the AutoHotkey version, which might be insufficient for v1.
-- For each the file and line number (and position) they occur
Where what occurs? Each and every reference to a variable? The first reference to a variable? The first assignment or declaration? This information is not retained. A possible solution would be to provide a callback for when a new variable is created during parsing. For v2, a variable would typically be created when its first assignment is encountered; for v1 it would likely be the first reference of any kind.
-- For each variable in which labels, functions, classes, methods they are used
Variables are either global or local. Local variables belong to a function. A method is a function which belongs to a class. You would discover variables by enumerating the variables of a specific function, so you don't need to be told which function it belongs to.
Nothing is used in a label, only above or below it. A label is just a point in the code, and sometimes neither the program nor the user can define what region of code actually relates to the label. This is part of the reason I removed label-based subroutines in v2.
TheArkive wrote: Since there is no "list of objects" Is it possible to add checking (maybe reference checking) for members (methods and properties) associated with a particular var, to extend the var list into a possible object list?
It's a bit late for me to respond to this, but note that I already stated that classes and their contents could be enumerated via the variable list. Also, members aren't associated with a var; members, or more accurately properties, belong to an object. For v2, you do not need to inspect the methods and properties of an object to determine its type. In this case, the API might need to provide a type function, as type(obj) won't work across the COM boundary, and obj.__Class carries the risk of invoking a custom property or meta-function.