use AHK.exe to parse scripts

Post by toralf » 07 Jun 2020, 06:36

In the past, several people have written AHK scripts that parse other AHK scripts to extract information that supports programming.
This can get quite complicated, depending on what type of information should be extracted from the script.
I was wondering whether the parser in AHK.exe itself could be used to extract information about a given script.
I would imagine calling AHK.exe on the command line with a special parameter (e.g. /ANALYSE) and a script name as input. AHK.exe would then not execute the script, but only parse it and return "some" information about it.
This information could be
  • List of variables
  • list of functions and parameters
  • list of labels
  • list of objects
  • warnings
  • errors
  • etc
Is this even feasible?

With this information calltips, auto-completion, syntax tidy or similar things would be much easier.
ciao
toralf

Post by lexikos » 07 Jun 2020, 23:44

Yes, except for:
  • List of objects. Maybe you meant classes? There is no internal list of classes, just the list of global variables that includes variables containing class objects. I suppose that those could be listed separately, and their contents could be enumerated to list nested classes, methods and whatever else is available without running the script.
  • Errors, plural. AutoHotkey isn't designed to keep parsing after an error, so you'd only get the first error.
  • etc (unspecified things might not be feasible). ;)
What specific format should the information be in?

It should be possible to include local variables by function, hotkey variants by context, and so on. Everything that is used to execute the script is perhaps trivial to enumerate with C++ and dump to stdout or a file. Code size might be an issue, in which case I'd be inclined to offer this capability only as a separate executable, which could exclude a lot of code, like the implementation of each built-in function. The trick would be to do that without littering the code with conditional compilation directives.

Post by toralf » 08 Jun 2020, 02:07

Thank you for your reply and consideration.

Regarding the output and exchange of information, I have no real concept for it right now. There are some basic requirements and random thoughts:
- it should be easy to access and use by an AHK script
- it will be structured information, thus an object would most likely be the right final entity.
- to end up with an object, I assume that an output on stdout or file should have an xml or JSON format.
- if it is an extra executable, maybe it could be a DLL, then it could be called with DLLCall and return the object directly?
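
To make the JSON idea concrete, here is a minimal Python sketch of what consuming such output could look like. The field names and layout are invented purely for illustration; no output format has been defined in this thread.

```python
import json

# Invented example of what a parser dump might contain; not a real format.
dump = """
{
  "functions": [{"name": "Add", "params": ["a", "b"], "line": 2}],
  "labels": ["Start"],
  "warnings": []
}
"""

# Turn the text into nested dicts and lists that a tool can walk directly.
info = json.loads(dump)

# Build calltip-style signatures from the (assumed) function entries.
signatures = [
    "{}({})".format(f["name"], ", ".join(f["params"]))
    for f in info["functions"]
]
```

Once the text is loaded into nested objects, a tool only needs ordinary lookups, which is the main argument for a structured format over free-form text.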

I’ll continue at a later point in time
ciao
toralf

Post by lexikos » 08 Jun 2020, 03:35

Which AutoHotkey version(s) are you concerned with?

Post by toralf » 08 Jun 2020, 09:43

In a perfect world, v1 and v2. But if it has a big impact, I guess v2 would be the only practical one.
I still use v1.
ciao
toralf

Post by kczx3 » 08 Jun 2020, 19:14

I agree this would be awesome to have. JSON or XML are ideal formats, even though we must rely on third-party libs for JSON. I’d say v2, unless it isn’t much work to also do v1.

Post by toralf » 08 Jun 2020, 23:18

More random thoughts.
This might not be complete yet, but I saw in the past that you have a good sense of spotting corner cases and unspecified areas. ;)

The information provided should contain
A) a hierarchical structure of all the files involved (include and lib files)
- with absolute or relative path to the main script
B) for a tool that would help to tidy the indentation
- for each line of each file the level of indentation based on code not on real/current indentation
- nice to have
-- the position at which a comment starts
-- an indication of what type the line is, e.g. pure comment, block comment, continuation line, func def, class def, etc.
C) for a tool that would help to eliminate “bad code”
- for each warning and the one error the file and line number and a description on what is wrong.
- nice to have
-- the position in the line
D) for a tool to support auto completion
- a list of all variable names in all files (built-in variables could be maintained within the tool)
-- an indication if they are super global, global or local
-- an indication if they are a class
- a list of functions with their parameters of all files (built in functions could be maintained within the tool)
- a list of classes and their hierarchy of methods (with parameters) and properties
- Nice to have
-- For each, the file and line number (and position) at which they occur
-- For each variable, in which labels, functions, classes and methods it is used
E) for a tool to provide calltips / IntelliSense
- ...

use cases:
In all cases, the word "tool" means a tool (e.g. an AHK script) that uses AHK.exe to get the information on the file and that does the job for the user or interacts with the user.
B) Tidy: the user selects some lines of code in their editor and asks a tool to do the indentation. Since the tool knows the level of indentation for each line, it can add indentation according to the indentation type defined by the user. If it knows the type of each line, it should also be able to create different styles of indentation.
C) Bad Code:
-- 1) a tool provides a list of warnings and the error. The user selects one and the cursor jumps to the right location in their code in their editor and, if possible, selects the portion of the code that causes the issue.
-- 2) an editor highlights the portions of code with issues and provides a tooltip for each describing the warning.
D) Autocomplete:
-- 1) User types part of a function name, the tool provides a list of functions available with their parameters, and since the tool knows on which line the functions are defined, it could check if the lines above or below the definition are comments (see B) and check them if they contain additional information for the function.
-- 2) User types part of a variable name, the tool provides a list of variables available including where they are defined and if they are super global/global/local. Then the user might spot that he needs to set a variable global to be able to access it.
-- 3) User has selected a variable that contains a class. The tool provides a list of methods and properties of that class.
E) Calltips/IntelliSense:
-- ...
ciao
toralf

Post by TheArkive » 09 Jun 2020, 03:34

This is an amazing idea, and would cut well over 1000 lines from my CallTipsForAll script. Some more ideas:

Since there is no "list of objects", is it possible to add checking (maybe reference checking) for members (methods and properties) associated with a particular var, to extend the var list into a possible object list?

If a list of super global vars is possible, and if it is possible to check a var's association with data members, then one could cross reference to find out which elements are classes, and which are objects.

I have ideas for structure and output, but that also depends on the extent of the capabilities.

Post by lexikos » 04 Dec 2022, 03:03

I have long-term plans to restructure the parser and give both the script itself and external applications a clean interface for parsing, loading and executing scripts as modules. However, that's a lot of work, and I think this idea could be implemented in an unofficial capacity much more easily, allowing for some very useful tools to be created much sooner. I'm also about done with v1, so if this idea depended on much work being done, it would never be implemented for v1. Yesterday I was looking for an interesting side-project, so I started brainstorming how to hack up an unofficial solution for this with minimum effort and maximum effect.

At first I considered the initial idea presented here, of AutoHotkey.exe dumping information to stdout. I didn't enjoy the thought of writing code to deal with JSON or XML, and that also led me back to other features that this idea could depend on, like built-in JSON support which I would instead feed objects into. I concluded that the output doesn't need to have much of a hierarchy for a script to parse it easily enough; and I suppose I could leave the parsing up to you... but then I considered the next idea:
toralf wrote:maybe it could be a DLL, then it could be called with DLLCall and return the object directly?
To allow a script to introspect over itself (an idea for the future), an object-based API is ideal. Objects already support IDispatch, so can easily be exposed to external code. The program already has one object that provides information about an element of the script; specifically Func. Some additional objects would need to be implemented, but even if this is unofficial at the moment and not built upon an ideal foundation, some of the implementation could potentially be reused for official functionality later.

So the first step of making a DLL that you can use to parse scripts is to make AutoHotkey compile as a DLL. This is literally just a case of changing a few project and linker settings, but the result is a DLL that initializes all of the global static C++ objects and does little else.

To make it do something, the easiest next step is to export a function for calling the program's main entry point (_tWinMain). A few minor changes are needed to make the entry point work, since it normally takes command line parameters from the C runtime's __argc and __argv, which don't appear to be initialized sensibly in the DLL.

The result is a DLL that can be loaded by AutoHotkey and then called to execute another instance of AutoHotkey in the same process and thread. For instance, using just a few lines of script code, an AutoHotkey v2 process can load AutoHotkey v1, retrieve a function or class as a COM object and call it. This is currently much simpler than my attempt at using AutoHotkey.dll for that purpose and thqby's import_v1lib, and avoids some issues.

Unlike HotKeyIt's AutoHotkey.dll, this is almost entirely vanilla AutoHotkey source code, with very minimal changes. I am going to end up keeping a DLL with very basic functionality in the main branch, although likely as unofficial/unsupported and not part of the main download.

Initially I was thinking to add in a litter of preprocessing directives to strip out unused functionality (i.e. everything except parsing) to reduce the size of the DLL. I was quite happy to realize I've accidentally created something more useful than I intended, while simultaneously eliminating a bunch of tedious work. :D (However, I may have fallen into a trap of scope creep.)

Anyway, that was fun, but I haven't actually gotten to the "meat" of your request yet. I am thinking that I will implement the following within the next few weeks and then upload something for you to test:
  • List of variables
  • list of functions and parameters
  • list of labels
  • list of objects
  • warnings
  • errors
Providing the host with a way to set a callback function for errors and warnings seemed to be the most trivial way to implement that, so I've done it. The callback can catch errors and warnings both at parse-time and at run-time. The line and .What associated with a load-time warning or error might be irrelevant in some cases, but that can be improved down the track.

I am still considering how I will implement the rest. For instance, a method with usage like funcs := [], hostedScript.EnumFuncs(f => funcs.Push(f)) would have a very simple implementation (EnumFuncs takes an IDispatch* and invokes it once for each function), but there are other approaches that could create more reusable components for other ideas.
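
The callback-style enumeration can be sketched in Python to show the shape of the idea. `HostedScript` and `enum_funcs` are hypothetical stand-ins, not the DLL's actual interface; the only point is that the callee invokes the supplied callable once per function and the caller decides what to collect.

```python
from typing import Callable, List


class HostedScript:
    """Hypothetical stand-in for a parsed script; not the real DLL interface."""

    def __init__(self, func_names: List[str]) -> None:
        self._func_names = list(func_names)

    def enum_funcs(self, callback: Callable[[str], None]) -> None:
        # Invoke the caller-supplied callback once for each known function.
        for name in self._func_names:
            callback(name)


hosted_script = HostedScript(["Main", "OnExit", "Helper"])
funcs: List[str] = []
# Mirrors the usage funcs := [], hostedScript.EnumFuncs(f => funcs.Push(f)).
hosted_script.enum_funcs(funcs.append)
```

The appeal of this shape is that the enumerating side needs no knowledge of arrays or any other container type on the caller's side.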

toralf wrote:A) a hierarchical structure of all the files involved (include and lib files)
- with absolute or relative path to the main script
AutoHotkey keeps a flat list of files, but a minimal addition would allow the hierarchy to be reproduced on demand. All stored paths are absolute.
- for each line of each file the level of indentation based on code not on real/current indentation
The level of nesting can be calculated for each executable line, which may be composed of multiple physical lines due to line continuation. Executable lines basically include control flow statements, commands, expressions and braces (excluding the outer braces of a class definition). Only the starting line number of each virtual line is retained. Else IfWinExist xxx,, MsgBox would be represented as three virtual lines (or three Lines), all with the same line number.
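
As a rough sketch of how a tidy tool might consume such data, assume the host somehow ends up with a mapping from starting line number to nesting level. The mapping, the function name and the policy of leaving unmapped lines alone are all assumptions made for illustration.

```python
from typing import Dict, List


def reindent(lines: List[str], levels: Dict[int, int], unit: str = "    ") -> List[str]:
    """Re-indent physical lines using nesting levels keyed by 1-based line number.

    Lines without an entry (comments, continuation lines, blank lines) are
    returned unchanged, because no nesting level is known for them.
    """
    result = []
    for number, text in enumerate(lines, start=1):
        if number in levels:
            result.append(unit * levels[number] + text.lstrip())
        else:
            result.append(text)
    return result


source = ["if (x)", "{", "MsgBox % x", "; note", "}"]
levels = {1: 0, 2: 0, 3: 1, 5: 0}  # assumed parser output, not real data
tidy = reindent(source, levels)
```

A real tool would still need its own rule for the unmapped lines, since only the starting line of each executable line carries a level.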
-- the position at which a comment starts
Some information could be dumped as each line is parsed, but comments are stripped out as each line is read from file, and currently no information about them is retained. Because comments are processed at such an early stage, interpretation of ; can't be affected by quote marks in an expression (" ; " always contains a comment, although "; " does not, because there must be a space before the semicolon). Fortunately this makes it reasonably trivial for a script to reproduce the parsing of comments, although you would need to account for continuation sections, as they affect the interpretation of ; and /*.
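
The semicolon rule can be sketched in a few lines of Python: a semicolon starts a comment only at the very start of the line or when preceded by a space or tab, regardless of quote marks. This is a minimal sketch that deliberately ignores continuation sections and /* */ blocks; `find_comment_start` is a made-up name.

```python
def find_comment_start(line: str) -> int:
    """Return the index where a ';' comment starts, or -1 if there is none.

    Quote marks are intentionally not tracked: comments are stripped before
    expressions are parsed, so a ';' inside a string still counts if it is
    preceded by whitespace.
    """
    for index, char in enumerate(line):
        if char != ";":
            continue
        if index == 0 or line[index - 1] in " \t":
            return index
    return -1


with_comment = find_comment_start('MsgBox "a ; b"')
without_comment = find_comment_start('MsgBox "a; b"')
```

Because an escaped semicolon is preceded by a backtick rather than by whitespace, it falls out of this rule without special handling.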
-- an indication of what type the line is, e.g. pure comment, block comment, continuation line, func def, class def, etc.
As with comments, continuation is processed at a very early stage. The linefeed characters are usually stripped out or replaced for continuation lines (and in continuation sections with the Join option), and the original text and character positions are not retained. I'm not sure that it would be feasible (or trivial enough for me to implement as part of this project) to determine the character positions for all of these elements, even with additional information being recorded. Perhaps if we pretend continuation sections don't exist.

Class definitions are also not retained in a way that ties back to lines of code, except for executable lines such as variable initializers or lines in the body of a method.

Directives have an immediate effect when parsed (even if just to set a variable) and are not retained.

A possible solution may be for the host to provide callbacks for when these elements are identified during parsing. The host can then record the information if needed.
- for each warning and the one error ...
-- the position in the line
This information is not available at any point. Even for errors at early stages of parsing, the code being parsed has already been preprocessed to remove indentation/comments and combine continuation lines. Some errors are raised after escape sequences are translated, or after the line has been split into multiple "args" with whitespace trimmed (and if the line represents a command or control flow statement, its name has been replaced with a number).
- a list of all variable names in all files (built-in variables could be maintained within the tool)
It would be trivial to allow enumeration of built-in variables and functions, although there is no information about parameter names.
-- an indication if they are super global, global or local
-- an indication if they are a class
Aside from local/static variables being listed within the corresponding function, each variable also has the following scope attributes: global, local, function parameter, static, declared, super-global (v1). A variable can be normal, alias (at run-time), virtual (read-only or read-write, executing a function when accessed), or constant (v2). Function parameters also have the ByRef flag and potentially a default value.

A variable is not a class, but may contain one. Identifying whether a variable contains a class is simple. Before the script executes, a variable contains a class only if it was assigned by the class definition. One can also compare __Class against the name of the variable, or in v2, whether the variable is a constant. Nested classes are a bit trickier (in v2).
- a list of functions with their parameters of all files (built in functions could be maintained within the tool)
It is easier to enumerate all functions per scope (global or nested within a given function) than to divide them by file. Since there is already a Func object, I am thinking to add unofficial properties/methods to retrieve the parameter names and default values, and the file and line of the function definition. (The line of the function definition isn't currently recorded, but it could be.) Built-in functions can be enumerated as well, but their parameters have no names. (Most v2 functions are now implemented using metadata that includes parameter names within macros in the source code, but the names are currently discarded by the C preprocessor.)
- a list of classes and their hierarchy of methods (with parameters) and properties
Each class can be retrieved from its variable. Methods and properties can be enumerated in the normal fashion, specific to the AutoHotkey version; which might be insufficient for v1.
-- For each the file and line number (and position) they occur
Where what occurs? Each and every reference to a variable? The first reference to a variable? The first assignment or declaration? This information is not retained. A possible solution would be to provide a callback for when a new variable is created during parsing. For v2, a variable would typically be created when its first assignment is encountered; for v1 it would likely be the first reference of any kind.
-- For each variable, in which labels, functions, classes and methods it is used
Variables are either global or local. Local variables belong to a function. A method is a function which belongs to a class. You would discover variables by enumerating the variables of a specific function, so you don't need to be told which function it belongs to.

Nothing is used in a label, only above or below it. A label is just a point in the code, and sometimes neither the program nor the user can define what region of code actually relates to the label. This is part of the reason I removed label-based subroutines in v2.

TheArkive wrote:Since there is no "list of objects" Is it possible to add checking (maybe reference checking) for members (methods and properties) associated with a particular var, to extend the var list into a possible object list?
It's a bit late for me to respond to this, but note that I already stated that classes and their contents could be enumerated via the variable list. Also, members aren't associated with a var; members, or more accurately properties, belong to an object. For v2, you do not need to inspect the methods and properties of an object to determine its type. In this case, the API might need to provide a type function, as type(obj) won't work across the COM boundary, and obj.__Class carries the risk of invoking a custom property or meta-function.

Post by toralf » 04 Dec 2022, 09:27

Dear Lexikos,

thanks for getting back to this two-year-old topic. You have put a lot of thought and effort into this already. It sounds promising.
At the same time, I do not see how it will help in the use cases mentioned above, mainly because a lot of information on line number, position in line, etc. is not retained. Hence the "tools" using this DLL will have a hard time directing the user to the position in the original code files.

But I guess there will be other use cases.
ciao
toralf

Post by lexikos » 04 Dec 2022, 22:53

Most of the time I spent on this was writing up that post. :)

I had concluded that the current parser is not suited for static analysis or conversion/refactoring tools, especially if the script contains an error (depending on the stage at which the error is detected).

However, aside from the parser stopping due to an error, the "possible solution" I mentioned is quite trivial. Which use cases in particular are you thinking will be a problem?

Variable and function lists for calltips, autocomplete and semantic highlighting seem like no problem.

Post by toralf » 10 Dec 2022, 10:34

Dear Lexikos,
One use case was to create a new Tidy version that formats code nicely by indentation. But since a lot of information is removed, along with continuation lines and comments, before the parser even starts working, I do not see how this could be achieved.
A second use case would be a context menu on functions that allows jumping to the line of code (maybe even in a different file) where the function is defined, or to all places where this function is called.
As far as I understood, the "possible solution" cannot tell on which line and in which file a function is called or defined. The parser only knows that this function exists.
ciao
toralf

Post by lexikos » 10 Dec 2022, 19:46

I think that invoking the interpreter would be useful for things that require semantic analysis, like identifying the bounds of a function and the scope of each name, for smarter autocomplete or semantic highlighting. Semantic information is needed by the program in order to execute the script, whereas comments, whitespace and the exact format of the unaltered source code are not.

Refactoring tools need more fine-grained information about the location of each element within the unaltered source code, which poses some difficulty due to the multi-phase nature of the continuation syntax. Tools can choose to be inaccurate, such as by treating continuation sections as though they nest inside a string or expression, and not supporting certain pathological cases.

Conversion tools presumably still want to retain comments and such, but the developer could choose how much of the original structure to retain. Continuation sections differ between versions, so there's not much practical need to retain them as is. The converter can be designed to merge continuation lines prior to conversion and split them out if desired afterward.

I do not think that the easiest answer is to overload the existing interpreter with this functionality. It can tell you as it encounters each comment, for instance, but you could just as easily parse the line yourself and identify where the comment is. The difficult part is not identifying comments or parsing and merging continuation lines/sections, but reconciling references in the processed code with original character positions in the unaltered source; or determining what final code to produce.

I don't feel that the Tidy use case benefits much from utilizing the actual interpreter. I would think that it works mostly with the basic "physical structure" of the code and doesn't care much about the semantic structure.
As far as I understood, the "possible solution" cannot tell on which line and in which file a function is called or defined. The parser only knows that this function exists.
Again, that was...
I wrote:A possible solution may be for the host to provide callbacks for when these elements are identified during parsing. The host can then record the information if needed.
The issue was that the interpreter does not retain certain information that it doesn't need for later initialization phases, execution or error-reporting, not that it never knows where it encountered them.

When the parser encounters a function definition or variable, or even when it strips out comments or merges continuation lines, it knows which line it is parsing right now. The callbacks would receive that information and could record whatever is needed.
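
A toy Python sketch of that division of labour: the scanner reports what it finds together with the current file and line, and the host decides what to keep. The regular expression is a crude stand-in for illustration only and is nothing like AutoHotkey's parser (it would, for example, also match "if (x) {"); `scan` and `record` are made-up names.

```python
import re
from typing import Callable, Dict, List, Tuple

# Toy pattern for "Name(params) {" on a single line.
FUNC_DEF = re.compile(r"^\s*([A-Za-z_]\w*)\s*\([^)]*\)\s*\{\s*$")


def scan(path: str, lines: List[str], on_func: Callable[[str, str, int], None]) -> None:
    """Report each toy-detected function definition with its file and line."""
    for number, text in enumerate(lines, start=1):
        match = FUNC_DEF.match(text)
        if match:
            on_func(match.group(1), path, number)


definitions: Dict[str, Tuple[str, int]] = {}


def record(name: str, path: str, line: int) -> None:
    # The host decides what to keep; here, only the first definition site.
    definitions.setdefault(name, (path, line))


scan("main.ahk", ["x := 1", "Add(a, b) {", "    return a + b", "}"], record)
```

Nothing has to be stored by the scanner itself, which is why this approach costs so little on the interpreter side.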

I was too vague about what isn't retained in the current version:
  • For any given variable or function, the line at which it was first referenced.
  • The line at which a class was defined.
What is retained:
  • Every line of executable code, and every function and variable reference within that code. (It has to be executed, after all.)
  • The starting line of every function's body, including methods and property getters/setters.
If you want to know where every non-dynamic reference to a function or variable is, that information is available: we can iterate over each line, and enumerate the references within that line. You cannot ask a variable or function where it was first referenced, but you can work it out by searching the entire script.
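
Working out first references by searching the whole script amounts to a single pass over the lines. In this Python sketch the (line number, referenced names) pairs are an assumed stand-in for whatever the enumeration would actually return; names are lower-cased because AutoHotkey identifiers are case-insensitive.

```python
from typing import Dict, Iterable, List, Tuple


def first_references(lines: Iterable[Tuple[int, List[str]]]) -> Dict[str, int]:
    """Map each lower-cased name to the line number of its first reference.

    `lines` yields (line_number, names_referenced_on_that_line) pairs in
    script order.
    """
    first: Dict[str, int] = {}
    for line_number, names in lines:
        for name in names:
            # setdefault keeps the earliest line and ignores later ones.
            first.setdefault(name.lower(), line_number)
    return first


script = [(1, ["count"]), (2, ["Add", "Count"]), (5, ["total", "add"])]
refs = first_references(script)
```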

In v2, a static __Init method is automatically constructed for each new class. Every script function is required to have an opening brace Line and closing brace Line, and the line number at which the __Init method was constructed is stored in both. Even if it wasn't, I would be inclined to make the DLL store this information if doing so was trivial and useful.

Post by andymbody » 10 Dec 2022, 23:02

@lexikos
So, am I to understand that there is currently no parser that you know of that will do what @toralf is asking for?

I have been working on a generic AHK parser myself (it's not nearly ready) and want to make sure that I am not wasting my time reinventing the wheel. My intention was to create a tool that would "fix" downloaded scripts automatically by converting old deprecated code, formatting the many possible syntaxes into the one that I (the user) prefer, and creating a list of all user-defined variables, functions, classes, etc. This is just the short list of what it will eventually do.

Is there such a tool, that you know of already, for AHK scripts?

Thanks!
Andy

Post by lexikos » 11 Dec 2022, 00:53

@andymbody That had not been asked or discussed in this topic, which is about using AutoHotkey's own parser.

AutoHotkey2-lsp does quite in-depth parsing and analysis, but it is implemented in (I think) TypeScript and for v2 only. It can be used without vscode, but do not ask me how.

Others have made attempts at creating parsers for various reasons, and converters, with varying degrees of success; none that I would consider complete or accurate.

Post by andymbody » 11 Dec 2022, 06:33

@toralf
@lexikos
lexikos wrote:
11 Dec 2022, 00:53
That had not been asked or discussed in this topic
Sorry, I didn't intend to hijack the thread... I had simply skimmed over the topic and replies prior to replying myself. After fully re-reading the original question and replies, I see that the topic was more specific as you mentioned. I apologize for jumping into the middle of the conversation like that... please forgive my "rudeness".

Thank you for the info
Andy

Post by lexikos » 26 Dec 2022, 06:16

I've uploaded dll files based on the v1.1 (8f83322) and alpha (df84a3e) branches.

AutoHotkeyLib_20221226.zip

See README-LIB.md for details.

I haven't decided which forum to post it in...

Post by toralf » 01 Jan 2023, 11:20

Dear Lexikos,
Hope you had a great Christmas time. All the best for 2023.
Thanks a lot for the DLLs. I'll test how I can make use of them.
ciao
toralf
