
Regular Expressions (RegEx) for AutoHotkey


112 replies to this topic

Poll: What should the names of the RegEx functions be (if you HAD to pick one of these)? (42 member(s) have cast votes)


  1. RegExMatch() and RegExReplace() (43 votes [84.31%])


  2. RegMatch() and RegReplace() (8 votes [15.69%])


PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005

I may have over a hundred substitutions to test/perform on a file. I don't know if this would be a consideration for caching (mentioned by PhiLho).

The issue is the classic "one function call does the open, process, close" pattern, as seen in FileReadLine.
In the case of REs, it would be "compile the RE, then use it". In a loop, without a cache, each call of InStrRE (or RegExMatch, etc.) would first compile the same RE over and over. Even if compiling is fast for simple REs, the cost adds up. So a simple cache would associate an RE with its compiled form, and when the same RE is requested again, it just reuses the compiled form.
But a large cache takes time to scan (or needs to compute a hash, etc.), so the gain might be lost.
You can use hundreds of regexes, but if it is just a collection of s/x/y/g, each applied once, a cache won't help here. It is interesting only if we use the same RE, or a few different regexes, in a tight loop, for example.
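A minimal sketch of such a cache, written in Python purely for illustration (Python's `re` module already keeps a small internal cache of its own). `cached_compile` and `count_dates` are hypothetical helpers. `functools.lru_cache` hashes its arguments, so a lookup does not scan the cache, and its `maxsize` bound evicts the least recently used entries instead of letting the cache grow.

```python
import functools
import re

@functools.lru_cache(maxsize=100)
def cached_compile(pattern, flags=0):
    # Compile once per (pattern, flags); later calls return the cached object.
    return re.compile(pattern, flags)

def count_dates(lines):
    total = 0
    for line in lines:
        # Inside the loop, only the first call actually compiles the pattern.
        total += len(cached_compile(r"\d{4}-\d{2}-\d{2}").findall(line))
    return total

lines = ["2006-10-01 start", "no date", "2006-10-02 and 2006-10-03"]
n = count_dates(lines)
info = cached_compile.cache_info()   # one miss (first compile), then hits
```

With three loop iterations, the pattern is compiled once and served from the cache twice.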

I agree with JSLover that the simplest, most familiar syntax for substitutions is s///, with the option of using some other delimiter instead of "/".

It is OK if the language accepts this syntax from the start, but it is too much trouble to add it afterward. Many languages dropped this syntax. I kind of like it (when we have a choice of delimiter), but I prefer to skip it in AHK.
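For reference, here is roughly what accepting the s/// form would involve: a small parser that takes the delimiter from the character after `s`. This is a hypothetical Python sketch (`parse_substitution` is not part of any AutoHotkey proposal). An escaped delimiter becomes a literal character, which is adequate for delimiters such as `/` or `@` but would need more care if the delimiter were a regex metacharacter.

```python
def parse_substitution(expr):
    """Split 's<d>pattern<d>replacement<d>flags' into its three parts.

    <d> is whatever character follows the leading 's'. A backslash before
    <d> makes it a literal character inside the pattern or replacement.
    """
    if len(expr) < 2 or expr[0] != "s":
        raise ValueError("expected s<d>pattern<d>replacement<d>flags")
    delim = expr[1]
    parts, current, i = [], [], 2
    while i < len(expr):
        ch = expr[i]
        if ch == "\\" and i + 1 < len(expr) and expr[i + 1] == delim:
            current.append(delim)          # escaped delimiter -> literal
            i += 2
            continue
        if ch == delim and len(parts) < 2:
            parts.append("".join(current))  # end of pattern or replacement
            current = []
        else:
            current.append(ch)             # after the 3rd delimiter: flags
        i += 1
    if len(parts) != 2:
        raise ValueError("unterminated substitution")
    return parts[0], parts[1], "".join(current)
```

For example, `parse_substitution("s@http://@ftp://@")` avoids escaping every slash in a URL, which is the case JSLover mentions.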

One issue that will have to be addressed is what standard you use for escaped characters such as new-line and tabs. I was surprised to discover that AutoHotkey uses `n and `t instead of the familiar \n and \t.

I think Chris should stick to \r, \n, etc., because they are part of the RE syntax, like \w, \s and so on. Besides, it avoids having to parse the RE before giving it to PCRE: just pass it as is. Since REs are a whole world of their own, with big libraries of ready-to-use expressions, departing from this would be a mistake; even if it looks confusing to newbies, it is better in the long run. Besides, if we used `n, we would need to double-escape characters everywhere. We are lucky not to need that; don't make our lives hard!

If I understand the Split issue correctly, would you be able to duplicate that functionality using a regexp substitution, by inserting a newline character as part of the replace expression?

No, it would generate an array of strings. Since we usually loop over the result (AHK lacks true array support, and hence array-processing functions), there is no compelling need for this.

PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005
Replace feature: facts and ideas

Fact 1:
Traditionally, this feature replaces the part of the string matched by the RE (whole match, main capture) with a given string, once, n times, or for all occurrences found.
The replacement string can be static or built from the sub-captures of the match, i.e. selected parts of the matched string.
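Python's `re.sub` shows the three variants through its `count` argument (0, the default, means all occurrences), with the replacement built from sub-captures. This only illustrates the behaviour; it is not the proposed AutoHotkey API.

```python
import re

text = "a1 b2 c3"
pattern = r"([a-z])(\d)"   # capture 1: a letter, capture 2: the digit after it

# The replacement swaps the two sub-captures.
every = re.sub(pattern, r"\2\1", text)            # all occurrences ("g")
once = re.sub(pattern, r"\2\1", text, count=1)    # first occurrence only
twice = re.sub(pattern, r"\2\1", text, count=2)   # first two occurrences
```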

Fact 2:
PCRE, the library chosen by Chris, doesn't support replacing, only matching. This means this part has to be written from scratch, but it also means we have a free choice of syntax and implementation.

Fact 3:
A quite common scheme, probably coming from Perl and used in several languages, is to use the dollar syntax to reference the sub-captures: $1 to $9 reference captures 1 to 9, i.e. each time we meet $n in the replacement string, we replace it with the corresponding sub-capture.
Historically, there was another scheme, used in Unix tools like sed: \1 to \9, which is the notation of back references in the regex itself. Perl supported it too, but it is now discouraged in favor of $n.
Note that in Perl, these are just the regular syntax of variable substitution in strings.
This is the syntax I chose in my signature and as the default in my wrapper library, because it is familiar to RE users, but I also showed that we can be much more flexible if we want.
One last note: since Perl allows more than 9 sub-captures, to avoid ambiguity (is $12 the first capture followed by 2, or the twelfth capture?) it also offers the ${nnn} notation.
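Because PCRE only matches, replacement has to be assembled from match positions. A minimal Python sketch, using only `re.finditer` for matching and hand-written expansion of `$n` (a single digit, as described above) and `${nnn}`. Treating `$$` as a literal dollar sign is an assumption, and `replace`/`expand` are hypothetical names.

```python
import re

# Replacement tokens: ${nnn}, $n (a single digit), or $$ for a literal "$".
_TOKEN = re.compile(r"\$(?:\{(\d+)\}|(\d)|(\$))")

def expand(template, match):
    """Expand $n / ${nnn} / $$ in template using the groups of match."""
    out, last = [], 0
    for tok in _TOKEN.finditer(template):
        out.append(template[last:tok.start()])   # literal text before token
        if tok.group(3):
            out.append("$")
        else:
            n = int(tok.group(1) or tok.group(2))
            out.append(match.group(n) or "")     # unmatched group -> ""
        last = tok.end()
    out.append(template[last:])
    return "".join(out)

def replace(pattern, template, subject, limit=-1):
    """Replace up to `limit` matches (-1 means all), using only match positions."""
    out, last, done = [], 0, 0
    for m in re.finditer(pattern, subject):
        if done == limit:
            break
        out.append(subject[last:m.start()])       # text between matches, as is
        out.append(expand(template, m))
        last = m.end()
        done += 1
    out.append(subject[last:])
    return "".join(out)
```

With single-digit `$n`, `$12` means capture 1 followed by a literal 2, and `${12}` would be needed for the twelfth capture. A reference to a group the pattern does not have raises `IndexError` here; a real implementation would have to decide whether that is an error or an empty string.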

Idea 1:
I believe we have two choices: stick to the above notation, pleasing most RE users, or choose an alternate one, that would be familiar to most AHKers.
The alternative would be to use AutoHotkey's natural syntax for variable substitution in strings: %1% to %9% and beyond.
The advantage is that the syntax is well known and stable: we know how to escape the % if we need a literal one, there is no ambiguity because the reference is fully enclosed, it is consistent (no $1 vs. ${12}), and it is short and easy to spot.
It would even allow using PCRE's named-capture feature.
The idea is to reuse the existing code that parses and expands these variables: AutoHotkey just has to temporarily create these variables (shadowing but not overwriting existing ones, e.g. command-line parameters), expand them and free them.
With care, we could even expand regular variables in the same pass. I know this mechanism isn't used in expression strings, but we can make an exception in this case.
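A hypothetical Python sketch of the %1% alternative, expanding numbered and named captures. It models only the template syntax, not the variable-shadowing mechanism described above, and it assumes AutoHotkey's escape character (a backtick) is used for a literal percent sign.

```python
import re

# `% is a literal percent sign; %name% is a numbered or named capture.
_PCT = re.compile(r"`%|%(\w+)%")

def expand_percent(template, match):
    out, last = [], 0
    for tok in _PCT.finditer(template):
        out.append(template[last:tok.start()])
        name = tok.group(1)
        if name is None:
            out.append("%")                        # escaped literal percent
        elif name.isdigit():
            out.append(match.group(int(name)) or "")
        else:
            out.append(match.group(name) or "")    # named capture
        last = tok.end()
    out.append(template[last:])
    return "".join(out)

m = re.search(r"(?P<year>\d{4})-(\d{2})", "released 2006-10")
```

Because every reference is closed by a second `%`, `%12%` is unambiguously capture 12, which is the consistency argument made above.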

Idea 2:
Lua has rather weak regex support (a small, self-made engine that doesn't support alternation), but it has a powerful and cool feature in its replace function: we can provide a function as the replacement.
The function receives as many arguments as there are sub-captures (or the whole match if there are none), can do whatever processing is needed, and returns the substitution string. In Lua 5.1, returning nil means "don't make a substitution here".
For example, imagine a date-matching expression. We can build one that checks the validity of the date, but it is very complex. It could be easier to call a function that checks the validity: if OK, return the date in another format (e.g. short date to long date format), otherwise keep it unchanged or substitute a warning string.
It would be nice, and probably not too hard, to have such a feature in AutoHotkey. One problem is indicating that we provide a function instead of a plain string. In AHK, we cannot just put the function name, as there can be a variable with the same name. We can provide it as a string, either with an option indicating its nature, or with an alternate function name (e.g. RegExReplace vs. RegExReplaceFunc).
Another difficulty is finding what to return to indicate no substitution. There is no nil or null value in AutoHotkey. If it is able to tell the difference between a naked Return and Return "" or Return 0, that might do the trick.
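Python's `re.sub` already accepts a function as the replacement, which makes the date example easy to try. A sketch, where the `long_date` name and the YYYY-MM-DD input format are assumptions: the callback validates the date with `datetime.date` and leaves an invalid date unchanged by returning the matched text itself.

```python
import calendar
import datetime
import re

def long_date(m):
    """Callback: receives the match object, returns the text to substitute."""
    y, mo, d = (int(g) for g in m.groups())
    try:
        dt = datetime.date(y, mo, d)      # validity check done in code, not in the RE
    except ValueError:
        return m.group(0)                 # invalid date: keep the original text
    return f"{dt.day} {calendar.month_name[dt.month]} {dt.year}"

text = "Due 2006-12-25, not 2006-02-30."
result = re.sub(r"\b(\d{4})-(\d{2})-(\d{2})\b", long_date, text)
```

The regex stays simple (three groups of digits) and the hard part, whether February 30 exists, is left to ordinary code.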

John B.
  • Guests
  • Last active:
  • Joined: --
PhiLho wrote:

A quite common scheme, coming probably from Perl, and used in several languages, is to use the dollar syntax to reference the sub-captures. That is $1 to $9 reference the captures 1 to 9, ie. each time we meet $n in the replace string, we replace it by the corresponding sub-capture.
Historically, there was another scheme, used in Unix tools like sed: \1 to \9, which is the notation of back references in the regex. ... Idea 1: I believe we have two choices: stick to the above notation, pleasing most RE users, or choose an alternate one, that would be familiar to most AHKers.

A quick check of the Regular Expression Pocket Reference from O'Reilly indicates the following:
The $n notation is used in Perl, Java, .NET, C#
The \n notation is used in UNIX, UNIX utilities, PHP, JavaScript

I realize I'm in over my head here because I'm new to AutoHotkey. Nonetheless, I would argue strongly for matching a standard RE syntax ($ or \). The benefits to this are:
* First, someone who is familiar with regular expressions does not have to learn a different syntax just for AutoHotkey
* Second, the wealth of regular expression tutorials and other references available on the Internet would also apply to AutoHotkey (particularly if the documentation indicates that AHK uses an RE syntax similar to that of Perl, UNIX, etc.).
* Third, someone who learns about REs in AutoHotkey will be able to immediately apply their knowledge in other languages.

As to which syntax to choose, I like the UNIX syntax (only because it's what I know). However I think the choice depends more on 1) ease of implementation and 2) which syntax (Perl or UNIX) would be easiest and most useful for AutoHotkey users.

PhiLho wrote:

... a powerful and cool feature in the replace function: we can provide a function as replace string.

I agree that this would be way cool, and could save a lot of time. Writing an RE to validate input may be good for the soul, but can be difficult to do in practice. It could also be a big help when transforming the input as PhiLho explained.

Thanks,
John B.

majkinetor
  • Moderators
  • 4512 posts
  • Last active: Oct 02 2013 02:33 PM
  • Joined: 24 May 2006
Great observation, PhiLho.

My opinion is that

1.) Use the alternate notation for sub-captures, i.e. %1% %2% %3%...
This is very cool, as you have no limitation, and I guess it is easier to implement in AHK than the standard one. This is not something that will trouble anybody who knows AHK. It's a trivial difference.

2.) The Lua implementation of replace kicks ass. If this is added to the AHK language along with 1), it will give great power; it reminds me of XSLT transformations. You could literally translate anything into anything else with code that doesn't look like obfuscation.

So, I think it is about time AHK had an & operator for functions. This would allow us to use functions in regex replace, to use subclassing for advanced automation, and many other things that are currently not possible. This single update would open a whole world of options.


Nonetheless, I would argue strongly for matching a standard RE syntax ($ or \). The benefits to this are:
* First, someone who is familiar with regular expressions does not have to learn a different syntax just for AutoHotkey
* Second, the wealth of regular expression tutorials and other references available on the Internet would also apply to AutoHotkey (particularly if the documentation indicates that AHK uses an RE syntax similar to that of Perl, UNIX, etc.).
* Third, someone who learns about REs in AutoHotkey will be able to immediately apply their knowledge in other languages.

This is a trivial, minor loss given the much more important functionality we get.
How difficult is it to understand that you have to replace \1 or $1 with %1%? Also, you forget the ${ } notation. So there are already 3 different notations, and still there are plenty of articles on the net that never mention the other notations, and people still know what to do on their system.

PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005
Your book is wrong or outdated... ;-)
JavaScript:

In the replacement text, the script uses "$1" and "$2" to indicate the results of the corresponding matching parentheses in the regular expression pattern.

PHP (manual section "Regular Expression Functions (Perl-Compatible)"):

Replacement may contain references of the form \\n or (since PHP 4.0.4) $n, with the latter form being the preferred one. [...] When working with a replacement pattern where a backreference is immediately followed by another number (i.e.: placing a literal number immediately after a matched pattern), you cannot use the familiar \\1 notation for your backreference. \\11, for example, would confuse preg_replace() since it does not know whether you want the \\1 backreference followed by a literal 1, or the \\11 backreference followed by nothing. In this case the solution is to use \${1}1. This creates an isolated $1 backreference, leaving the 1 as a literal.

There is an error (it should be ${1}1, as in the example further down the page, not \${1}1), but it shows that the \n notation is prone to problems.
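Python's replacement syntax has the same `\11` ambiguity and resolves it the same way, with a braced form (`\g<1>`). A short sketch:

```python
import re

# "\11" is read as a reference to group 11, which this pattern lacks.
try:
    re.sub(r"(a)", r"\11", "a")
    ambiguous_ok = True
except re.error:
    ambiguous_ok = False

# The braced form isolates the group number, like PHP's ${1}1.
fixed = re.sub(r"(a)", r"\g<1>1", "a")   # group 1 followed by a literal "1"
```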

Being nice to AHKers vs. being nice to old-timer RE users: I will let Chris choose, but frankly, in either case, I believe nobody will have a problem adapting.
The $ notation is slightly harder to implement because of $1 vs. ${1} handling, i.e. there are actually two syntaxes to manage. I sidestepped the difficulty in my wrapper by requiring the user to choose one or the other, not to mix both...

[EDIT] I did something else before posting, and meanwhile majkinetor expressed basically the same opinion on $1 vs. %1%. We are in phase. ;-)

PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005
While helping in the "Reformatting textfiles" topic, I discovered that by default PCRE only considers \n as a newline mark. Older versions allowed only one character (AFAIK), but in 6.7 I saw new options:

PCRE_NEWLINE_CR
PCRE_NEWLINE_LF
PCRE_NEWLINE_CRLF

These options override the default newline definition that was chosen when PCRE was built. Setting the first or the second specifies that a newline is indicated by a single character (CR or LF, respectively). Setting both of them specifies that a newline is indicated by the two-character CRLF sequence. For convenience, PCRE_NEWLINE_CRLF is defined to contain both bits. The only time that a line break is relevant when compiling a pattern is if PCRE_EXTENDED is set, and an unescaped # outside a character class is encountered. This indicates a comment that lasts until after the next newline.

The newline option set at compile time becomes the default that is used
for pcre_exec() and pcre_dfa_exec(), but it can be overridden.

It is important in our Windows universe...
More letters to add to the available options, unless Chris allows some +0x00008000 form (but this one is important enough to have its shortcut).
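Python's `re` has no newline-convention option: in MULTILINE mode `$` matches only before `\n`, so CRLF text shows exactly the problem described. A sketch of the symptom and of one pattern-level workaround (keeping an optional `\r` outside the capture); normalizing the text first with `str.replace("\r\n", "\n")` is another option.

```python
import re

crlf_text = "first\r\nsecond"

# "$" matches only before "\n", so the "\r" ends up inside the captured line.
naive = re.findall(r"^(.*)$", crlf_text, re.MULTILINE)           # ['first\r', 'second']

# Workaround: capture lazily, then allow an optional "\r" before the line end.
crlf_aware = re.findall(r"^(.*?)\r?$", crlf_text, re.MULTILINE)  # ['first', 'second']
```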

Chris
  • Administrators
  • 10727 posts
  • Last active:
  • Joined: 02 Mar 2004
Sorry for the late reply. Here are my comments.

[quote name="PhiLho"]Matches: is it a string or a real variable name? ...
Instead of an option, perhaps you can add a suffix and a number, ie. if var is "capture", we get data in capturePosition (or capturePos), captureLength (or captureLen), captureString (or captureStr) and the same numbered for sub-captures. Because if we want, for some reason, both string and pos, we would need to do two searches. Now, we might add options to select which names are generated.[/quote]Yes, I thought I might stick to the way PHP does it: have a separate option that says, "I want the positions instead of (or in addition to) the substrings themselves." When that option is in effect, different things would be stored in the array (or two arrays would be created).
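A sketch of the position-and-substring variant, in Python for illustration. The `captureStr`/`capturePos`/`captureLen` names follow the suggestion quoted above and are hypothetical, positions are 1-based as with AutoHotkey's InStr(), and an unmatched optional group getting an empty string and position 0 is also an assumption. One search fills in all three kinds of data.

```python
import re

def match_with_positions(pattern, subject, name="capture"):
    """Return string, 1-based position and length for the match and each group."""
    m = re.search(pattern, subject)
    if m is None:
        return {}
    out = {}
    for i in range(m.re.groups + 1):          # 0 = whole match, then sub-captures
        suffix = str(i) if i else ""
        start = m.start(i)                    # -1 if the group did not take part
        out[name + "Str" + suffix] = m.group(i) or ""
        out[name + "Pos" + suffix] = start + 1 if start >= 0 else 0
        out[name + "Len" + suffix] = m.end(i) - start if start >= 0 else 0
    return out

found = match_with_positions(r"(\d+)-(\d+)", "ab 12-345")
```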

[quote name="PhiLho"]I vote for case-sensitive [as the default].[/quote]From what others have said, that seems to be the consensus. Thanks.

[quote name="PhiLho"]Or should we have a way to set default options for next searches?[/quote]It might be best to avoid that because it hurts script maintainability and portability (e.g. it makes copy & paste of script sections more error-prone if you forget what options were in effect).

[quote name="foom"]Omitting //gmsxi will not magically make newbies understand regular expressions better, nor will the readability be drastically improved. It will make simple RegExps like "\bsomeword[0-9]+\b" look clearer.
But in the case of "/(\+|\-|\*|\/|!|~|&|\||\^|(:|\-|\+|<|>|!)?=)/gi" it's six of one and half a dozen of the other. And RegExps can get very complicated very quickly, meaning simple RegExps like the first example will be rare.[/quote][quote name="John B."]I agree with JSLover that the simplest, most familiar syntax for substitutions is s///, with the option of using some other delimiter instead of "/". In this, I'm drawing on my experience with UNIX and UNIX utilities (not Perl).[/quote][quote name="JSLover"]...I like s///g notation...or s@@@g when parsing urls...can you support both options in the regex & a separate param?...they are "regexs" & should be advanced, like regexs are.[/quote]That's a good point, but I think I'd prefer to implement only one approach, at least initially. For one thing, it makes the documentation a lot simpler. As someone who has learned a lot about RegExes in the past year, I can tell you that PHP's requirement for delimiters at the beginning and end of RegEx strings was a source of considerable confusion for me (perhaps because it is poorly documented at php.net).

[quote name="PhiLho"][s///] is OK if the language accepts this syntax from the start, but it is too much trouble to add it after. Lot of languages dropped this syntax. I kind of like it (when we have choice of delimiter), but I prefer to skip it in AHK.[/quote]I tend to agree, at least for the initial release. Extensions can be added later; so the important thing is to get it as close to "best" as we can on the first release.

[quote name="foom"]And with [Replace()] the g modifier is a must.[/quote][quote name="JSLover"]...by find-all do you mean the g regex flag?...yes it should be supported...somehow...[/quote]It will definitely be supported by Replace(), but perhaps not initially by Match (InStrRE).

[quote name="JSLover"]couldn't you support both?...g or the word global...?...i/I or the word case0/case1 for insensitive/sensitive.[/quote]That would boost readability and is a good thing to consider. Thanks.
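A hypothetical sketch of an option string that accepts both single letters and readable words, in Python for illustration. The option names and the whitespace-separated format are assumptions, not a settled AutoHotkey syntax.

```python
import re

# Each option maps to either a regex flag or the "replace all" switch.
_OPTIONS = {
    "i": ("flag", re.IGNORECASE), "case0": ("flag", re.IGNORECASE),
    "I": ("flag", 0), "case1": ("flag", 0),           # case-sensitive: no flag
    "m": ("flag", re.MULTILINE), "multiline": ("flag", re.MULTILINE),
    "g": ("global", True), "global": ("global", True),
}

def parse_options(spec):
    """Turn e.g. "i g" or "case0 global" into (re flags, replace_all)."""
    flags, replace_all = 0, False
    for word in spec.split():
        kind, value = _OPTIONS[word]       # an unknown option raises KeyError
        if kind == "flag":
            flags |= value
        else:
            replace_all = value
    return flags, replace_all
```

Letters and words resolve to the same settings, so terse and readable scripts behave identically.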

[quote name="JSLover"]RegExMatch sounds good, preg_match for the perl among us (me)...or just match for the JavaScript in me...but InStrRE rubs me the wrong way...maybe InStrRegEx...tho...?[/quote]It's a constant battle between readability and brevity. For functions, I tend to prefer brevity because they're more often used by advanced users who prefer shorter names. We could have a quick poll to decide.

[quote name="Chris"]can you support ahk_regex in all string params?[/quote]For now it's just Match() and Replace(); but eventually in the windowing commands and perhaps in a split/loop-parsing capability.

[quote name="Titan"]Sorry if I missed something but what about backreferences? Will you output the traditional $1 .. $9/$n variables?[/quote]Yes, as foom confirmed, the substring that matches each subpattern (backreference) would be stored in an array element.

[quote name="John B."]One issue that will have to be addressed is what standard you use for escaped characters such as new-line and tabs. I was surprised to discover that AutoHotkey uses `n and `t instead of the familiar \n and \t. If you use the AutoHotkey escape sequence, it will be confusing to anyone who already knows regular expressions. If you use standard regular expressions escape sequence, it will be confusing to anyone who already knows AutoHotkey.[/quote]That's a good point. Assuming PCRE expects linefeeds, tabs, and other special characters to be sent in raw, I think we should stick with the AutoHotkey way of escaping because it will reduce code complexity and increase performance. This is because by then, AutoHotkey has already resolved `n to be a literal linefeed, `t to be a literal tab, etc.

[quote name="John B."]If I understand the Split issue correctly, would you be able to duplicate that functionality using a regexp substitution, by inserting a newline character as part of the replace expression?[/quote]Possibly. In any case, I'm pretty sure that Split's functionality can be easily achieved in most cases without a built-in Split function (though eventually there will probably be one).

[quote name="PhiLho"]RegExReplace(): My implementation can be used as prototype. ...I suggest you see my notes on the topic in my TestPCRE_DLL.ahk. There is also there some test code that can be reused.
...
Replace feature: facts and ideas[/quote]Great stuff! When the time comes, I might have a few questions for you about these.

[quote name="John B."]A quick check of the Regular Expression Pocket Reference from O'Reilly indicates the following:
The $n notation is used in Perl, Java, .NET, C#
The \n notation is used in UNIX, UNIX utilities, PHP, JavaScript[/quote]For simplicity, I'm leaning toward supporting $ only (no backslash) for backreferences. PhiLho gave a lot of great references about how other languages do this, which should help in choosing a good method.

[quote name="John B."][quote name="PhiLho"]... a powerful and cool feature in the replace function: we can provide a function as replace string.[/quote]I agree that this would be way cool, and could save a lot of time. Writing an RE to validate input may be good for the soul, but can be difficult to do in practice. It could also be a big help when transforming the input as PhiLho explained.[/quote]If this refers to PCRE's callout/callback feature, I agree it would be useful. Certainly not in the initial release, but perhaps down the road.

[quote name="majkinetor"]I think it is about time AHK had an & operator for functions. This would allow us to use functions in regex replace, to use subclassing for advanced automation, and many other things that are currently not possible. This single update would open a whole world of options.[/quote]You probably guessed that it would be non-trivial to implement. However, your proposal of using the address operator with an AHK function and having the callback automatically set up properly inside AHK is the ultimate in elegance and simplicity. Hopefully it will be feasible to implement someday.

[quote name="PhiLho"][CRLF] is important in our Windows universe...
More letters to add to the available options, unless Chris allows some +0x00008000 form (but this one is important enough to have its shortcut).[/quote]I assume you mean that there should be an option to switch between CRLF mode and LF mode. I'm assuming there's no easy way to auto-detect or auto-adapt, so the question becomes: should LF or CRLF be the default. AutoHotkey uses plain LF a lot in its internal strings, and encourages scripts to do the same. But Windows itself uses physical CRLF more often (such as in text files), and so does FileRead by default (for performance reasons).

Thanks for all the comments. More are welcome.

PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005

[Name of search function] It's a constant battle between readability and brevity.

Considering there will be a replace function, perhaps more (split?), I am leaning toward using RegEx as a prefix (or something shorter?), like File, String, Gui, Group, Ini, Reg (possible confusion here?), etc. Perhaps RegExInStr, which has this prefix and the familiar InStr name. Or REInStr, but some people don't like such runs of capitals.

Assuming PCRE expects linefeeds, tabs, and other special characters to be sent in raw

No, it understands \r, \n, \t and some other escapes, so they can be used as is. Out of habit, lots of programmers write "[\\d\\w]+\\r\\n", for example. If we feed it raw characters instead, I suppose that is OK too... So both notations can be used without doing anything.
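A quick Python illustration of the point: the engine accepts both the escape sequence (backslash + letter) and the raw character already resolved by the host language. One caveat worth knowing: in extended/verbose mode a raw tab or newline in the pattern is ignored as whitespace, while `\t` or `\n` still matches.

```python
import re

subject = "key:\tvalue\r\n"

# Escapes passed through to the engine unchanged (raw string: backslash + letter).
escaped = re.search(r"\t(\w+)\r\n", subject)

# Raw tab/CR/LF characters already resolved by the host language.
raw = re.search("\t(\\w+)\r\n", subject)

# In verbose mode, the raw tab is dropped as whitespace; the "\t" escape is kept.
verbose_escape = re.search(r"\t (\w+)", subject, re.VERBOSE)
verbose_raw = re.search("\t(\\w+)", subject, re.VERBOSE)
```

Both plain forms capture `value`; in verbose mode only the escaped form still requires the tab, so the raw-character version captures `key` instead.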

I'm pretty sure that Split's functionality can be easily achieved in most cases without a built-in Split function

Indeed, if we have a Loop ParseRegEx, there will be little need for a split. AHK just doesn't have enough power to manage arrays right now to make a split useful. And I have a hard time finding use cases where a split using an RE is really useful, except perhaps parsing natural language (or source code...).
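In Python terms, the two approaches look like this: a split that returns a list, versus walking the matches one by one, which is closer to what a hypothetical `Loop ParseRegEx` would offer in a language without arrays.

```python
import re

record = "alpha, beta;gamma ,delta"

# Split: separators are a comma or semicolon with optional spaces around them.
fields = re.split(r"\s*[,;]\s*", record)

# Loop: visit each field in turn, as a parsing loop would.
walked = []
for m in re.finditer(r"[^,;\s]+", record):
    walked.append(m.group())
```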

[Function as replace string] If this refers to PCRE's callout/callback feature

No, I don't think so, but I haven't looked closely at this one, so I can't tell for sure.
Let me show you some examples taken from my Lua macros for SciTE:
-- Replace words in all upper-case to words with only initial upper-case
	line, substNb = string.gsub(line, "([A-ZÀ-Ý])([A-ZÀ-Ý]+[ ,;-])",
			function (c, s)
				return c .. string.lower(s)
			end)

-- Except after a dot, replace a word starting with an initial upper-case by a lower-case word
	line, substNb = string.gsub(line, "([^.][ ;][A-Z])([^A-Z])",
			function (c, s) return string.lower(c) .. s end)

-- Find initial word of a line followed by a parenthesis
-- If it is a keyword, add a space before the parenthesis
	line, substNb = string.gsub(line, "^( +)(%l+)%(",
			function (s, kw)
				if kw == "if" or kw == "for" or kw == "while" or
						kw == "switch" or kw == "synchronized" or kw == "catch" then
					return s .. kw .. " ("
				else
					return s .. kw .. "("	-- No changes, would be return nil in Lua 5.1
				end
			end)
The latter example can be done in PCRE with alternation, but it still shows the power of the syntax.
Note that in these examples I use "anonymous functions", created just for the replace operation, but I could have used regular named functions as well. In Lua, functions are first-class objects, i.e. they can be passed as parameters, returned by a function, put in a table, etc.
I won't go that far with AutoHotkey, but allowing the use of an existing function (a bit like OnMessage) would still be powerful.
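For comparison, a Python translation of the last Lua example, using a named function as the replacement, plus the alternation-only version mentioned above. Lua's `%l` is approximated by the ASCII class `[a-z]`; `fix_line` and `fix_line_alt` are illustrative names.

```python
import re

KEYWORDS = {"if", "for", "while", "switch", "synchronized", "catch"}

def space_keyword(m):
    indent, word = m.group(1), m.group(2)
    if word in KEYWORDS:
        return indent + word + " ("
    return m.group(0)            # not a keyword: leave the text unchanged

def fix_line(line):
    # "^" without MULTILINE anchors at the start of the string, like Lua's "^".
    return re.sub(r"^( +)([a-z]+)\(", space_keyword, line)

def fix_line_alt(line):
    # Same effect with alternation and a static replacement, no callback.
    return re.sub(r"^( +)(if|for|while|switch|synchronized|catch)\(", r"\1\2 (", line)
```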

should LF or CRLF be the default

Ah, a good question. I would lean toward CRLF, because REs will be used a lot to process files. Data from a GUI or plain AHK strings are more rarely multi-line: if an RE is used to validate the content of an Edit field, most of the time it will be a default, single-line Edit.

majkinetor
  • Moderators
  • 4512 posts
  • Last active: Oct 02 2013 02:33 PM
  • Joined: 24 May 2006
To Chris:

Can you explain what functions are internally in AHK, and what the non-trivial part is in getting the address of the actual function?

PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005
One issue I see is that, AFAIK, all data in AutoHotkey is more or less untyped and uses the same format, i.e. an expandable 64-byte buffer containing a string...
Chris, correct me if I am wrong.
As I understand it, these strings are converted to numbers on the fly if needed in expressions, and converted back to string when assigning them.
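#Include DllCallStruct.ahk

; Dump the raw bytes of each variable after various assignments
r := DumpDWORDs(var, GetSize(var))      ; not yet assigned: all zeros
var = 123
r := r DumpDWORDs(var, GetSize(var))    ; stored as the text "123"
var += 123
r := r DumpDWORDs(var, GetSize(var))    ; math result stored back as "246"
rav = ABC
r := r DumpDWORDs(rav, GetSize(rav))
rav = abc%rav%123%var%
r := r DumpDWORDs(rav, GetSize(rav))    ; concatenation: "abcABC123246"
fvar = 3.1415926535897932384
r := r DumpDWORDs(fvar, GetSize(fvar))  ; float literal kept verbatim as text
fvar += 2.71
r := r DumpDWORDs(fvar, GetSize(fvar))  ; float result stored as "5.851593"
Clipboard := r
MsgBox % r

; Capacity of the variable's buffer, at least 8 bytes so that short
; values still show two DWORDs in the dump
GetSize(ByRef @var)
{
	s := VarSetCapacity(@var, -1)
	If (s < 8)
		s := 8
	Return s
}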
Libraries: DllCallStruct.ahk, BinaryEncodingDecoding.ahk

Result:
00 00 00 00 | 00 00 00 00 | 
31 32 33 00 | 00 00 00 00 | 
32 34 36 00 | 00 00 00 00 | 
41 42 43 00 | 00 00 00 00 | 
61 62 63 41 | 42 43 31 32 | 33 32 34 36 | 
33 2E 31 34 | 31 35 39 32 | 36 35 33 35 | 38 39 37 39 | 33 32 33 38 | 34 
35 2E 38 35 | 31 35 39 33 | 

So all data in AutoHotkey is stored as strings: booleans are 0 and 1 in disguise, addresses are just integers, there are no true arrays, etc.
And there is no convenient way to tell "no data" (nil/null) or "this is a function reference".
Chris could add a byte to indicate the type, but it would make changes in the whole code.
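A toy Python model of what the dump shows: values live as text and are converted only for arithmetic. This is a simplification, not AutoHotkey's code. Integers stay integers and float results are printed with six decimals, which matches the bytes above ("246" and "5.851593").

```python
def ahk_add(a: str, b: str) -> str:
    """Toy model of 'var += value' when every value is stored as a string."""
    def to_number(s):
        # Simplified: a decimal point means float, otherwise integer.
        return float(s) if "." in s else int(s)
    result = to_number(a) + to_number(b)
    if isinstance(result, float):
        return "%.6f" % result        # six decimals, as seen in the dump
    return str(result)
```

Nothing in this model marks a value as "no data" or as a function reference: every value is just text, which is the limitation discussed above.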

Now, a reference to a function could be just its name, and we would need some syntax to use this reference, but this has to be carefully designed.

[EDIT] I show the result of my demonstration code...
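One way to let "a reference to a function be just its name": resolve the name in a table of script functions at call time, much like the RegExReplaceFunc variant suggested earlier in the thread. A Python sketch; `SCRIPT_FUNCTIONS`, `script_function` and `regex_replace_func` are hypothetical names.

```python
import re

# Hypothetical table of script functions, keyed by name.
SCRIPT_FUNCTIONS = {}

def script_function(fn):
    """Register fn under its own name so it can be referenced as a string."""
    SCRIPT_FUNCTIONS[fn.__name__] = fn
    return fn

@script_function
def shout(match):
    return match.group(0).upper()

def regex_replace_func(subject, pattern, func_name):
    """Replace each match with what the function *named* func_name returns."""
    fn = SCRIPT_FUNCTIONS.get(func_name)
    if fn is None:
        raise NameError(f"no script function named {func_name!r}")
    return re.sub(pattern, fn, subject)
```

Because the name is looked up only among functions, a variable that happens to share the name cannot be picked up by mistake.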

Chris
  • Administrators
  • 10727 posts
  • Last active:
  • Joined: 02 Mar 2004

Can you explain what functions are internally in AHK, and what the non-trivial part is in getting the address of the actual function?

Functions are basically just a pointer to the first line/command of the function's body, along with info about formal parameters. When a function is called, execution begins at its first line and continues until a "return" or the final '}' is encountered.

all data in AutoHotkey is more or less untyped and uses the same format, i.e. [a string] ...strings are converted to numbers on the fly if needed in expressions, and converted back to strings when assigning them.

Yes.

And there is no convenient way to tell "no data" (nil/null) or "this is a function reference".
Chris could add a byte to indicate the type, but it would make changes in the whole code.

I hadn't even thought that far yet: I was thinking only about how to implement callbacks themselves. I understand that in the particular case of a RegEx callback, it's not the OS that's calling the script function but AHK itself. So in that particular case, you wouldn't actually need true callback support. However, for other things (such as subclassing and any WinAPI function that expects a callback function), you would need an actual callback handler in the code, which involves considerable complexity. When called by the OS, the handler would call the corresponding function in the script.

majkinetor
  • Moderators
  • 4512 posts
  • Last active: Oct 02 2013 02:33 PM
  • Joined: 24 May 2006
Some questions:

1. Do you plan to change the function architecture for version 2? It might even be possible to integrate this into the current version with a new syntax, like Sub or Function, so it can be differentiated from current AHK functions.

2. What is the problem with the already suggested approach of having several functions available for hooking (I think PhiLho suggested that)?

3. Since functions are pointers into the main AHK code flow, pointers to AHK functions can be seen as double C pointers: a pointer to the C function containing the code, and a pointer to the code within that function. Right?

PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005
We are a bit off topic here, but the ctypes FFI library for Python seems quite complete and supports callbacks.
Will look deeper later.

Chris
  • Administrators
  • 10727 posts
  • Last active:
  • Joined: 02 Mar 2004
Thanks for the link PhiLho.

1. Do you plan to change the function architecture for version 2? It might even be possible to integrate this into the current version with a new syntax, like Sub or Function, so it can be differentiated from current AHK functions.

There is no plan to change it. If you think it should be changed, maybe you can describe the benefits in more detail (I'm a little confused about the intent).

2. What is the problem with the already suggested approach of having several functions available for hooking (I think PhiLho suggested that)?

There's no problem with that approach. In fact, I think it's the only way it can be done since a true callback must refer to a callable address in the program. However, although the concept is simple, the implementation is not because the incoming args sent from the caller have to be extracted from the call stack and passed on to a script function. Considerable code and expertise is likely to be involved.

3. Since functions are pointers into the main AHK code flow, pointers to AHK functions can be seen as double C pointers: a pointer to the C function containing the code, and a pointer to the code within that function. Right?

No, because functions defined by a script aren't callable as "real" functions, nor can I see any way to make them so. I think there must be a go-between or "glue", which would be a "real" function that receives the call and "forwards" it to the script function.
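Python's ctypes, which PhiLho mentioned earlier, does exactly this kind of "glue": wrapping a Python function in a `CFUNCTYPE` prototype produces a native thunk with a real C-callable address that unpacks the C arguments and forwards them to the interpreted function. A minimal sketch (the `script_add` name is illustrative); the wrapper object must be kept alive for as long as native code may call the address.

```python
import ctypes

# C signature of the callback: int (*)(int, int)
BINARY_OP = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)

def script_add(a, b):
    # Stands in for a function written in the interpreted language.
    return a + b

# The "glue": a native thunk that receives C arguments and forwards them.
glue = BINARY_OP(script_add)
address = ctypes.cast(glue, ctypes.c_void_p).value   # a real, callable address

# Code that holds only the raw address can call through it.
call_from_c = BINARY_OP(address)
result = call_from_c(2, 3)
```

Here the interpreter itself generates the forwarding stub, which is where the "considerable code and expertise" goes.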

Thanks.

majkinetor
  • Moderators
  • 4512 posts
  • Last active: Oct 02 2013 02:33 PM
  • Joined: 24 May 2006
You didn't understand 3.

I now understand that AHK functions are pointers to code sections, but where does that code reside? In some single function, like AHK_Main(...), that contains the entire script, or not? If so, the question is: do this function and a pointer to the code section within it uniquely determine an AHK function?