v2 RegExReplace Callout Bug Topic is solved

Report problems with documented functionality
User avatar
thqby
Posts: 398
Joined: 16 Apr 2021, 11:18
Contact:

v2 RegExReplace Callout Bug

Post by thqby » 31 Jul 2021, 04:56

Code: Select all

RegExReplaceEx(Haystack, NeedleRegEx, CallBack, Limit := -1, StartingPosition := 1) {
	local LastFoundPos := 1, Ret := ''
	RegExReplace(Haystack, NeedleRegEx '(?CCallout)', '', , Limit, StartingPosition)
	return Ret SubStr(Haystack, LastFoundPos)
	Callout(Match, CalloutNumber, FoundPos, *) => (Ret .= SubStr(Haystack, LastFoundPos, FoundPos - LastFoundPos) CallBack(Match), LastFoundPos := FoundPos + StrLen(Match[0]), 0)
}

MsgBox(RegExReplaceEx('543*564=','(\d+)\*(\d+)=', m => m[1] '*' m[2] '=' (m[1]*m[2])))  ; 543*564=306252
MsgBox(RegExReplaceEx('543*564=','(\d+)\*(\d+)=', m => m[1] '*' m[2] '=' (m[1]*m[2])))  ; Invalid memory read/write.
@lexikos

User avatar
thqby
Posts: 398
Joined: 16 Apr 2021, 11:18
Contact:

Re: v2 RegExReplace Callout Bug

Post by thqby » 31 Jul 2021, 07:26

I found that the Closure object is transmitted when the regular compile, but the object is released after the call is completed.

Code: Select all

void *pcret_resolve_user_callout(LPCTSTR aCalloutParam, int aCalloutParamLength)
{
	// If no Func is found, pcre will handle the case where aCalloutParam is a pure integer.
	// In that case, the callout param becomes an integer between 0 and 255. No valid pointer
	// could be in this range, but we must take care to check (ptr>255) rather than (ptr!=NULL).
	auto callout_var = g_script->FindVar(aCalloutParam, aCalloutParamLength);
	return callout_var ? callout_var->ToObject() : nullptr;
}

Code: Select all

UserFunc::Call(...) {
	...
free_and_return:
	Var::FreeAndRestoreFunctionVars(*this, recurse.backup, recurse.backup_count);
	...
}

lexikos
Posts: 9560
Joined: 30 Sep 2013, 04:07
Contact:

Re: v2 RegExReplace Callout Bug

Post by lexikos » 31 Jul 2021, 23:05

AutoHotkey caches the last 100 regex patterns, so callouts and function objects are inherently unsafe to use in this manner at present. Even if the reference counting is corrected to prevent the crash, the callout would not be updated unless the pattern happens to have been pushed out of the cache (by compiling 100 other patterns). This would mean that generally only the first call to your function would work, because every other call would refer to the wrong callout. You can verify this by adding ObjPtrAddRef(Callout).

Correcting the reference counting would mean not only calling AddRef, but also releasing the callout when the compiled pattern is deleted. The callout is currently baked directly into the compiled bytecode, so this wouldn't be trivial.

Another issue is that cached patterns are not scoped to the function, so if there are multiple functions in the script named X, a callout such as (?CX) may call the wrong function, depending on which function caused the pattern to be cached. Since we work with variables now and not purely functions, there can be similar issues when the same pattern is used repeatedly within the one function, after assigning different values to the variable X.

The simplest solution is probably to resolve callouts each time they are called. This would reduce performance, but I think callouts are rarely the most efficient solution to a problem, unless they are the only solution.

As for iterating over matches, using RegExMatch in a loop is both cleaner and more efficient than using RegExReplace with a callout.

Code: Select all

RegExReplaceF(Haystack, NeedleRegEx, CallBack, Limit := -1, StartingPosition := 1) {
    Ret := ''
    while RegExMatch(Haystack, NeedleRegEx, &Match, StartPos := IsSet(Match) ? Match.Pos + Match.Len : 1)
        Ret .= SubStr(Haystack, StartPos, Match.Pos - StartPos) CallBack(Match)
    return Ret SubStr(HayStack, StartPos)
}
I tested against two implementations based on yours:

Code: Select all

; Use static variables to prevent the nested function from becoming a closure.
RegExReplaceEx(Haystack, NeedleRegEx, CallBack, Limit := -1, StartingPosition := 1) {
	static LastFoundPos, Ret, sCallBack
    LastFoundPos := 1, Ret := '', sCallBack := CallBack
	RegExReplace(Haystack, NeedleRegEx '(?CCallout)', '', , Limit, StartingPosition)
	return Ret SubStr(Haystack, LastFoundPos)
	Callout(Match, CalloutNumber, FoundPos, Haystack, *) {
        Ret .= SubStr(Haystack, LastFoundPos, FoundPos - LastFoundPos) sCallBack(Match)
        LastFoundPos := FoundPos + Match.Len
    }
}

Code: Select all

; Use pcre_callout instead of an explicitly named callout.
RegExReplaceEx(Haystack, NeedleRegEx, CallBack, Limit := -1, StartingPosition := 1) {
	local LastFoundPos := 1, Ret := ''
	RegExReplace(Haystack, NeedleRegEx '(?C)', '', , Limit, StartingPosition)
	return Ret SubStr(Haystack, LastFoundPos)
	pcre_callout(Match, CalloutNumber, FoundPos, *) => (Ret .= SubStr(Haystack, LastFoundPos, FoundPos - LastFoundPos) CallBack(Match), LastFoundPos := FoundPos + StrLen(Match[0]), 0)
}

User avatar
thqby
Posts: 398
Joined: 16 Apr 2021, 11:18
Contact:

Re: v2 RegExReplace Callout Bug

Post by thqby » 01 Aug 2021, 02:53

I also have a version similar to RegExReplaceF, it's actually more efficient than using a regular with a callout function.

If the regular expressions used in the same scope, the solution I've tried is to change the return value of pcret_resolve_user_callout from IObject* to Var*, and resolve callouts each time when regular are called, there is little performance impact.

And another way I can change RegExReplace's third argument is to support callbacks, just like a regular substitution with callback in JavaScript.

This is my code. https://github.com/thqby/AutoHotkey_H/blob/alpha/source/script2.cpp#L14152

lexikos
Posts: 9560
Joined: 30 Sep 2013, 04:07
Contact:

Re: v2 RegExReplace Callout Bug

Post by lexikos » 01 Aug 2021, 04:29

the solution I've tried is to change the return value of pcret_resolve_user_callout from IObject* to Var*
That is an elegant solution, although it still won't work correctly if one pattern is used in multiple functions. I might implement it anyway, as it sound very simple and I'm lacking motivation to fix this particular issue properly. Thanks.

lexikos
Posts: 9560
Joined: 30 Sep 2013, 04:07
Contact:

Re: v2 RegExReplace Callout Bug  Topic is solved

Post by lexikos » 09 Dec 2022, 20:24

Fixed by v2.0-rc.2.

Post Reply

Return to “Bug Reports”