To give context about this project, in case anyone is interested...
The project is actually a wrapper class for Regex itself. The plan is to enhance the current capabilities of AHK's native RegExMatch and RegExReplace. For instance, one of the enhancements will allow "exclusions" to be applied to the haystack before the normal regex search. The exclusions (any number of them) will be secondary needles that let certain parts of the primary haystack be ignored while conducting the primary needle search.
For instance, if I want to remove all line comments within a .ahk file, a simplified regex might look something like this:
myStr := "this is my string" ; this is my comment
lineInAhkFile := ; haystack
(
"myStr := ""this is my string"" ; this is my comment"
)
msgbox % RegexReplace(lineInAhkFile, ";.*$") ; acts as expected
BUT... what if the haystack includes the needle itself (which is the situation that led to this thread)?
needle := ";.*$"
lineInAhkFile := ; haystack
(
"needle := "";.*$"""
)
msgbox % RegexReplace(lineInAhkFile, needle) ; removes part of the regex needle
In the general case, the contents of a .ahk file are unknown, so a general static needle may not accommodate every possible scenario.
Traditionally, we would need to know that a semicolon might appear inside a critical code string (like a regex needle), and we would have to design our primary needle for that possibility. That could involve look-arounds, trying to anticipate every character that might immediately precede or follow the semicolon, checking that the semicolon doesn't fall within the boundaries of a string, and so on. It would take a ridiculous amount of work for minimal (if any) payback, and the needle would end up far more complex than normal circumstances require.
Instead, what we really need (built into regex itself) is a way to ignore certain situations in the haystack, such as semicolons that are part of a string. We need to remove the string from the haystack before conducting our search/replace, then put the string back where we found it before returning the primary regex result. In my opinion this simplifies the process, since we don't need to accommodate every possible scenario that could break our needle. True, the primary (simplified) needle used within the enhanced Regex may not behave the same outside of AHK, but who cares? The general Regex design standard is due for an upgrade that includes enhancements like this anyway.
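For anyone curious what the remove-then-put-back step might look like, here is a minimal sketch in AHK v1.1 (Unicode build). RegExReplaceExcl is a hypothetical name used only for illustration; it is not the class described here and not a built-in. It swaps each exclusion match for a numbered placeholder, runs the primary RegExReplace on the masked haystack, then restores the original text. It assumes the private-use characters U+E000/U+E001 never appear in the haystack, and that neither the primary needle nor a later exclusion needle can match the placeholder tokens (a needle like \d+ would break it).

```autohotkey
; Usage: strip the line comment but leave quoted strings alone.
strNeedle := """(?:[^""]|"""")*"""    ; regex "(?:[^"]|"")*" = an AHK v1 quoted string
line1 := "myStr := ""this is my string"" `; this is my comment"
line2 := "needle := "";.*$"""
r1 := RegExReplaceExcl(line1, ";.*$", "", strNeedle)   ; myStr := "this is my string" (plus trailing space)
r2 := RegExReplaceExcl(line2, ";.*$", "", strNeedle)   ; needle := ";.*$" (unchanged)

; Hypothetical helper, not a built-in and not the author's class:
; RegExReplace() with any number of exclusion needles.
; Assumes AHK v1.1 Unicode build, and that the private-use characters
; U+E000/U+E001 never occur in the haystack and are not matched by the
; primary needle or by a later exclusion needle.
RegExReplaceExcl(haystack, needle, replacement := "", exclusions*)
{
    tokOpen := Chr(0xE000), tokClose := Chr(0xE001)
    saved := []                                    ; saved[n] = nth excluded substring
    for k, excl in exclusions
    {
        masked := "", pos := 1
        ; Walk every match of this exclusion needle, left to right.
        while (pos <= StrLen(haystack) + 1 && (found := RegExMatch(haystack, excl, m, pos)))
        {
            masked .= SubStr(haystack, pos, found - pos)   ; text before the match
            if (StrLen(m) = 0)                             ; empty match: copy one char, move on
            {
                masked .= SubStr(haystack, found, 1)
                pos := found + 1
                continue
            }
            saved.Push(m)                                  ; remember the excluded text...
            masked .= tokOpen . saved.Length() . tokClose  ; ...and leave a numbered placeholder
            pos := found + StrLen(m)
        }
        haystack := masked . SubStr(haystack, pos)
    }
    ; The primary search/replace only ever sees the masked haystack.
    result := RegExReplace(haystack, needle, replacement)
    ; Put the excluded text back, newest first, so an exclusion that
    ; swallowed an earlier placeholder is unwrapped in the right order.
    Loop % saved.Length()
    {
        i := saved.Length() - A_Index + 1
        result := StrReplace(result, tokOpen . i . tokClose, saved[i])
    }
    return result
}
```

Restoring newest first matters when a later exclusion (say, block comments) swallows a placeholder left by an earlier one (say, strings): the outer text goes back first, then the string inside it. The placeholder scheme is also the main weakness of this sketch; a real implementation would need tokens that the caller's needles cannot collide with.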
So, my plan is to create the enhanced version (with many other enhancements as well). Hopefully I can post it for others to try once it is done. If it is rejected by the masses, that's OK too.
BTW... I anticipate some readers may want to offer their own solution to the 'comment' example above that does not involve a Regex enhancement. This is not necessary... I know how to solve this particular situation with a more appropriately designed needle. The plan is to design a class that supports more than just this one example, and does so as well as (or better than) the native Regex. And no, I do not plan to accommodate every possible scenario within the class itself; instead, the caller will pass secondary needles that identify what should be ignored. This thread was not intended to dig into the details of the Regex enhancement project itself, which is why I left those details out at the beginning. But I now feel it necessary to at least provide the context for my initial question.
Anyway, thanks for all the suggestions and help with resolving the \E obstacle. I really appreciate you all!
Andy