Tutorial: An AHK Introduction to RegEx



Poll: Did you find this tutorial helpful? (45 member(s) have cast votes)

  1. Yes, I found it helpful. (48 votes [90.57%])
  2. No, it wasn't helpful. (2 votes [3.77%])
  3. Who taught you how to write, e.e. cummings? (3 votes [5.66%])

sinkfaze
  • Moderators
  • 6367 posts
  • Last active: Nov 30 2018 08:50 PM
  • Joined: 18 Mar 2008
Inspired by Morpheus' question I have made an introductory tutorial for AHK'ers on how to use RegEx. I have three examples to start and plan on adding more examples to show how RegEx can be used in scripts. Any comments or questions are welcomed and appreciated.

Here is a quick link list to each individual post after this one, which is Lesson 1: A Brief Explanation of RegEx and Examples:

Lesson 2: Breaking down Example 2
Lesson 3: Breaking down Example 3
Lesson 4: Using RegEx with window titles
Lesson 5: RegEx and window titles, Ex. 2
Lesson 6: RegEx and window titles, Ex. 3
Lesson 7, Part 1: A window title and a URL - RegEx with window titles review
Lesson 7, Part 2: A window title and a URL - Introduction to using RegExMatch
Lesson 8, Part 1: Address parsing - More RegExMatch techniques and an Introduction to using RegExReplace
Lesson 8, Part 2: Address parsing - RegExMatch review
Lesson 9: HTML and RegEx - RegExReplace

I’m sure many users within the AHK community have heard about or have seen someone who has solved a problem using Regular Expressions, or RegEx, and wondered to themselves, “What the heck is that?!” A bunch of periods, pluses, backslashes, asterisks, question marks…wtf mate? This tutorial is aimed at making RegEx a little more understandable to the user who has no experience with it by looking at a few basic examples of RegEx statements and breaking them down.

What is RegEx?

Contrary to what you may think, the concepts behind RegEx probably aren’t that unfamiliar to you. Have you ever done a search and used a wild card asterisk (*) before? Let’s take a look at a phrase we might use to search our PC for all of the text files on our hard drive:

*.txt
So what is that phrase telling us? It’s telling us to search for any file that ends in .txt; the wild card is, in effect, a short-circuit term meaning “any”. But let’s say we want to find various files that all have the same file name (“someprogram”) but with any extension. Now we use the wild card in a different way:

someprogram.*
Or, if we want to search for certain text files whose name begins with help but ends with different alphanumerical sequences followed by .txt, we could use the wild card to search for them like this:

help*.txt
Now all of these searches are, of course, looking for certain text in file names, but by using the wild card we can now better see that we are also searching for certain patterns in file names. The entirety of each of the above search terms, patterns and all, are expressions.

RegEx Explained

Regular expressions are similar in that they use “short-circuit terms” to create searchable terms, but the syntax is a little different. So let’s look back over our previous three search examples to see how a similar RegEx search would look. For our first example, the equivalent RegEx search term would be this:

.*\.txt
Now you might be saying to yourself “…uhhh-huh-uh, what?” but let’s refrain from reverting to Beavis and Butthead and take a moment to examine this statement (and keep the Regular Expressions Quick Reference from the manual handy!). In RegEx the period represents any single character that can be matched, so no matter what that one character is, the period will match it. That’s great if we’re matching only one character but clearly in this example we will likely need to match several characters; that’s where the asterisk comes into play.

(EDIT: Please note the addendum from Lexikos below regarding the period and "newline" characters.)

The asterisk will match zero or more of the preceding character, class or subpattern. We will discuss classes and subpatterns later, but for our example the action of the asterisk is dictated by the preceding character, which in this case is the period. In other words, .* will match zero or more occurrences of any character. If you haven’t guessed it already (or if you just read it in the manual) .* is one of the most permissive RegEx patterns since it will match, well, anything!

(Now there are two other match characters that act very similarly to the asterisk: the plus (+) and the question mark (?). Unlike the asterisk, the plus matches one or more of the preceding character, class or subpattern, so it has to match something in order to be valid whereas the asterisk can match nothing and still be valid. The question mark, on the other hand, matches zero or one of the preceding character, class or subpattern but it does so optionally, so if the RegEx statement colou?r doesn't find the word colour it will still match the word color. For our purposes we will continue using the asterisk in our examples but it is good to mention these additional match characters for future reference.)
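Here is a small sketch of the three match characters side by side, written for AutoHotkey v1.1 (the array and for-loop syntax assume that version). The word list is made up, and the ^ and $ anchors are my own addition so that the whole word has to match; anchors are explained later in this thread.

```autohotkey
; Compare ?, * and + applied to the letter "u". The word list is hypothetical.
words := ["color", "colour", "colouur"]
result := ""
for index, word in words
{
    zeroOrOne  := RegExMatch(word, "^colou?r$") ? "yes" : "no"  ; u may appear 0 or 1 times
    zeroOrMore := RegExMatch(word, "^colou*r$") ? "yes" : "no"  ; u may appear 0 or more times
    oneOrMore  := RegExMatch(word, "^colou+r$") ? "yes" : "no"  ; u must appear at least once
    result .= word . "  ?=" . zeroOrOne . "  *=" . zeroOrMore . "  +=" . oneOrMore . "`n"
}
MsgBox % result
```

"color" should pass the ? and * tests but fail the + test, while "colouur" should fail only the ? test.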

So now that we’ve covered the .* let’s look at the rest of the search term. The backslash (\) is the escape character in RegEx, which means that a following character that would otherwise have special meaning is treated as a literal. So \. means that we are looking for a literal period as opposed to "any single character". Since that is followed by the letters txt, which are already literal, we can see that \.txt represents the literal characters .txt. Put it all together and you’re doing a search for a file with any name (.*) that ends in .txt (\.txt), .*\.txt!

That wasn’t so bad, was it?

Now I do want to clarify one thing before we move on to the second example. If we had not used a backslash to signify the literal period (\.) RegEx still would've matched in virtually the same way since the period matches any character and the literal period is a character (duh). But there will be times when using the backslash to signify literals in your statements will be critical to matching terms, so we will use it here just to enforce good writing habits.
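To see where the escaped and unescaped period part ways, here is a rough sketch using RegExMatch, which returns the position of the match or 0 if there is none. It assumes AutoHotkey v1.1, and the file names are invented for the test.

```autohotkey
; Hypothetical file names; "mytxt" has no period at all.
names := ["notes.txt", "readme.md", "mytxt"]
result := ""
for index, name in names
{
    escaped   := RegExMatch(name, ".*\.txt")  ; requires a literal period before txt
    unescaped := RegExMatch(name, ".*.txt")   ; the bare period accepts any character
    result .= name . ": escaped=" . escaped . ", unescaped=" . unescaped . "`n"
}
MsgBox % result
```

Both statements agree on the first two names, but only the unescaped one matches "mytxt", because its period is satisfied by the letter y.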

sinkfaze
Now let’s take a look at the second example and see if you can guess what the equivalent RegEx search term is. If you said this:

someprogram\..*
You would be correct, but let’s add some specificity (yes, let’s make life more difficult). Let’s say you’re only looking for log files (.log) and ini files (.ini) with that name. That gives us two additional ways to limit our search: look only for letters after the literal period and look for a term that is only 3 characters in length after the literal period.

If we wish to look only for letters after the literal period we will now use brackets, or the character class terms. A class is any list and/or range of characters, such as lowercase and uppercase letters, numbers and underscores; in our case we want to match only letters so the lowercase and uppercase letter range classes ([a-zA-Z]) will apply.

Now to search for a term that is only three characters in length we move to using the braces, or min/max terms. Min/max, which is numeric, specifies how many times the preceding character, class or subpattern must be matched, so {2,4}, for example, specifies no fewer than 2 and no more than 4 matches. {5}, by contrast, specifies exactly 5 matches, while {5,} specifies 5 or more. For our purposes we want exactly three characters so {3} is the correct min/max term to use.

Now if we wanted to combine the character class and min/max terms we could make this RegEx search term:

someprogram\.[a-zA-Z]{3}
And it would give us valid results, but the character class has an additional feature which can help us narrow our potential matches even further. Although we used [a-zA-Z] as our range of letters, we don’t have to use the whole alphabet as our range. We can use any range in between two characters in the alphabet, kind of a min/max term for the character class. Since we’re searching for log files and ini files, we can create a range of characters that includes the letters contained in those extensions. The letters that may define our range are l-o-g-i-n (no pun intended), so with g being the first letter encountered alphabetically and o being the last, we can use this as our RegEx search term:

someprogram\.[g-oG-O]{3}
Or in other words, any file named “someprogram” with a three-letter file extension in which all three letters are between the letters g and o. Pretty neat, huh?

And now that we’ve learned about using character class and min/max, I’m going to show you the even easier and even more specific way that would’ve saved us all this time in the first place:

someprogram(\.log|\.ini)
The pipe character or vertical bar (|) indicates an alternative term that can be matched, so you can obtain a match with either the first term or any term(s) listed after a pipe(s). The parentheses are used to indicate that anything inside of them is a subpattern of the pattern; in other words, the matching pattern can be either someprogram\.log or someprogram\.ini.
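The three statements from this lesson can be compared with a quick test like the one below. It is only a sketch for AutoHotkey v1.1, and the file names are hypothetical; the last one is there to show that the g-o range is looser than it looks.

```autohotkey
; Hypothetical file names; only the .log and .ini files are wanted.
files := ["someprogram.log", "someprogram.ini", "someprogram.exe", "someprogram.mom"]
result := ""
for index, file in files
{
    anyLetters := RegExMatch(file, "someprogram\.[a-zA-Z]{3}") ? "yes" : "no"
    gToO       := RegExMatch(file, "someprogram\.[g-oG-O]{3}") ? "yes" : "no"
    logOrIni   := RegExMatch(file, "someprogram(\.log|\.ini)") ? "yes" : "no"
    result .= file . "  any letters=" . anyLetters . "  g-o=" . gToO . "  log|ini=" . logOrIni . "`n"
}
MsgBox % result
```

The g-o range rejects .exe but still accepts .mom, since m and o both fall inside the range; only the statement with alternatives limits the matches to .log and .ini.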

Yes, it was a deception, but a well-intentioned one as I did it to introduce you to the different methods RegEx has available to you to solve a problem. You may now curse me under your breath as we move on to the last example.

garry
  • Spam Officer
  • 3219 posts
  • Last active: Sep 20 2018 02:47 PM
  • Joined: 19 Apr 2005
thank you sinkfaze, maybe move to Scripts&Functions (?)
should learn regex , didn't yet , you have good explanation, here just some links

http://www.autohotkey.com/docs/misc/RegEx-QuickRef.htm
http://www.autohotke...pic.php?t=13545
http://www.regular-expressions.info/
http://gskinner.com/RegExr/
http://de.selfhtml.o...che/regexpr.htm
http://www.regenechs...pwcms/index.php
http://immike.net/bl... ... ould-know/



sinkfaze
For the last example, again, we can use the good ol' .* if we want to go the cheap and easy route:

help.*\.txt
But in this case, not only will the above statement match a file named help0302.txt, it will also match a file named omgineedhelprlybad.txt. Why? Because the only thing the statement specifies is that the file name have help and any number of characters directly before .txt. Any files that have that pattern but don't necessarily start with help will also match, so even though we'll probably find what we need using that statement, it didn't limit our search in the way we intended either.

To solve this we can use anchors. Anchors designate a particular pattern so that it can only be matched if it is at the beginning or the end of a particular term. The circumflex (^) is the front anchor and when used is placed in front of the specified pattern (^abc); likewise the dollar sign ($), the end anchor, is placed at the end of the specified pattern when used (abc$). In our case, we want to find only files that start with the word help, so we can use this:

^help.*\.txt
And that will limit the results in the way we need. But say that we want to find all text files that start with help, followed by an underscore, then a four-digit number and .txt, like help_0302.txt. Obviously we can use the numeric character class term [0-9] followed by a min/max term {4} to match the digits, but instead of using the numeric character class we can also use \d, which is the equivalent of the numeric character class and will match any single digit. Pair it up with the min/max term and we get this:

^help_\d{4}\.txt
But we can do one better than that. Since the underscore counts as a "word" character just like letters and numbers do, we can also use \w, which will match any single "word" character. So \w will match any lowercase letter, any uppercase letter, any number and the underscore, the equivalent of specifying [a-zA-Z0-9_] in your statement. With respect to that we can also use this:

^help\w{5}\.txt
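If you want to try the four statements from this lesson yourself, a test along these lines should do it. This is a sketch for AutoHotkey v1.1; the first two file names come from the lesson and the other two are made up.

```autohotkey
; File names to test; the last two are hypothetical.
names := ["help_0302.txt", "omgineedhelprlybad.txt", "help0302.txt", "helpabcde.txt"]
patterns := ["help.*\.txt", "^help.*\.txt", "^help_\d{4}\.txt", "^help\w{5}\.txt"]
result := ""
for i, pattern in patterns
{
    result .= pattern . " matches:"
    for j, name in names
    {
        if RegExMatch(name, pattern)
            result .= " " . name
    }
    result .= "`n"
}
MsgBox % result
```

The unanchored statement should list all four names, the anchored one should drop omgineedhelprlybad.txt, and the last two should each narrow the list further.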
If you've gotten this far, hopefully I haven't wasted precious minutes of your life and you can now look at a search statement using RegEx like this:

&#\d+;
And not be confused, but be able to recognize familiar match characters that have been reviewed through these examples. And although I plan to add to this tutorial, this is merely the beginning of what you can do with RegEx. For more detailed explanations, you can not only refer to the manual but also to the Autohotkey Wiki, which has a link to a far more detailed RegEx tutorial written by AHK member PhiLho and also a link to a RegEx "sandbox" where you can test your expressions against text examples to see how well they work.
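As a quick illustration of that last statement, the sketch below pulls a numeric HTML entity out of a piece of text. The sample text is invented; the third parameter of RegExMatch names a variable that receives the matched text.

```autohotkey
; The sample text is made up for illustration.
text := "Caf&#233; au lait"
position := RegExMatch(text, "&#\d+;", entity)
MsgBox % "Found " . entity . " at position " . position
```

Here &# and the closing semicolon are literal, and \d+ covers the run of digits between them.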

Good luck and happy searching\.

BoBo³
  • Guests
  • Last active:
  • Joined: --
You might think about adding a helpful cross-linking signature to the 'core parts' of this thread! Something like this ...

Tutorial: An AHK Introduction to RegEx - [Part 1] - [Part 2] - [Part 3]

so your trainees will be able to follow the course more easily once the thread gets flooded with hundreds of replies. 8)

And yes, thanks for your effort :D

SoLong&Thx4AllTheFish
  • Members
  • 4999 posts
  • Last active:
  • Joined: 27 May 2007
Perhaps move it to the wiki so more people can add to it over time, but keep this thread so people will find it when they search?

http://www.autohotke... ... :Tutorials
see reg ex section here:
http://www.autohotke... ... =Tutorials

sinkfaze
We’ve gone through some basic examples of RegEx search terms and how to create RegEx search statements, and again, I welcome any questions or suggestions with regards to anything I’ve tried to teach you about using RegEx, but after three posts have I taught you anything about how to use RegEx with AHK?

Nope, not really, but that’s about to change! Now instead of trying to generalize too much on ways that using RegEx can improve the way that you do things in your scripts, I’m just going to speak from my own personal experience and explain some of the ways in which using RegEx has improved my own scripts and use some of those ways as examples to continue learning about how to use RegEx. Anyone else who has something to add in terms of how using RegEx has improved their scripts is more than welcome to add it themselves or, if you’re not comfortable with your own writing skills, send me a private message with what you want to say and I will write a draft and send it back to you for your approval before posting (all appropriate credits will be given to the original contributor).

Now the single biggest way in which RegEx has improved my scripts has been with matching windows using SetTitleMatchMode, RegEx. Every day at my place of work I run two different sessions of the same application: one local and one remote session accessed through Citrix. Citrix client windows are pretty fickle to begin with, but properly distinguishing between windows was even more difficult given the permissiveness of SetTitleMatchMode, 2. I had resigned myself to the frustrating process of retrieving the unique ID of each window in the remote session and assigning them to ahk_groups every time I loaded or re-loaded my main script…but then I tried SetTitleMatchMode, RegEx.

Since I had always used the unique IDs to identify the program windows in the remote session, the window’s title never mattered to me, but now that I was using SetTitleMatchMode, RegEx my script was only working on certain windows in each application.

So for one program window, I had to find all possible window titles I would encounter to make the correct RegEx statement:

Customer Service Inquiry - [General Information << constantly changing >>] - \\Remote
Customer Service Inquiry - [<all> Messages for << constantly changing >> - \\Remote
Customer Service Inquiry - [Account Detail for << constantly changing >> - \\Remote
So what RegEx statement can I create that will match all of these? Well all three windows begin with this:

Customer Service Inquiry - [
So this would be a valid starting phrase:

^Customer Service Inquiry - \[
You’ll notice that I use the circumflex anchor (to match only if it’s at the beginning of the window title) and the backslash (to escape the literal open bracket). Since the information following the literal open bracket is never the same and does not always encounter a close bracket, we should look for the next literal character in all three titles that our statement can match, which would be the dash:

- \\Remote
Since we can match any character between the open bracket and the dash, we can use .* to make any match between those two characters as permissive as possible:

^Customer Service Inquiry - \[.*- \\Remote
Now you may be fooled into believing that we’re done here, but look again…did you find it? There are two literal backslashes preceding the word Remote! If we leave this statement as is RegEx will interpret the first backslash as an escape character and the second as a literal backslash, so RegEx will only find one literal backslash where we need it to find two. Our final working statement is this:

^Customer Service Inquiry - \[.*- \\\\Remote
I could then use that statement with GroupAdd to create a group that would always match only that window without the hassle of retrieving the unique ID all the time.
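In a script this might look roughly like the sketch below (AutoHotkey v1 syntax). The group name CSInquiry and the F1 hotkey are hypothetical, chosen only to show the idea; they are not taken from my actual script.

```autohotkey
; Auto-execute section: switch window-title matching to regular expressions.
SetTitleMatchMode, RegEx

; Add every window whose title fits the statement to a group.
; "CSInquiry" is a hypothetical group name.
GroupAdd, CSInquiry, ^Customer Service Inquiry - \[.*- \\\\Remote
return

; Hypothetical hotkey that only fires while a window in the group is active.
#IfWinActive ahk_group CSInquiry
F1::MsgBox, A Customer Service Inquiry window is active.
#IfWinActive
```

Because the group is defined by the title statement rather than by unique IDs, it keeps working after the script is reloaded or the remote windows are reopened.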

In the next post we’ll examine another way to use RegEx to match a series of window titles.

01072010 - Corrected the final code example to include the circumflex (thanks Redeemed07!)

sinkfaze
Another program I use in my remote session changes the window title depending on if it is open but not in use:

Account Processing - [Entry] - \\Remote
Or if it is open and in use, in which case the window title will reflect the name of the “company” I’m impersonating:

MNIC -<< constantly changing >> - \\Remote
MIC of A -<< constantly changing >> - \\Remote
MIC of F -<< constantly changing >> - \\Remote
In this case all four possible window titles start with some text followed by a space and a dash, but what’s the best way to match it?

Since the amount of text there is to match at the beginning of the pattern is very small, I listed each possible match as an alternative inside of a subpattern:

(Account Processing|MNIC|MIC of A|MIC of F) -
The text after this point will differ as with our previous window, but in this case it will be between the two dashes:

-.*- \\\\Remote
And from there we get to the final RegEx statement:

(Account Processing|MNIC|MIC of A|MIC of F) -.*- \\\\Remote
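One way to check a statement like this before trusting it with live windows is to run it against sample titles with RegExMatch. The sketch below assumes AutoHotkey v1.1; the company text and the Notepad title are placeholders I made up.

```autohotkey
; Note that the statement is the same one used for window matching.
pattern := "(Account Processing|MNIC|MIC of A|MIC of F) -.*- \\\\Remote"

; Sample titles; the text between the dashes is hypothetical.
titles := ["Account Processing - [Entry] - \\Remote"
         , "MNIC -Sample Co - \\Remote"
         , "MIC of F -Another Co - \\Remote"
         , "Notepad - \\Remote"]

result := ""
for index, title in titles
    result .= (RegExMatch(title, pattern) ? "match:    " : "no match: ") . title . "`n"
MsgBox % result
```

The first three titles should match and the Notepad title should not, since it does not begin with any of the listed alternatives.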
The next example of window title matching using RegEx won’t be work-related but it may help your online poker game.

sinkfaze
Okay I will admit: I play poker online, done it for years.  I can’t help it, something about the math and logic behind the odds of the card game draws me in.  I came to AHK looking for something that could help increase my work efficiency; only later did I find out that I could also use it to speed up my poker playing.
 
One of the most difficult things about getting AHK to work well with online poker games is the window titles; depending on what game you’re playing the window titles can be significantly different, like below:
 
Tournament 0381972 Table 54913282 - Holdem 6 seat - Stakes 20/40
Tournament 6851984 Table 06854960 - Omaha 9 seat - Stakes 100/200
Tournament 9510455 Table 32086215 - 7 Card Stud Hi/Lo - Stakes 50/100 Ante 20
 
So many different things are in flux during the game, thus the window titles are in flux.  And when you need to get the same buttons to work on windows with these different titles, it can truly be a pain.
 
Hellooooooo RegEx!
 
So for the above example, all three windows start with the same pattern:
 
Tournament 0381972 Table 54913282 -
Tournament 6851984 Table 06854960 -
Tournament 9510455 Table 32086215 - 
 
We can see that the first sets of numbers are all 7 digits and second sets are all 8 digits.  Since we can only assume that those numbers will only increase over time rather than decrease we could use min/max to capture the pattern:
 
Tournament \d{7,} Table \d{8,} - 
 
But seeing as a software update by the company could revert these numbers, and there will likely be no harm in doing it this way, we’ll use this pattern instead:
 
Tournament \d+ Table \d+ - 
 
\d+ will match one or more of any single digit, so it will be more than sufficient here.
 
Now for a rough patch:
 
Holdem 6 seat -
Omaha 9 seat -
7 Card Stud Hi/Lo -
 
YUCK.  We have three different names of games and only two of them show how many people can be seated at the table?!  Ridiculous.
 
Despite this problem, there is still a pattern in that each window lists the name of the game being played, so let’s put them into a subpattern with alternatives:
 
(Holdem |Omaha |7 Card Stud Hi/Lo )
 
Not bad, but now what do we do about the number of seats, which are optional?  Now we can obviously group them into a subpattern with alternatives:
 
(6 seat |9 seat )
 
Okay…and?  Look at my first sentence about this problem again:
 

…what do we do about the number of seats, which are optional?

 
If we remember from a previous lesson the character we use to optionally match a character, class or subpattern is the question mark:
 
(6 seat |9 seat )?
 
BAM!

So here's the statement that we have so far:
 
Tournament \d+ Table \d+ - (Holdem |Omaha |7 Card Stud Hi/Lo )(6 seat |9 seat )?- 
Now we’re cookin’!

The next part of the statement is with the stakes, and as with the other digits in our statement, they will be ever-increasing. Let's not re-invent the wheel:

Tournament \d+ Table \d+ - (Holdem |Omaha |7 Card Stud Hi/Lo )(6 seat |9 seat )?- Stakes \d+/\d+
The last part is the optional section with the ante, which is ever-increasing and...well, you get the idea:

Tournament \d+ Table \d+ - (Holdem |Omaha |7 Card Stud Hi/Lo )(6 seat |9 seat )?- Stakes \d+/\d+( Ante \d+)?
And that's it! You're now creating those long, funky-looking RegEx statements you were always afraid of when somebody else was doing it.
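To confirm the finished statement against the three sample titles from the start of this lesson, something like this sketch can be used (AutoHotkey v1.1 assumed). The variable names are arbitrary.

```autohotkey
pattern := "Tournament \d+ Table \d+ - (Holdem |Omaha |7 Card Stud Hi/Lo )(6 seat |9 seat )?- Stakes \d+/\d+( Ante \d+)?"

titles := ["Tournament 0381972 Table 54913282 - Holdem 6 seat - Stakes 20/40"
         , "Tournament 6851984 Table 06854960 - Omaha 9 seat - Stakes 100/200"
         , "Tournament 9510455 Table 32086215 - 7 Card Stud Hi/Lo - Stakes 50/100 Ante 20"]

result := ""
for index, title in titles
{
    ; match receives the whole matched text, match1 the game name (first subpattern).
    found := RegExMatch(title, pattern, match)
    result .= (found ? "match" : "no match") . ", game: " . match1 . "`n"
}
MsgBox % result
```

All three titles should report a match, each with its own game name captured by the first subpattern.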

More examples to come...

sinkfaze
Someone asked me a question about my last example which is very valid. It was brought to my attention that as part of my subpatterns of alternatives I was including spaces, whereas in my previous examples I did not:

(Holdem |Omaha |7 Card Stud Hi/Lo )
(6 seat |9 seat )?
( Ante \d+)?
It had not occurred to me that this would require explanation but after it was pointed out to me I realize that it would be a good idea.

The reason that I included spaces in the subpatterns of alternatives is because it will be the easiest way to keep track of where the spaces are supposed to go.

For example, no matter what the game type name is it will have exactly one space between itself and whatever comes next, whether it's the optional seat subpattern or the dash:

7 Card Stud Hi/Lo{SPACE}-
Omaha{SPACE}9 seat
And if the optional seat subpattern is satisfied, it, too, will have exactly one space between itself and the dash:

6 seat{SPACE}-
Now in the case of the game type name, we could have omitted the spaces from each alternative in the subpattern and specified one single space outside of the subpattern, like this:

(Holdem|Omaha|7 Card Stud Hi/Lo) (6 seat |9 seat )?-
But that also makes the statement less readable, since a quick glance may not tell you that there is a space between the game type name subpattern and the optional seat subpattern. Putting a space with each alternative in the subpatterns takes a few extra characters, but it also leaves no question as to where and when the spaces should be there.

In the case of the final optional subpattern:

( Ante \d+)?
I placed a space before the word ante inside the subpattern because the space is optional just like the rest of the subpattern. The space could've been removed from the subpattern and made to precede it, but then the statement would demand a space after the stakes even in titles that have no ante, and those titles would no longer match. For purposes of good statement writing I try to keep all optional characters optional if I can.
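Here is a quick check of what happens when that space is moved out of the optional subpattern. It is a sketch for AutoHotkey v1.1 that tests only the tail end of the title, and the variable names are my own.

```autohotkey
inside  := "Stakes \d+/\d+( Ante \d+)?"   ; space is part of the optional subpattern
outside := "Stakes \d+/\d+ (Ante \d+)?"   ; space is required, only the ante is optional

; Tail ends of two of the sample titles.
titles := ["Stakes 20/40", "Stakes 50/100 Ante 20"]

result := ""
for index, title in titles
{
    result .= title
        . "  inside=" . (RegExMatch(title, inside) ? "yes" : "no")
        . "  outside=" . (RegExMatch(title, outside) ? "yes" : "no") . "`n"
}
MsgBox % result
```

With the space outside the subpattern, the title without an ante has no trailing space to offer, so that version of the statement fails to match it.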

Hopefully this explanation gives a little better insight into why the spaces are included in the subpatterns for this example.

sinkfaze
Okay, one last example of using RegEx to match a window title.

At my place of work we all have an intranet application we work with in Internet Explorer which will require a minimum of two unique sessions open. Although the programs in each session are unique to their respective window, they both access the same data in the same way, thus their window titles are consistently very similar.

I have to be able to tell the windows apart so each window can have unique hotkeys but the only way I can do it is through their unique IDs. The window titles don't have enough unique information to do this, but the URLs in the address bar for each window do (and the address bars are hidden since the application runs in kiosk mode).

So the plan of action to make this whole thing happen is to first run WinGet, , List to search for only those two app windows and save their unique IDs into an array. Once that's done I'll get the URL from the first window element in the array, extract a certain piece of data from the URL to tell which window it is and assign each window to its own group from there.

Now I need to write the best statement that will match both windows and here are the three types of formats the window title can be found in:

<< some long digit number >> Corporate Intranet Application
Corporate Intranet :: Application
Corporate Intranet Application
Now the title can optionally start with the long digit number, so we'll start there. Rather than assign a specific range of digits I'll just match all digits since that's the only instance in which they should show up in the title:

(\d+ )?
Again notice that I use a space after \d+ since the space, like the digits, is optional. Now we enter in the first two words in the title since they will always match in that position:

(\d+ )?Corporate Intranet
And the :: is optional between the second and third word in the title:

(\d+ )?Corporate Intranet( ::)? Application
And part one of the plan is done, this command is ready to go:

WinGet, WinVar, List, (\d+ )?Corporate Intranet( ::)? Application
Part two of the plan is a little trickier than it seems, but luckily a good RegEx statement in combination with RegExMatch will help us. That lesson and a few more on ways you can use RegExMatch are next.
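For reference, the WinGet command above stores the number of matching windows in WinVar and their unique IDs in WinVar1, WinVar2 and so on. A minimal sketch of walking through that list (AutoHotkey v1 syntax; the message box is only there to show what was found) could look like this:

```autohotkey
; The title statement only works as RegEx if this mode is set first.
SetTitleMatchMode, RegEx
WinGet, WinVar, List, (\d+ )?Corporate Intranet( ::)? Application

; WinVar holds the count; WinVar1, WinVar2, ... hold the unique IDs.
Loop, %WinVar%
{
    id := WinVar%A_Index%
    WinGetTitle, title, ahk_id %id%
    MsgBox, Window %A_Index% of %WinVar%: %title%
}
```

Retrieving the URL and sorting the windows into groups would happen inside that loop.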

Lexikos
  • Administrators
  • 9844 posts
  • AutoHotkey Foundation
  • Last active:
  • Joined: 17 Oct 2006

In RegEx the period represents any single character that can be matched, so no matter what that one character is, the period will match it.

Correction: it does not match a newline unless the s option is used. (The definition of "newline" depends on the presence or absence of the `n, `r or `a options; by default . will not match `r`n.)

The question mark, on the other hand, matches zero or more of the preceding character, class or subpattern just like the asterisk but it does so optionally,

Correction: it matches exactly zero or one of the preceding item. It does not match "just like the asterisk".

The backslash (\) is the escape character in RegEx, which means that the following single character will be literal.

Correction: characters which otherwise have special meaning will be literal when preceded by a backslash. There are several characters which have special meaning only when preceded by a backslash - such as \d, in your demonstrations.

Other than those few errors, it looks like it will be an excellent resource for beginners. :)

You might also like to mention the other use of a question mark - as a modifier for + or *. Look for "Greed" in the quick reference if you're not familiar with it.
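Both points can be seen in a short test. This is a sketch in AutoHotkey v1 syntax; the sample text and HTML snippet are made up, and the line break is written as `r`n to match the default newline setting.

```autohotkey
; 1) The period and newlines. The sample text uses `r`n as its line break.
text := "first line`r`nsecond line"
withoutS := RegExMatch(text, "first.*second")    ; 0: the period stops at the line break
withS    := RegExMatch(text, "s)first.*second")  ; 1: the s option lets the period cross it

; 2) Greed. The HTML snippet is made up for illustration.
html := "<b>one</b> and <b>two</b>"
RegExMatch(html, "<b>.*</b>", greedy)   ; greedy: <b>one</b> and <b>two</b>
RegExMatch(html, "<b>.*?</b>", lazy)    ; lazy:   <b>one</b>

MsgBox % "without s: " . withoutS . "`nwith s: " . withS . "`ngreedy: " . greedy . "`nlazy: " . lazy
```

The question mark after the asterisk makes .* stop at the first closing tag instead of running on to the last one.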

Morpheus
  • Members
  • 475 posts
  • Last active: Oct 21 2014 11:08 AM
  • Joined: 31 Jul 2008
I did find this helpful, thanks for taking the time to write this tutorial.

I was using the online RegEx tester: http://gskinner.com/RegExr/
that I found in the Wiki. The way the tester works is that I have to enter the Regex, and the tester highlights the captured text. What I am wondering is: Would it be possible for someone to do the opposite, and create the regex, based on text entered? In your tutorial, you had three pieces of text that you were trying to match. What if the user could enter text in three boxes, and the regex was created for them?
This is probably more difficult than I realize...
Thanks again for the tutorial, I will bookmark it for reference.

sinkfaze

Correction, correction, correction...


Ah, you are quite correct. That's what I get for a moment of inspiration, an insatiable urge to write and a complete lack of a proofreader.

...[period] does not match a newline unless the s option is used. (The definition of "newline" depends on the presence or absence of the `n, `r or `a options; by default . will not match `r`n.)


I had actually caught this omission early on but wasn't quite sure how to approach it since none of my examples to this point have dealt with "newline" and I didn't want to stray too far from topic (if you couldn't tell, I'm wordy :O ). I'm sure I'll be able to approach it in later examples using RegExMatch and RegExReplace but in the meantime your correction will serve as fair warning.

...[questionmark] matches exactly zero or one of the preceding item. It does not match "just like the asterisk".


This correction has been made.

...characters which otherwise have special meaning will be literal when preceded by a backslash. There are several characters which have special meaning only when preceded by a backslash - such as \d, in your demonstrations.


Might be helpful if we knew which characters required the backslash escape character, huh? :roll: It never ceases to amaze me how the obvious things get missed when I get on a roll writing.

For anyone reading, here's the snippet directly from the manual on escaped characters:

From "Regular Expressions - Fundamentals":

Escaped characters: Most characters like abc123 can be used literally inside a regular expression. However, the characters \.*?+[{|()^$ must be preceded by a backslash to be seen as literal. For example, \. is a literal period and \\ is a literal backslash. Escaping can be avoided by using \Q...\E. For example: \QLiteral Text\E.
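As a rough illustration of the two approaches in that snippet, the sketch below matches the same piece of a window title once with backslash escapes and once with \Q...\E. It uses AutoHotkey v1 syntax, and the title is borrowed from the earlier window title lesson.

```autohotkey
title := "Account Processing - [Entry] - \\Remote"

; Escape each special character individually...
escaped := RegExMatch(title, "\[Entry\] - \\\\Remote")

; ...or wrap the literal text in \Q...\E and leave it as it appears in the title.
quoted := RegExMatch(title, "\Q[Entry] - \\Remote\E")

MsgBox % "escaped found at: " . escaped . "`nquoted found at: " . quoted
```

Both calls should report the same position, since they describe the same literal text.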


You might also like to mention the other use of a question mark - as a modifier for + or *. Look for "Greed" in the quick reference if you're not familiar with it.


I actually thought about this one a lot as I worked through the original examples and the window title examples, but I was again having a hard time deciding where I could insert it into an example without getting too far from the point. And since both the Quick Reference and the original post that inspired the tutorial both dealt with greed in terms of HTML, I decided procrastinating until I got to some RegExMatch and RegExReplace examples (which will have HTML) was the best course of action. So you weren't the only one who noticed something was sorely lacking, I was just too lazy to do anything about it. :wink:

Other than those few errors, it looks like it will be an excellent resource for beginners. :)


Why thank you, and thanks for pointing out the errors!

sinkfaze

Would it be possible for someone to do the opposite, and create the regex, based on text entered.


What are you trying to do, put me out of my voluntary job? :p I'm sure there are programs out there that can do this but I haven't tried to find any of them. One problem I can see would be syntax. The AHK manual on regular expressions mentions only the basic, most commonly-used RegEx syntax; I don't know for a fact but I would imagine that such a RegEx creating program would use a wider range of syntax to be as accommodating with matches as possible. While AFAIK AHK would support such syntax, there may be quite the learning curve to catch up to what the program is doing.

I think the biggest problem with such a program would be the ability to see all options available to you. As you saw in that initial thread you replied to, jaco and I solved the exact same problem two different ways. Even many of my above example problems could've been solved in several different ways. I'm not a programmer but I would assume you would have to have some real programming mettle to come up with a RegEx creating program that could display several different ways to match a string or a series of strings.

In all likelihood I would expect the resulting RegEx statements from such a program would reflect the author's preferences in how they use RegEx, which isn't necessarily bad if it solves the problem, but using the program may also close your thinking off to other ways to approach the problem. In programming (or quasi-programming, anyway) how we solve a problem can be just as important as the result, and if you use a RegEx creator that gets you the result but short-circuits your understanding of how it got that result, it can put you at a disadvantage in the long-term.