jeeswg's RegEx tutorial (RegExMatch, RegExReplace)

andymbody
Posts: 867
Joined: 02 Jul 2017, 23:47

Re: jeeswg's RegEx tutorial (RegExMatch, RegExReplace)

Post by andymbody » 01 Jan 2022, 13:12

This is very helpful!! Thank you so much for creating it!

Correct me if I'm wrong, but your post shows:
;delete everything after the first n
vText := "abcdefghijklmnopqrstuvwxyz"
MsgBox, % RegExReplace(vText, "^.*(?=n)")
As far as I can tell, this would:
delete everything BEFORE (but not including) the LAST n
keep everything from the LAST n onward (including the n)
So with this string it shows "nopqrstuvwxyz" rather than "abcdefghijklmn".
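
A quick AutoHotkey v1 check of the two needles. The string "xnynz" is made up; it just has two n's so that "first n" and "last n" differ. The second needle is a lazy .*? plus \K form of "delete everything after the first n" (variable names are illustrative only).

```autohotkey
; Made-up test string with two n's, so "first n" and "last n" differ
vText := "xnynz"

; Tutorial's needle: greedy .* runs to the LAST n and the lookahead (?=n)
; leaves that n alone, so everything BEFORE the last n is deleted
vGreedy := RegExReplace(vText, "^.*(?=n)")	; "nz"

; "Delete everything after the first n": lazy .*? stops at the FIRST n,
; \K keeps "xn" out of the match, and the rest (.*) is deleted
vLazy := RegExReplace(vText, "^.*?n\K.*")	; "xn"

MsgBox, % "^.*(?=n) gives " vGreedy "`n^.*?n\K.* gives " vLazy
```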

Andy
Last edited by andymbody on 01 Jan 2022, 14:43, edited 3 times in total.

Post by andymbody » 01 Jan 2022, 14:26

Bouncing between m and n is a bit confusing, IMO...

Instead of this...

;==============================

;delete text relative to needle

;delete everything after the first m
vText := "abcdefghijklmnopqrstuvwxyz"
MsgBox, % RegExReplace(vText, "^.*?m\K.*")

;delete from the first n onwards
vText := "abcdefghijklmnopqrstuvwxyz"
MsgBox, % RegExReplace(vText, "^.*?\Kn.*")

;delete everything up to the first m
vText := "abcdefghijklmnopqrstuvwxyz"
MsgBox, % RegExReplace(vText, "^.*?m")

;delete everything after the first n
vText := "abcdefghijklmnopqrstuvwxyz"
MsgBox, % RegExReplace(vText, "^.*(?=n)")

;==============================

Might I suggest this...
(I also corrected the one I mentioned in my last post.)


;==============================

; delete/keep text relative to needle

	; delete everything AFTER (and including) N
	; keep everything BEFORE (but not including) N
	vText := "aNcdeNfghijklmNopqrNstuvwxyz"
	MsgBox, % RegExReplace(vText, "^.*\KN.*")	; match LAST N		(aNcdeNfghijklmNopqr)
	MsgBox, % RegExReplace(vText, "^.*?\KN.*")	; match FIRST N		(a)
	
	; delete everything AFTER (but not including) N
	; keep everything BEFORE (and including) N
	vText := "aNcdeNfghijklmNopqrNstuvwxyz"
	MsgBox, % RegExReplace(vText, "^.*N\K.*")	; match LAST N		(aNcdeNfghijklmNopqrN)
	MsgBox, % RegExReplace(vText, "^.*?N\K.*")	; match FIRST N		(aN)

	; delete everything BEFORE (and including) N
	; keep everything AFTER (but not including) N
	vText := "aNcdeNfghijklmNopqrNstuvwxyz"
	MsgBox, % RegExReplace(vText, "^.*N")		; match LAST N		(stuvwxyz)
	MsgBox, % RegExReplace(vText, "^.*?N")		; match FIRST N		(cdeNfghijklmNopqrNstuvwxyz)

	; delete everything BEFORE (but not including) N
	; keep everything AFTER (and including) N
	vText := "aNcdeNfghijklmNopqrNstuvwxyz"
	MsgBox, % RegExReplace(vText, "^.*(?=N)")	; match LAST N		(Nstuvwxyz)
	MsgBox, % RegExReplace(vText, "^.*?(?=N)")	; match FIRST N		(NcdeNfghijklmNopqrNstuvwxyz)


;==============================
/*
Notes:
 
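	Keep in mind that quantifiers like * are GREEDY by default, so ".*" in RegExReplace() and RegExMatch() grabs as much text as it can, then backs off only as far as needed for the rest of the needle to match.
	Meaning... when using ".*N", the match runs to the LAST occurrence of N, not the first...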
	Which makes sense when thinking about it, but is not the way I envision the results when I'm creating needles.
	I'm constantly troubleshooting myself, because I'm constantly forgetting this fact.
	Which is why I am including this note as a reminder to others who may be struggling with the same thing.
	
	To get the FIRST occurrence (when using .*), put ? right after the * (".*?"). This makes the quantifier lazy ("ungreedy"), so think of it as a STOP sign: stop and report at the FIRST occurrence!
	In the examples above, the only difference between the FIRST and LAST needles is the ? added after ".*".
	
	.*N		LAST occurrence
	.*?N	FIRST occurrence
	
	-----------------------------------------------------------------------------------------------------------------------------------------
	!! NOW... About the ? char... why does it have SO MANY different functions? - IDK!! - makes things very confusing!!		
	-----------------------------------------------------------------------------------------------------------------------------------------

		Multiple uses for ?
		
			Function 1.		After a quantifier (*, +, ?, {n,m}), makes it 'ungreedy'/lazy (example above - the "STOP" sign)
				".*?N"		stop search at FIRST N
			
			Function 2.		0 or 1 of the preceding item (a char, a [class], or a (group))
				".*N?"		0 or 1 N char
			
			Function 3.		Positive and negative Look Ahead and Look Behind (they check the neighbouring chars without consuming them)
				".*(?<=m)N.*"		match occurs if N is IMMEDIATELY PRECEDED by m anywhere in haystack (returns entire needle match)
				".*(?<!m)N.*"		match occurs if N is NOT IMMEDIATELY PRECEDED by m anywhere in haystack (returns entire needle match)
				".*N(?=m).*"		match occurs if N is IMMEDIATELY FOLLOWED by m anywhere in haystack (returns entire needle match)
				".*N(?!m).*"		match occurs if N is NOT IMMEDIATELY FOLLOWED by m anywhere in haystack (returns entire needle match)
				".*(?<=m)N(?=m).*"	match occurs if N is IMMEDIATELY PRECEDED and IMMEDIATELY FOLLOWED by m anywhere in haystack (returns entire needle match)
				".*(?<!m)N(?!m).*"	match occurs if N is NEITHER IMMEDIATELY PRECEDED NOR IMMEDIATELY FOLLOWED by m anywhere in haystack (returns entire needle match)
							
			Function 4.		Non-capturing group: turns off the capturing that parentheses normally do
				(?:.*)		Groups like normal parentheses, without the side effect of capturing a subpattern (so it gets no backreference number)


			There may be more uses that I left out (e.g. named groups "(?<name>...)" and inline options like "(?i)") - I'm always learning more about RegEx
			 !! Why aren't other ASCII chars used for these other functions? !!
			 VERY CONFUSING indeed!
			 
	-----------------------------------------------------------------------------------------------------------------------------------------


	\K is very handy!
	Meaning - pretend this is the beginning of the match.
	(Everything to the left of \K must still match, but it is left out of the reported match - so RegExMatch() reports only what is to the right of it, and RegExReplace() replaces only that part.)
	
*/
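
A minimal AutoHotkey v1 sketch of the greedy/lazy difference from the notes, reusing the haystack from the examples above. It assumes AutoHotkey v1.1, where the "O)" option makes RegExMatch() store a match object and .Pos(1) gives the position of captured subpattern 1; the variable names are just for illustration.

```autohotkey
; Same haystack as the examples above
vText := "aNcdeNfghijklmNopqrNstuvwxyz"

; Greedy: .* first takes the whole string, then backs off one char at a
; time until (N) can match, so the captured N is the LAST one
RegExMatch(vText, "O)^.*(N)", vGreedy)

; Lazy: .*? starts with zero chars and grows one at a time, so the
; captured N is the FIRST one
RegExMatch(vText, "O)^.*?(N)", vLazy)

vMsg := "greedy .*N  -> N at position " vGreedy.Pos(1)
vMsg .= "`nlazy .*?N  -> N at position " vLazy.Pos(1)
MsgBox, % vMsg
```

The greedy needle should report position 20 (the last N) and the lazy one position 2 (the first N).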

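Some of the ? uses from the notes, tried on tiny made-up strings in AutoHotkey v1. RegExMatch() returns the position where the match starts, or 0 if there is none. The leading and trailing .* from the notes are left off here, since they only widen the reported match; the "O)" match-object option and all variable names are assumptions of this sketch.

```autohotkey
; Function 2: N? means zero or one N (sample strings are made up)
vOpt1 := RegExMatch("ac", "aN?c")	; 1: no N is fine
vOpt2 := RegExMatch("aNc", "aN?c")	; 1: one N is fine
vOpt3 := RegExMatch("aNNc", "aN?c")	; 0: two N's is too many

; Function 3: lookarounds check the neighbour of N without consuming it
vBehind := RegExMatch("xmNx", "(?<=m)N")	; 3: this N is preceded by m
vNotBehind := RegExMatch("xmNx", "(?<!m)N")	; 0: the only N IS preceded by m
vAhead := RegExMatch("xNmx", "N(?=m)")	; 2: this N is followed by m
vNotAhead := RegExMatch("xNmx", "N(?!m)")	; 0: the only N IS followed by m

; Function 4: (?:id|no) groups the alternation without capturing it,
; so (\d+) is subpattern number 1 and the only one
RegExMatch("id-42", "O)(?:id|no)-(\d+)", vMatch)
vNum := vMatch.Value(1)	; "42"
vCount := vMatch.Count()	; 1

MsgBox, % "N?: " vOpt1 " " vOpt2 " " vOpt3 "`nlookarounds: " vBehind " " vNotBehind " " vAhead " " vNotAhead "`n(?:...): " vCount " subpattern = " vNum
```
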
[Mod edit: [code][/code] tags added.]
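
A small AutoHotkey v1 sketch of how \K plays out in each function, again with the haystack from the examples above (variable names are illustrative only).

```autohotkey
vText := "aNcdeNfghijklmNopqrNstuvwxyz"

; RegExMatch: \K resets the START of the reported match, so the returned
; position and the match text both begin at the first N, not at "a"
vPos := RegExMatch(vText, "^.*?\KN.*", vMatch)

; RegExReplace: only the reported match is replaced, so the "a" matched
; before \K survives
vKept := RegExReplace(vText, "^.*?\KN.*")	; "a"

; Without \K the whole match (from ^ onward) is replaced
vAll := RegExReplace(vText, "^.*?N.*")	; "" (empty)

MsgBox, % "RegExMatch: pos " vPos " text " vMatch "`nwith \K: [" vKept "]`nwithout \K: [" vAll "]"
```

So the "a" that RegExReplace() keeps in the "match FIRST N (a)" example is exactly the text consumed before \K.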

Hope this helps someone else!
Andy

Post by andymbody » 26 Dec 2023, 12:15

I've learned a lot about Regex since my last post here two years ago.

I came up with this and thought I would contribute/share. It's not perfect, but works well for most cases.

Flexible File Path pattern

As far as I can tell, it follows normal Windows path syntax. I went to great lengths to prevent things like spaces immediately before or after backslash, dots before backslash, trailing dots/spaces, etc. This is the reason the needle is longer than some you will see.

It will not find a simple "filename.txt" (intentionally), because that would produce too many false positives. You can add this if you like by tacking another pattern onto the end (another | alternative).

Hope others find it useful (please test it and reply if you find an issue with it).


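; [A-Za-z] rather than [A-z]: the range A-z also includes [ \ ] ^ and _
v1Needle := "(\b[A-Za-z]:)((?:\\(?:\.+ *)*(?:(([^\\\s./:*?""<>|\x00-\x1F]+)(?:[. ]+(?-1))*)+))+(?<![. ])\\?)|(?-2)?(?-3)|(?-4)\\?"
if RegExMatch(haystack, v1Needle, vMatch)	; haystack = the text to search
	MsgBox, % vMatch	; first path-like match found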
/*
It will obviously find patterns that are similar to file path patterns, so keep this in mind.
Examples of what it should support
	All of the following with/without dots and spaces (in the proper places)
	Supports patterns found anywhere on a line, not just at beginning of line
	d:\							(with or without trailing \)
	d:\dir\dir\filename.ext		(most needles only support this pattern, and allow illegal chars)
	\dir\dir\					(with or without prefix \, or trailing \, or filename.ext)
*/
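
A small AutoHotkey v1 harness for trying the needle on a few lines. The sample lines are made up, and the pattern is copied from above with one change: [A-z] narrowed to [A-Za-z], since the ASCII range A-z also covers a few punctuation characters between Z and a. AutoHotkey v1's RegEx engine is PCRE, which supports the relative subroutine calls (?-1) through (?-4) used here.

```autohotkey
; andymbody's needle, with [A-z] narrowed to [A-Za-z] for the drive letter
v1Needle := "(\b[A-Za-z]:)((?:\\(?:\.+ *)*(?:(([^\\\s./:*?""<>|\x00-\x1F]+)(?:[. ]+(?-1))*)+))+(?<![. ])\\?)|(?-2)?(?-3)|(?-4)\\?"

; Made-up sample lines; inside an AHK v1 string, "" is one literal quote
vSamples := []
vSamples.Push("Open ""C:\Program Files\AutoHotkey\AutoHotkey.exe"" now")
vSamples.Push("root is d:\ here")
vSamples.Push("see \dir\sub\file.txt.")
vSamples.Push("Saved as logs\2023\report.txt")

vFound := []
vOut := ""
for vIndex, vLine in vSamples
{
	; vMatch receives the first (leftmost) match, or "" if there is none
	RegExMatch(vLine, v1Needle, vMatch)
	vFound.Push(vMatch)
	vOut .= vLine "`n    -> " vMatch "`n"
}
MsgBox, % vOut
```

On these lines the matches should be C:\Program Files\AutoHotkey\AutoHotkey.exe, d:\, \dir\sub\file.txt (the trailing dot is dropped) and Saved as logs\2023\report.txt. The last one shows the caveat from the comment block: since names may contain spaces, a relative path also picks up the words in front of it.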
