Regex finding whitespaces Topic is solved

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
Bachelar
Posts: 28
Joined: 26 Aug 2020, 12:30

Regex finding whitespaces

05 Aug 2021, 02:34

Hi, y'all!

What I'm looking for is help with RegEx. I've got a string consisting of text, and I want to find all whitespace around 1-, 2- and 3-letter words. After a couple of days of trying, I know how to find all spaces, how to find the words I need, and how to find spaces together with said words. By now I just wonder whether what I want is even possible.

Any advice is most welcome.
AHKStudent
Posts: 1472
Joined: 05 May 2018, 12:23

Re: Regex finding whitespaces

05 Aug 2021, 02:47

Post a string example and your desired output. There are regex pros here who are creative and helpful.
Bachelar
Posts: 28
Joined: 26 Aug 2020, 12:30

Re: Regex finding whitespaces

05 Aug 2021, 04:29

Well, as for the example string it could be like this: cat dog mouse apple tree I off. It's not really specified. And what I would want to have as output are spaces around cat dog I off, i.e. spaces around 1-, 2- or 3-letter words.
boiler
Posts: 17051
Joined: 21 Dec 2014, 02:44

Re: Regex finding whitespaces  Topic is solved

05 Aug 2021, 05:22

A potential problem with keeping all the spaces found around those words in the output is that you would often have double spaces between those words and spaces at the beginning and/or end of the string. Do you really just want all those words in a list separated by one space? This script produces that output:

Code: Select all

Str := "cat dog mouse apple tree I off"
Output := Trim(RegExReplace(RegExReplace(Str, "\w{4,}"), "\s{2,}", " "))
MsgBox, % Output
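
To see what each part of that nested call contributes, here is a minimal sketch that splits it into separate steps. The variable names Step1 to Step3 are made up for illustration and are not part of the original script.

```autohotkey
Str := "cat dog mouse apple tree I off"
Step1 := RegExReplace(Str, "\w{4,}")          ; remove every run of 4 or more word characters
Step2 := RegExReplace(Step1, "\s{2,}", " ")   ; collapse runs of 2+ whitespace characters into one space
Step3 := Trim(Step2)                          ; strip spaces left at the start or end
; Brackets make leading, trailing and doubled spaces visible in the message box.
MsgBox, % "[" Step1 "]`n[" Step2 "]`n[" Step3 "]"
```

For this input, the first step leaves a run of spaces between dog and I, which the second step reduces to a single space.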
Bachelar
Posts: 28
Joined: 26 Aug 2020, 12:30

Re: Regex finding whitespaces

06 Aug 2021, 02:35

Yep, that's exactly what I want. It doesn't really matter if words in a list are separated by one, two or more spaces. I just needed a way to get that list of certain words and I think I can cope with next steps.
Thank you so much.
boiler
Posts: 17051
Joined: 21 Dec 2014, 02:44

Re: Regex finding whitespaces

06 Aug 2021, 06:24

I see. If the number of spaces around the words doesn’t matter, it can be reduced to this, which just does the main action of removing all words that are 4 characters or longer:

Code: Select all

Output := RegExReplace(Str, "\w{4,}")
Bachelar
Posts: 28
Joined: 26 Aug 2020, 12:30

Re: Regex finding whitespaces

06 Aug 2021, 06:45

Yeah, I thought about that and tried that. Now onto the next steps. What I needed this RegEx for is to replace the whitespace between 1-, 2- or 3-letter words with non-breaking spaces, so I could format very long texts (i.e. so that no 1-, 2- or 3-letter word would end up at the end of a line). I am aware of MS Word's find & replace tool and its syntax, and I was wondering whether it could be done directly with AHK RegEx (or a COM object?).

The Word syntax is as follows: (<[a-z]{1;3}>) in the "Find what" box (with a compulsory trailing space) and \1^s in the "Replace with" box.
boiler
Posts: 17051
Joined: 21 Dec 2014, 02:44

Re: Regex finding whitespaces

06 Aug 2021, 07:41

Yes, it can be done directly in the AHK statement. In the code below, the Trim first removes any spaces at the very beginning or end of the string, which would occur if the list started or ended with longer words, then it replaces all groups of spaces (one space or more) with a non-breaking space character.

Code: Select all

Str := "cat dog mouse apple tree I off"
Output := RegExReplace(Trim(RegExReplace(Str, "\w{4,}")), "\s+", Chr(160))
MsgBox, % Output
Of course, you can't tell from the MsgBox, but if you copy its contents and paste into an editor where you can study the characters, you'll see the words are all separated by a single non-breaking space.
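
If pasting into an editor is inconvenient, one possible way to check the result inside the script is to swap each non-breaking space for a visible marker. This is only a sketch for inspection: the "[NBSP]" marker and the Visible and Count variables are assumptions made for the example, and StrReplace requires AutoHotkey v1.1.21 or later.

```autohotkey
Str := "cat dog mouse apple tree I off"
Output := RegExReplace(Trim(RegExReplace(Str, "\w{4,}")), "\s+", Chr(160))
; Replace every non-breaking space with a visible marker and count the replacements.
Visible := StrReplace(Output, Chr(160), "[NBSP]", Count)
MsgBox, % Visible "`n" Count " non-breaking space(s) found"
```

The Output variable itself is left untouched, so it can still be used for the real text.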
Bachelar
Posts: 28
Joined: 26 Aug 2020, 12:30

Re: Regex finding whitespaces

06 Aug 2021, 09:54

Yeah, it's nice, but still not quite what I want. The output consists of words and spaces, and I want only the spaces around 1-, 2- or 3-letter words, so I can substitute them easily. If that's even possible. That, or I don't really grasp your tips.
boiler
Posts: 17051
Joined: 21 Dec 2014, 02:44

Re: Regex finding whitespaces

06 Aug 2021, 14:15

OK. I think I understand what you want. In your example of cat dog mouse apple tree I off, when you said you wanted an output of spaces around cat dog I off, you didn't mean that was the actual output. You meant an output of cat_dog_mouse apple tree_I_off, where the _ represents a non-breaking space. So this should do it:

Code: Select all

Str := "cat dog mouse apple tree I off"
Output := RegExReplace(RegExReplace(Str, " (?=\w{1,3}\b)", Chr(160)), "(?:^|[" Chr(160) " ])\w{1,3}\K ", Chr(160))
MsgBox, % Output

Here's a demonstration that uses underscores instead of non-breaking spaces so you can visually see the changes:

Code: Select all

Str := "cat dog mouse apple tree I off"
Output := RegExReplace(RegExReplace(Str, " (?=\w{1,3}\b)", "_"), "(?:^|[_ ])\w{1,3}\K ", "_")
MsgBox, % Output
Bachelar
Posts: 28
Joined: 26 Aug 2020, 12:30

Re: Regex finding whitespaces

08 Aug 2021, 07:40

OH MY GOD. It is perfect! I love it! Thank you so much. It's been driving me crazy lately. RegEx is absolutely amazing, but complicated af. I need to sit on it more.
Thank you, thank you.
teadrinker
Posts: 4347
Joined: 29 Mar 2015, 09:41

Re: Regex finding whitespaces

08 Aug 2021, 08:41

I'd simplify a little:

Code: Select all

Str := "cat dog mouse apple tree I off"
MsgBox, % RegExReplace(Str, "(\b\w{1,3}\b)\K\s+|\s+(?=(?1))", "_")
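
As a rough sketch of how that single pattern behaves on a few inputs, it can be wrapped in a small function. The function name BindShortWords and its Glue parameter are hypothetical and only serve the illustration. The first alternative matches whitespace that follows a whole 1- to 3-character word (\K drops the word from the match), and the second matches whitespace that precedes one, reusing group 1 through (?1).

```autohotkey
; Hypothetical helper, not from the thread: joins short words to their neighbours.
BindShortWords(Str, Glue := "_") {
    return RegExReplace(Str, "(\b\w{1,3}\b)\K\s+|\s+(?=(?1))", Glue)
}

MsgBox, % BindShortWords("cat dog mouse apple tree I off")  ; cat_dog_mouse apple tree_I_off
MsgBox, % BindShortWords("cat mouse")                       ; cat_mouse
MsgBox, % BindShortWords("mouse apple")                     ; mouse apple (unchanged)
```

Passing Chr(160) as the second argument would produce non-breaking spaces instead of underscores. Note that \s+ also matches tabs and line breaks, so in multi-line text a line break next to a short word would be replaced as well.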
