Hi, y'all!
What I'm searching for is help with RegEx. I've got a string consisting of text and I want to find all whitespace around 1-, 2- and 3-letter words. After a couple of days of trying I know how to find all spaces, how to find the words I need, and how to find the spaces together with said words. By now I just wonder if what I want is even possible.
Any advice is most welcome.
Regex finding whitespaces Topic is solved
Re: Regex finding whitespaces
Post an example string and your desired output. There are regex pros here who are creative and helpful.
Re: Regex finding whitespaces
Well, as for the example string it could be like this: cat dog mouse apple tree I off. It's not really specified. And what I would want to have as output are spaces around cat dog I off, i.e. spaces around 1-, 2- or 3-letter words.
Re: Regex finding whitespaces Topic is solved
A potential problem with keeping all the spaces found around those words in the output is that you would often have double spaces between those words and spaces at the beginning and/or end of the string. Do you really just want all those words in a list separated by one space? This script produces that output:
Code: Select all
Str := "cat dog mouse apple tree I off"
Output := Trim(RegExReplace(RegExReplace(Str, "\w{4,}"), "\s{2,}", " "))
MsgBox, % Output
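To see what each part of the one-liner above contributes, it can be split into three stages. This is only an illustrative sketch; the variable names Step1 to Step3 are made up for the example and are not part of the original script.

```autohotkey
Str := "cat dog mouse apple tree I off"

; Stage 1: delete every run of 4 or more word characters.
; Omitting the replacement parameter replaces each match with nothing.
Step1 := RegExReplace(Str, "\w{4,}")

; Stage 2: collapse each run of 2 or more whitespace characters into one space.
Step2 := RegExReplace(Step1, "\s{2,}", " ")

; Stage 3: strip leading and trailing spaces, which appear when the string
; starts or ends with a long word.
Step3 := Trim(Step2)

MsgBox, % "[" Step1 "]`n[" Step2 "]`n[" Step3 "]"
```

The brackets in the MsgBox are only there to make leading and trailing spaces visible.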
Re: Regex finding whitespaces
boiler wrote: ↑05 Aug 2021, 05:22
Do you really just want all those words in a list separated by one space?
Yep, that's exactly what I want. It doesn't really matter if the words in the list are separated by one, two or more spaces. I just needed a way to get that list of certain words and I think I can cope with the next steps.
Thank you so much.
Re: Regex finding whitespaces
I see. If the number of spaces around the words doesn’t matter, it can be reduced to this, which just does the main action of removing all words that are 4 characters or longer:
Code: Select all
Output := RegExReplace(Str, "\w{4,}")
Re: Regex finding whitespaces
boiler wrote: ↑06 Aug 2021, 06:24
Output := RegExReplace(Str, "\w{4,}")
Yeah, I thought about that and tried that. Now on to the next steps. What I needed this RegEx for is to replace the whitespace around 1-, 2- or 3-letter words with a non-breaking space, so I could format very long texts (i.e. so that no 1-, 2- or 3-letter word is left at the end of a line). I am aware of MS Word's find & replace tool and its syntax, and I was wondering whether it could be done directly with AHK RegEx (or a COM object?).
The Word syntax is as follows: (<[a-z]{1;3}>) in the Find box (with a compulsory trailing space after the closing parenthesis) and \1^s in the Replace with box.
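For readers who don't know Word's wildcard syntax, a rough AHK counterpart of that search might look like the sketch below. It assumes that Word's < and > (start and end of word) map to \b, {1;3} to {1,3}, \1 to $1 and ^s to Chr(160). Like the Word pattern, it only covers lowercase words and only the space after each short word, so it is an illustration of the syntax rather than a full solution.

```autohotkey
Str := "cat dog mouse apple tree I off"

; \b stands in for Word's < and >, {1,3} for {1;3}, $1 for \1, Chr(160) for ^s.
; AHK RegEx is case-sensitive by default, so [a-z] skips the uppercase "I".
Output := RegExReplace(Str, "(\b[a-z]{1,3}\b) ", "$1" Chr(160))

; Display the non-breaking spaces as underscores so they can be seen in the MsgBox.
MsgBox, % StrReplace(Output, Chr(160), "_")
```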
Re: Regex finding whitespaces
Yes, it can be done directly in the AHK statement. In the code below, the Trim first removes any spaces at the very beginning or end of the string, which would occur if the list started or ended with longer words, then it replaces all groups of spaces (one space or more) with a non-breaking space character.
Of course, you can't tell from the MsgBox, but if you copy its contents and paste into an editor where you can study the characters, you'll see the words are all separated by a single non-breaking space.
Code: Select all
Str := "cat dog mouse apple tree I off"
Output := RegExReplace(Trim(RegExReplace(Str, "\w{4,}")), "\s+", Chr(160))
MsgBox, % Output
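If pasting into another editor is inconvenient, one way to check the result from within AHK is to swap the non-breaking spaces for a visible character and count them. This is just a sketch; the names Visible and Count are made up for the example, and the underscore is only a display stand-in.

```autohotkey
Str := "cat dog mouse apple tree I off"
Output := RegExReplace(Trim(RegExReplace(Str, "\w{4,}")), "\s+", Chr(160))

; Replace each non-breaking space with an underscore for display only.
; StrReplace stores the number of replacements in its fourth parameter.
Visible := StrReplace(Output, Chr(160), "_", Count)

MsgBox, % Visible "`n" Count " non-breaking space(s) found"
```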
Re: Regex finding whitespaces
boiler wrote: ↑06 Aug 2021, 07:41
Output := RegExReplace(Trim(RegExReplace(Str, "\w{4,}")), "\s+", Chr(160))
Yeah, it's nice, but still not quite what I want. The output consists of words and spaces, and I want only the spaces around 1-, 2- or 3-letter words, so I could substitute them easily. If that's even possible. That, or I don't really grasp your tips.
Re: Regex finding whitespaces
OK. I think I understand what you want. In your example of cat dog mouse apple tree I off, when you said you wanted an output of spaces around cat dog I off, you didn't mean that was the actual output. You meant an output of cat_dog_mouse apple tree_I_off, where the _ represents a non-breaking space. So this should do it:
Code: Select all
Str := "cat dog mouse apple tree I off"
Output := RegExReplace(RegExReplace(Str, " (?=\w{1,3}\b)", Chr(160)), "[" Chr(160) " \b]\w{1,3}\K ", Chr(160))
MsgBox, % Output
Here's a demonstration that uses underscores instead of non-breaking spaces so you can visually see the changes:
Code: Select all
Str := "cat dog mouse apple tree I off"
Output := RegExReplace(RegExReplace(Str, " (?=\w{1,3}\b)", "_"), "[_ \b]\w{1,3}\K ", "_")
MsgBox, % Output
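For anyone following along, the two nested calls can be run one at a time to see what each pass does. The sketch below uses the underscore version; Pass1, Pass2 and Edge are names made up for the example. One detail worth knowing: inside a character class, \b means a backspace character in PCRE rather than a word boundary, so the second pattern depends on a space or underscore coming before the short word. A short word at the very start of the string that is followed by a long word is therefore left alone, as the last lines show.

```autohotkey
Str := "cat dog mouse apple tree I off"

; Pass 1: replace a space that is followed by a whole word of 1-3 characters.
Pass1 := RegExReplace(Str, " (?=\w{1,3}\b)", "_")

; Pass 2: replace a space that follows a 1-3 character word, where that word
; is preceded by an underscore or a space. \K drops the word from the match,
; so only the space itself is replaced.
Pass2 := RegExReplace(Pass1, "[_ \b]\w{1,3}\K ", "_")

; Edge case: nothing precedes "cat", so its trailing space is not touched.
Edge := RegExReplace(RegExReplace("cat mouse", " (?=\w{1,3}\b)", "_"), "[_ \b]\w{1,3}\K ", "_")

MsgBox, % Pass1 "`n" Pass2 "`n" Edge
```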
Re: Regex finding whitespaces
OH MY GOD. It is perfect! I love it! Thank you so much. It's been driving me crazy lately. RegEx is absolutely amazing, but complicated af. I need to sit on it more.
Thank you, thank you.
Re: Regex finding whitespaces
I'd simplify a little:
Code: Select all
Str := "cat dog mouse apple tree I off"
MsgBox, % RegExReplace(Str, "(\b\w{1,3}\b)\K\s+|\s+(?=(?1))", "_")
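A short breakdown of that pattern, as I read it: the first alternative matches a whole word of 1-3 characters, uses \K to drop it from the match, and then matches the whitespace after it; the second alternative matches whitespace that is followed by such a word, with (?1) reusing the first group's pattern inside the lookahead. The sketch below swaps in Chr(160) for the real replacement and converts it to underscores only for display; the names Nbsp, Output and Edge are made up for the example.

```autohotkey
Str := "cat dog mouse apple tree I off"
Nbsp := Chr(160)

; Alternative 1: (\b\w{1,3}\b)\K\s+  -> whitespace after a short word
; Alternative 2: \s+(?=(?1))         -> whitespace before a short word
Output := RegExReplace(Str, "(\b\w{1,3}\b)\K\s+|\s+(?=(?1))", Nbsp)

; A short word at the very start followed by a long word is handled too.
Edge := RegExReplace("cat mouse", "(\b\w{1,3}\b)\K\s+|\s+(?=(?1))", Nbsp)

; Show non-breaking spaces as underscores so they are visible.
MsgBox, % StrReplace(Output, Nbsp, "_") "`n" StrReplace(Edge, Nbsp, "_")
```

Because the pattern uses \s+, a run of several spaces next to a short word ends up as a single non-breaking space.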