RegExMatch to get the position of the last occurrence of the "needle"

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
alancantor
Posts: 76
Joined: 11 Jun 2019, 11:28

RegExMatch to get the position of the last occurrence of the "needle"

23 Sep 2021, 20:24

Does anyone know of a way to find the LAST occurrence of the needle in the haystack? I thought RegExMatch would be the ticket, as I'm hunting for a pattern: period+whitespace, question mark+whitespace, or exclamation mark+whitespace.

Finding the first match was relatively straightforward, although I confess that I pored over the RegExMatch documentation for longer than I care to admit!

Code: Select all

Top := "Hi? Yes! No? "
Msgbox % RegExMatch(Top, "[\.!?]\s")		; Returns 3
But identifying the last match is stumping me. The result should be 11.

In the absence of a clever solution, my thought on cracking the puzzle is to reverse the string, and then use RegExMatch to find "\s[\.!?]" instead.
boiler
Posts: 16913
Joined: 21 Dec 2014, 02:44

Re: RegExMatch to get the position of the last occurrence of the "needle"

23 Sep 2021, 20:42

The negative look-ahead in the code below is what makes it find the last occurrence. I added text on the end of the example haystack to prove it will find it even if there is stuff after the last occurrence. Otherwise a simple $ anchor would have done the job.

Code: Select all

Top := "Hi? Yes! No? and more text"
MsgBox % RegExMatch(Top, "[\.!?]\s(?!.*[\.!?]\s)")

By the way, the position of the last occurrence in your example is actually 12, not 11.
alancantor
Posts: 76
Joined: 11 Jun 2019, 11:28

Re: RegExMatch to get the position of the last occurrence of the "needle"

24 Sep 2021, 08:10

Thank you, boiler.

I will need to study the look-ahead concept, both positive and negative, to understand how your code works! But I'm glad to have it.
boiler
Posts: 16913
Joined: 21 Dec 2014, 02:44

Re: RegExMatch to get the position of the last occurrence of the "needle"

24 Sep 2021, 10:30

You’re welcome. They are well worth studying for a myriad of uses. In this particular case, the logic is that by following the search pattern with a negative look-ahead of the exact same search pattern (along with .* to allow for other characters in between), it is finding the desired pattern that cannot be followed by the desired pattern, making it find the last instance.
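A minimal sketch of that logic, using the example haystack from earlier in the thread. The variable names First and Last are only illustrative.

```autohotkey
Top := "Hi? Yes! No? and more text"

; Plain needle: the search stops at the first punctuation+whitespace pair.
First := RegExMatch(Top, "[\.!?]\s")

; Same needle followed by a negative look-ahead of itself: a candidate is
; rejected whenever another punctuation+whitespace pair appears later on.
Last := RegExMatch(Top, "[\.!?]\s(?!.*[\.!?]\s)")

MsgBox % "First: " First "`nLast: " Last   ; First: 3, Last: 12
```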
alancantor
Posts: 76
Joined: 11 Jun 2019, 11:28

Re: RegExMatch to get the position of the last occurrence of the "needle"

24 Sep 2021, 10:34

That's a very helpful explanation! Thank you again.
braunbaer
Posts: 478
Joined: 22 Feb 2016, 10:49

Re: RegExMatch to get the position of the last occurrence of the "needle"

24 Sep 2021, 12:58

Easier than with a look ahead:

Code: Select all

regexmatch(Top, ".*\K[\.!?]\s")
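
A rough sketch of why this works, assuming the same example haystack as above: the greedy .* consumes as much as it can and backtracks only until the rest of the needle fits, so the match lands on the last occurrence, and \K drops everything consumed before it from the reported match. The output variable Match is added here only for illustration.

```autohotkey
Top := "Hi? Yes! No? and more text"

; Greedy .* runs to the end of the line, then backtracks to the last
; place where punctuation + whitespace can still match.
Pos := RegExMatch(Top, ".*\K[\.!?]\s", Match)

; Pos is where the text after \K begins; Match holds only that text.
MsgBox % "Position: " Pos "`nMatched text: [" Match "]"
```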
alancantor
Posts: 76
Joined: 11 Jun 2019, 11:28

Re: RegExMatch to get the position of the last occurrence of the "needle"

30 Sep 2021, 10:21

That's also a good way, braunbaer! The two methods seem to work equally well. Thank you!

As I continue to refine and experiment, I'm (again) reaching the limits of my ability to sort things out with RegExMatch.

My immediate goal is to move the cursor left to the start of the current sentence. I assume all sentences end in a period, exclamation mark, or a question mark. The sentence might appear at the start of a paragraph, in the middle of a paragraph, or at the end of a paragraph. For example, here are two paragraphs I'm using for testing.
Paragraph one. Hello! Are you there? I am wherever.

Paragraph two. Goodbye! Were you there? I am whatever.
If the cursor is within the sentence "Are you there?" the cursor should end up to the left of the "A" in "Are". If the cursor is within the sentence "Goodbye!" the cursor should end up to the left of the "G" in "Goodbye".

My approach is to hunt for text that precedes the start of the sentence, which is a period, question mark, or exclamation mark, followed by a whitespace character.

Here's the logic I'm trying to encode.

0. Manually place the cursor in the middle of a sentence.

1. Select left to the start of the document or field: Shift + Ctrl + Home. (I would prefer to select to the start of the paragraph with Shift + Ctrl + Up, but this hotkey is not supported in Notepad or in some text entry fields.)

2. Copy to clipboard, and place in a variable.

3. Deselect by pressing Right arrow. The cursor is in the same location as in Step 0.

4. Get the length of the variable.

5. Get the position of the end of the previous sentence.

6. Subtract one from the other.

7. Move the cursor left that amount.

The problem: The script works with sentences in the first paragraph, but not for sentences in the second paragraph (or in subsequent paragraphs, if any). Instead, the cursor ends up somewhere near the end of the first paragraph.

My guess is that carriage returns are messing with RegExMatch's ability to identify the correct match, although I don't understand why this would happen.

Is there a way to adjust the statement (or my script!) so that the end-of-sentence match is found regardless of which paragraph the cursor is in?

Code: Select all

!6::

Clipboard := ""
SendInput ^+{Home}^c
ClipWait 1
SendInput {Right}
Top := Clipboard
Sleep, 100

;	TopPos := RegExMatch(Top, "[\.!?]\s(?!.*[\.!?]\s)")   ; Both versions of this statement work equally well
 	TopPos := RegExMatch(Top, ".*\K[\.!?]\s")

	TopLength := StrLen(Top)
	Offset := TopLength - TopPos

SendInput {Left %Offset%}
Return

braunbaer
Posts: 478
Joined: 22 Feb 2016, 10:49

Re: RegExMatch to get the position of the last occurrence of the "needle"

05 Oct 2021, 17:05

7. Move the cursor left that amount.
I assume that is the problem. The "left" key moves the cursor from the beginning of a line to the end of the previous line, but in most cases, line separators consist of the two characters "CR" and "LF". So you should subtract 1 from the character count for every newline between the cursor position and the start of the paragraph.
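
A minimal sketch of that adjustment, assuming the copied text uses CR+LF line endings and that each line break costs a single Left keystroke. The sample text and the names Tail and CrLfCount are hypothetical, and this sketch measures from the end of the matched whitespace rather than from the punctuation mark.

```autohotkey
; Hypothetical sample standing in for the copied selection; the current
; sentence wraps across one line break.
Top := "First sentence. Second sentence`r`nwraps onto a new line"

Pos := RegExMatch(Top, ".*\K[\.!?]\s+", Match)

; Everything between the end of the match and the cursor position.
Tail := SubStr(Top, Pos + StrLen(Match))

; Count the CR+LF pairs in that stretch; the replaced string is discarded.
StrReplace(Tail, "`r`n", "", CrLfCount)

; Each pair is two characters but only one cursor step.
Offset := StrLen(Tail) - CrLfCount

MsgBox % "Characters: " StrLen(Tail) "`nLine breaks: " CrLfCount "`nLeft presses: " Offset
```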
flyingDman
Posts: 2817
Joined: 29 Sep 2013, 19:01

Re: RegExMatch to get the position of the last occurrence of the "needle"

05 Oct 2021, 21:50

My immediate goal is to move the cursor left to the start of the current sentence
What is your end goal? Just asking because pushing the cursor until it hits a ?, . or ! does not seem like your final destination.
14.3 & 1.3.7
alancantor
Posts: 76
Joined: 11 Jun 2019, 11:28

Re: RegExMatch to get the position of the last occurrence of the "needle"

08 Oct 2021, 23:03

My final goal is to select the entire sentence that contains the cursor. My immediate goal, however, is more modest: move the cursor to the start of the current sentence.

My strategy is to find the position of the last occurrence of [period OR question mark OR exclamation mark] followed by one (or more) whitespace characters.

The following code works when the cursor is within the first paragraph of a document, but fails when the cursor is within the second or subsequent paragraph.

Code: Select all

!0::

Clipboard := ""
SendInput +^{Home}
SendInput ^c
ClipWait 1.0
SendInput {Right}
Top := Clipboard

Pos := RegExMatch(Top, ".*\K[\.!?]\s+")

TopLen := StrLen(Top)
Offset := TopLen - Pos

SendInput {Left %Offset%}

return

I don't understand why \s doesn't match the carriage returns or new lines between the paragraphs. Yet AHK's documentation says it does:
Matches any single whitespace character, mainly space, tab, and newline (`r and `n).
just me
Posts: 9450
Joined: 02 Oct 2013, 08:51
Location: Germany

Re: RegExMatch to get the position of the last occurrence of the "needle"

09 Oct 2021, 03:22

Your problem is the dot (.):
By default, a dot matches any single character except `r in a newline (`r`n) sequence, but this can be changed by using the DotAll (s), linefeed (`n), carriage return (`r), `a or (*ANYCRLF) options. For example, ab. matches abc and abz and ab_.

Code: Select all

Pos := RegExMatch(Top, "s).*\K[\.!?]\s+")
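
A small sketch to compare the two needles side by side, assuming a hypothetical two-line sample with a CR+LF line break. Without the s) option the .* cannot run past the line break, so the match is expected to stay on the first line. With it, the .* can reach the end of the text and backtrack to the last occurrence.

```autohotkey
; Hypothetical sample text with one CR+LF line break.
Top := "One. Two?`r`nThree! Four"

; Default: the dot stops at the line break.
PosDefault := RegExMatch(Top, ".*\K[\.!?]\s+")

; DotAll: the dot also matches newline characters.
PosDotAll := RegExMatch(Top, "s).*\K[\.!?]\s+")

MsgBox % "Without s): " PosDefault "`nWith s): " PosDotAll
```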
flyingDman
Posts: 2817
Joined: 29 Sep 2013, 19:01

Re: RegExMatch to get the position of the last occurrence of the "needle"

09 Oct 2021, 09:40

My final goal is to select the entire sentence that contains the cursor.
and then what?
14.3 & 1.3.7
alancantor
Posts: 76
Joined: 11 Jun 2019, 11:28

Re: RegExMatch to get the position of the last occurrence of the "needle"

09 Oct 2021, 11:04

That's the goal: a script that selects the current sentence, in any application or context, without the need to manually find the start and end of the sentence.

I'm aiming for a script that does something similar to Select Mode in Microsoft Word: pressing F8 three times selects the current sentence.

As a writer, selecting sentences is something I do dozens of times a day. I plan to bind the macro to a single hotkey.

I've already built a prototype that works, but it's clunky. I'm hoping for a cleverer, more elegant RegExMatch solution.
alancantor
Posts: 76
Joined: 11 Jun 2019, 11:28

Re: RegExMatch to get the position of the last occurrence of the "needle"

09 Oct 2021, 16:38

I think I'm starting to zero in on a RegExMatch solution. But to make it work, I may need the length of the match.
If a capital P is present in the RegEx's options -- such as P)abc.*123 -- the length of the entire-pattern match is stored in OutputVar (or 0 if no match). If any capturing subpatterns are present, their positions and lengths are stored in two pseudo-arrays: OutputVarPos and OutputVarLen. For example, if the variable's base name is Match, the one-based position of the first subpattern's match would be stored in MatchPos1, and its length in MatchLen1


So if that's the case, why don't I obtain the length of the match when I do this?

Code: Select all

Pos := RegExMatch(Top, "sP)..*\K[\.!?].*\w", Match)
Msgbox, Length = %MatchLen1%
boiler
Posts: 16913
Joined: 21 Dec 2014, 02:44

Re: RegExMatch to get the position of the last occurrence of the "needle"

09 Oct 2021, 17:47

alancantor wrote: So if that's the case, why don't I obtain the length of the match when I do this?

Code: Select all

Pos := RegExMatch(Top, "sP)..*\K[\.!?].*\w", Match)
Msgbox, Length = %MatchLen1%
It’s because you don’t have any capturing subpatterns defined in your needle (i.e., you don’t have a Match1, therefore there is no MatchLen1). You just have an overall match, and you can get its length without using the P option the usual way:

Code: Select all

Length := StrLen(Match)
Or you can use the P option, in which case the output variable (Match in your case) contains the length of the overall match, as the documentation excerpt you quoted mentions.
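
A short sketch contrasting the three cases, using a hypothetical sample string. The variable names MatchP and Sub are only illustrative. Note that without the P option the output variable holds the matched text itself.

```autohotkey
Top := "Hi? Yes! No? and more"

; Default mode: Match receives the matched text, so StrLen gives its length.
Pos := RegExMatch(Top, "s).*\K[\.!?]\s+", Match)
Length := StrLen(Match)

; P mode, no capturing subpattern: MatchP receives the overall match length.
PosP := RegExMatch(Top, "sP).*\K[\.!?]\s+", MatchP)

; P mode with a capturing subpattern: SubPos1 and SubLen1 are populated.
PosC := RegExMatch(Top, "sP).*\K([\.!?])\s+", Sub)

MsgBox % "Text: [" Match "]`nStrLen: " Length "`nP length: " MatchP "`nSubPos1: " SubPos1 "`nSubLen1: " SubLen1
```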
alancantor
Posts: 76
Joined: 11 Jun 2019, 11:28

Re: RegExMatch to get the position of the last occurrence of the "needle"

09 Oct 2021, 18:47

Thank you, Boiler. Once again, you're stretching my knowledge of AHK!

Is there a way to capture the match in a variable? For example, if the matching text is "! Hello", can I place that text in a variable?
flyingDman
Posts: 2817
Joined: 29 Sep 2013, 19:01

Re: RegExMatch to get the position of the last occurrence of the "needle"

09 Oct 2021, 19:07

Try this. The script first determines the location of the cursor (pos) in your editor. Then, it creates a list (lst) of the location of the punctuation [.!?]. The start and end positions of the sentence to be selected are the values in the list that are just above and just below the hypothetical position of the cursor in that list. The cursor is then moved to the start {right %vmin%} and to the end {shift down}{right %dif%} of the sentence. A lot of simulated keystrokes ...

Code: Select all

#\::
clipboard := ""
send ^+{home}^c{right}
clipwait, 1
pos := strlen(regexreplace(clipboard,"\n"))
clipboard := "", lst := ""
send ^a^c
clipwait, 1
var := regexreplace(clipboard,"\R"," ")
Count := 0, Strt := 1, lst := 0 "`n"
while loc := RegExMatch(Var, "[\.!?]+", Mtch, Strt) {
	lst .= (Strt := loc + StrLen(Mtch)) - 1 "`n", Count++
}
vmin := 0, vmax := 999999999
for x,y in strsplit(trim(lst,"`n"),"`n")
	vmin:=y>pos?vmin:max(vmin,y), vmax:=y<=pos?vmax:min(vmax,y)
dif := vmax - vmin 
send ^{home}{right %vmin%}{shift down}{right %dif%}{shift up}
return

It works in SciTE and in Notepad and should work in other editors as well.
14.3 & 1.3.7
alancantor
Posts: 76
Joined: 11 Jun 2019, 11:28

Re: RegExMatch to get the position of the last occurrence of the "needle"

10 Oct 2021, 12:25

Once I have a reliable way to move the cursor to the start of the current sentence, I anticipate it will be easier to move the cursor to the end of the sentence. And at that point, I planned to start a new thread on scripts for selecting sentences.

It's a topic that's been discussed earlier on this forum. I'm hoping a time is coming when the topic will be revisited.
flyingDman
Posts: 2817
Joined: 29 Sep 2013, 19:01

Re: RegExMatch to get the position of the last occurrence of the "needle"

10 Oct 2021, 13:59

Here is a faster version:

Code: Select all

#\::
clipboard := ""
send ^+{home}^c{right}
clipwait, 1
arr := strsplit(strreplace(clipboard,"`n"), [".","!","?"])
len := strlen(arr[arr.count()])
send, {left %len%}

clipboard := ""
send ^+{end}^c{left}
clipwait, 1
arr := strsplit(strreplace(clipboard,"`n"), [".","!","?"])
len := strlen(arr[1]) + 1
send, {shift down}{right %len%}{shift up}
return
14.3 & 1.3.7
