RegExReplace More than One Needle in a Script?

Doc_B
Posts: 11
Joined: 19 May 2017, 16:00

RegExReplace More than One Needle in a Script?

30 Jul 2017, 12:40

Greetings,

I've written a script that identifies the following three-digit range of numbers with a "1" in the hundreds place:

142-167

and drops the hundreds digit after the hyphen or en dash:

142-67

The script reads as follows:

F1 & n::
Clipboard =
SendInput, ^c
ClipWait
haystack := Clipboard
needle := "1([0-9][0-9])[-–]1([0-9][0-9])"
replacement = 1$1–$2
result := RegExReplace(haystack, needle, replacement)
Clipboard =
Clipboard := result
ClipWait
SendInput, %Clipboard%
return

It works. How do I write the script so that it checks for all three-digit numbers with any number in the hundreds place? For example,

needle := "2([0-9][0-9])[-–]2([0-9][0-9])"
replacement = 2$1–$2

and so on, without writing a separate script for each check?

Best,
Doc B
AlphaBravo
Posts: 469
Joined: 29 Sep 2013, 22:59

Re: RegExReplace More than One Needle in a Script?

30 Jul 2017, 13:04

RegExReplace(haystack, "\b(\d)(\d\d)-(?1)(\d\d)\b", "$1$2-$3")
Doc_B
Posts: 11
Joined: 19 May 2017, 16:00

Re: RegExReplace More than One Needle in a Script?

30 Jul 2017, 13:23

AlphaBravo wrote: RegExReplace(haystack, "\b(\d)(\d\d)-(?1)(\d\d)\b", "$1$2-$3")

That's excellent. I tested it, and it works. I tried something similar, but there's an exception to the rule. Before I say what it is, you must know that I wasn't laying a trap! I was trying to keep my first post succinct.

The exception is that ranges whose numbers have different numerals in the hundreds place must remain unchanged. So,

487-583

must remain so and not change to

487-83
jeeswg
Posts: 6904
Joined: 19 Dec 2016, 01:58
Location: UK

Re: RegExReplace More than One Needle in a Script?

30 Jul 2017, 13:34

@AlphaBravo: what is intended by '(?1)', did you mean for it to equal '(\d)'?

Code: Select all

q::
vText := "123-123 123-234 234-234"
MsgBox, % RegExReplace(vText, "\b(\d)(\d\d)-(?1)(\d\d)\b", "$1$2-$3")
return
;gives: 123-23 123-34 234-34
;not: 123-23 123-234 234-34
==================================================

From what you have described, I believe that you want to do this:
1xx-1xx -> 1xx-xx
2xx-2xx -> 2xx-xx
3xx-3xx -> 3xx-xx
Which is interesting. Not knowing the reason, page numbers is one possibility.

This may do what you want, although of course, you should be aware that it might also replace other text that you didn't intend to replace:

Code: Select all

q::
vText := "142-167"
Loop, 10
{
	vIndex := A_Index-1
	vText := RegExReplace(vText, "\b(" vIndex "\d\d-)" vIndex "(\d\d)\b", "$1$2")
}
MsgBox, % vText
return
homepage | tutorials | wish list | fun threads | donate
WARNING: copy your posts/messages before hitting Submit as you may lose them due to CAPTCHA
Doc_B
Posts: 11
Joined: 19 May 2017, 16:00

Re: RegExReplace More than One Needle in a Script?

30 Jul 2017, 14:00

jeeswg wrote: From what you have described, I believe that you want to do this:
1xx-1xx -> 1xx-xx
2xx-2xx -> 2xx-xx
3xx-3xx -> 3xx-xx
Which is interesting. Not knowing the reason, page numbers is one possibility.

Yes, those numbers indicate page numbers or "locators." The rules appertain to The Chicago Manual of Style. I'm a professional editor. I've learned to use wildcards to make many corrections using Microsoft Word or various software programs like Perfect3 or the Editorium. However, each program has a limitation: none can help me to make category-specific corrections while editing. Each program requires me to start it, run through the scripts, and eyeball the proposed corrections, classifying kinds of sources, and then accepting or rejecting various corrections. Sometimes, even within a given documentation style, the rules contradict one another from source type to source type.

If I were able to write a script for one kind of citation in a given documentation style (a journal article according to Chicago 16), then I could set up hotkeys for citations of various kinds. Thus, while editing, I could eyeball a book, a journal article, a multivolume work, a website, etc., highlight the entry, and run a script that conforms it to the rules of the documentation style.

I've had some success with string replaces. Yesterday, I researched RegExReplace for six hours while editing for a client. I'm unfamiliar with how to make the "vText" script work, but I'll research the terms this afternoon and try to reply.

Thanks!
AlphaBravo
Posts: 469
Joined: 29 Sep 2013, 22:59

Re: RegExReplace More than One Needle in a Script?

30 Jul 2017, 14:52

jeeswg wrote: @AlphaBravo: what is intended by '(?1)', did you mean for it to equal '(\d)'?

It calls a numbered group: the subpattern of group number 1, "(\d)" in this case, is applied again at the position of "(?1)". Change it to "(?2)" to call capturing group #2, and so on.

Code: Select all

\b(\d)(\d\d)-(?1)(\d\d)\b
    ^          ^
Doc_B
Posts: 11
Joined: 19 May 2017, 16:00

Re: RegExReplace More than One Needle in a Script?

30 Jul 2017, 15:36

AlphaBravo wrote: RegExReplace(haystack, "\b(\d)(\d\d)-(?1)(\d\d)\b", "$1$2-$3")

By the way, thanks!
Guest

Re: RegExReplace More than One Needle in a Script?

30 Jul 2017, 16:15

I think this says:
if the digit after the hyphen is not equal to the first capture group (\1), match nothing further; otherwise match that digit, so the replacement deletes it.

Code: Select all

h := "242-367 242-267"
h:=RegExReplace(h, "(\d)(\d\d)-(?(?!(\1)\d\d)|\d)", "$1$2-")
MsgBox,% h
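
The conditional in that needle can be read branch by branch. The sketch below is the same idea with the capture parentheses around \1 removed, on the assumption that they are not needed because the replacement never uses group 3. The variable names vText and vNeedle are illustrative only.

```autohotkey
; Sketch only: vNeedle is the needle above minus the extra capture group around \1.
vText := "242-367 242-267"
; (\d)(\d\d)-        first page number and the hyphen
; (?(?!\1\d\d)|\d)   conditional whose condition is a negative lookahead:
;    next three characters are NOT "same hundreds digit + two digits"
;        -> "yes" branch, which is empty, so nothing after the hyphen is consumed
;    otherwise
;        -> "no" branch \d, which consumes the repeated hundreds digit
vNeedle := "(\d)(\d\d)-(?(?!\1\d\d)|\d)"
; The replacement rebuilds "242-" and leaves out whatever the conditional consumed.
MsgBox, % RegExReplace(vText, vNeedle, "$1$2-")   ; expected: 242-367 242-67
```

Note that this needle has no \b word boundaries, so it may also act inside longer runs of digits.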
Doc_B
Posts: 11
Joined: 19 May 2017, 16:00

Re: RegExReplace More than One Needle in a Script?

30 Jul 2017, 19:37

I spent the rest of the day tidying up and commenting on some short scripts I had written to perform various editing tasks. The answers on this thread require me both to think through the operations and to research further. I favorited this thread and subscribed to it, with the hope of replying to it after I get up to speed. A few months ago, RegExReplace was unknown to me, but now I've written some scripts using it. Perhaps my knowledge of the above permutations will improve, too.
jeeswg
Posts: 6904
Joined: 19 Dec 2016, 01:58
Location: UK

Re: RegExReplace More than One Needle in a Script?

30 Jul 2017, 22:14

This has been quite interesting, I've seen here 2 backreferencing techniques I didn't know about: \1 and ?1, both similar but subtly different.

I believe that:
\1 matches the earlier literal string, and is mentioned here:
RegExMatch
https://autohotkey.com/docs/commands/RegExMatch.htm
?1 matches the earlier *RegEx needle*, so would be undesirable for this particular problem, and it is not mentioned in the AHK documentation AFAIK. (RegEx is so vast, that the AHK documentation doesn't document everything that you can do with it.)

If the first item contained in parentheses is \d:
e.g. if (\d) matches 3, then \1 must be 3
e.g. if (\d) matches 3, then ?1 must be \d (any digit)

This suggests that ?1's main use might be for repeating a long part of a RegEx needle.
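
To illustrate that last point, here is a small sketch using a hypothetical date-range string (not taken from this thread). (?1) reuses the pattern of group 1, so the two dates may differ, whereas \1 would require the second date to be identical to the first.

```autohotkey
; Hypothetical example: the date string and both needles are made up for illustration.
vText := "2017-07-30 to 2017-07-31"
; (?1) runs the subpattern \d{4}-\d\d-\d\d a second time, so any date matches.
MsgBox, % RegExMatch(vText, "^(\d{4}-\d\d-\d\d) to (?1)$")   ; 1 (match found at position 1)
; \1 demands the exact text captured by group 1, so a different date fails.
MsgBox, % RegExMatch(vText, "^(\d{4}-\d\d-\d\d) to \1$")     ; 0 (no match)
```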

Code: Select all

q::
vList := "342-367,342-967"

;pcresyntax specification
;http://www.pcre.org/original/doc/html/pcresyntax.html#SEC20
;  \n              reference by number (can be ambiguous)
;  (?n)            call subpattern by absolute number

;simple backreference examples (literal/hardcoded backreference):
vText := "abcde aabbccddee aaabbbcccdddeee"
MsgBox, % RegExReplace(vText, "(.)(\1)")
vText := "abcde aabbccddee aaabbbcccdddeee"
MsgBox, % RegExReplace(vText, "(a)(\1)")
vText := "abcde aabbccddee aaabbbcccdddeee"
MsgBox, % RegExReplace(vText, "([a-b])(\1)")
vText := "abcde aabbccddee aaabbbcccdddeee"
MsgBox, % RegExReplace(vText, "([a-c])(\1)")

;simple backreference examples (dynamic/RegEx backreference):
vText := "abcde aabbccddee aaabbbcccdddeee"
MsgBox, % RegExReplace(vText, "(.)(?1)")
vText := "abcde aabbccddee aaabbbcccdddeee"
MsgBox, % RegExReplace(vText, "(a)(?1)")
vText := "abcde aabbccddee aaabbbcccdddeee"
MsgBox, % RegExReplace(vText, "([a-b])(?1)")
vText := "abcde aabbccddee aaabbbcccdddeee"
MsgBox, % RegExReplace(vText, "([a-c])(?1)")

Loop, Parse, vList, % "," ;doesn't work because it applies the needle '\d' again rather than the literal text found by the earlier '\d'
{
	vText := A_LoopField
	MsgBox, % vText "`r`n" RegExReplace(vText, "\b(\d)(\d\d)-(?1)(\d\d)\b", "$1$2-$3")
}
Loop, Parse, vList, % "," ;works
{
	vText := A_LoopField
	MsgBox, % vText "`r`n" RegExReplace(vText, "\b(\d)(\d\d)-(\1)(\d\d)\b", "$1$2-$4")
}
Loop, Parse, vList, % "," ;works
{
	vText := A_LoopField
	MsgBox, % vText "`r`n" RegExReplace(vText, "\b(\d)(\d\d)-(?:\1)(\d\d)\b", "$1$2-$3")
}
return
Last edited by jeeswg on 31 Jul 2017, 01:01, edited 1 time in total.
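
For reference, a minimal sketch of how the \1 needle might be folded back into Doc_B's original hotkey. It assumes a Unicode build of AutoHotkey v1.1, since Chr(0x2013) is used to produce the en dash. The one-second ClipWait timeout and the third capture group (which keeps whichever dash was in the text, rather than always converting to an en dash as the original script did) are choices made for this sketch, not something taken from the thread.

```autohotkey
; Sketch only: Doc_B's hotkey combined with the \1 backreference needle.
F1 & n::
    Clipboard =                 ; empty the clipboard so ClipWait can detect the copy
    SendInput, ^c
    ClipWait, 1                 ; assumption: wait at most 1 second for text to arrive
    if ErrorLevel               ; nothing was copied, so leave the selection alone
        return
    ; (\d)(\d\d)  first page number, hundreds digit captured separately
    ; ([-en dash]) hyphen or en dash, built with Chr(0x2013) to avoid file-encoding issues
    ; \1          the same hundreds digit again (not just any digit)
    ; (\d\d)      the two digits to keep
    needle := "\b(\d)(\d\d)([-" Chr(0x2013) "])\1(\d\d)\b"
    result := RegExReplace(Clipboard, needle, "$1$2$3$4")
    SendInput, {Raw}%result%    ; {Raw} types characters such as ! or + literally
return
```

With this needle, 142-167 should become 142-67, while 487-583 is left alone because the hundreds digits differ.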
AlphaBravo
Posts: 469
Joined: 29 Sep 2013, 22:59

Re: RegExReplace More than One Needle in a Script?

31 Jul 2017, 00:34

jeeswg wrote: I believe that:
\1 matches the earlier literal string, and is mentioned here:
RegExMatch
https://autohotkey.com/docs/commands/RegExMatch.htm
?1 matches the earlier *RegEx needle*, so would be undesirable for this particular problem, and it is not mentioned in the AHK documentation AFAIK.

You're absolutely right. I obviously goofed up and used "(?1)" instead of the correct backreference "\1".
jeeswg
Posts: 6904
Joined: 19 Dec 2016, 01:58
Location: UK

Re: RegExReplace More than One Needle in a Script?

31 Jul 2017, 00:58

@AlphaBravo: Haha well you're clearly a bit of a RegEx master, if there's anything good that's missing from my tutorial or the AHK documentation you'd be most welcome to post some info/links here:
jeeswg's RegEx tutorial (RegExMatch, RegExReplace) - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?f=7&t=28031

I'll add in some comments about backreferencing within needles, at some point.

@Doc_B: I'm currently looking into proofreading/editing books(/websites) for general English e.g. essays/journalism, i.e. grammar etc, in case you have (or anybody else has) any recommendations. Just thought I'd mention. Cheers.