Creating a RegEx-script
Re: Creating a RegEx-script
That probably means the line breaks are made with CR+LF (`r`n), and you only replaced the LF characters.
Re: Creating a RegEx-script
Hole in one! Thanks! Where do I learn about the differences between RegEx in Sublime Text, where the 'needle' is '\n', and AutoHotkey, where I need '\r\n'?
Re: Creating a RegEx-script
It has more to do with Windows text files usually delimiting lines with CR+LF than either RegEx (AHK’s implementation or otherwise) or an editor (Sublime Text or other). It’s about the contents of the files that you are trying to match.
Having said that, read AHK’s RegEx documentation and other RegEx documentation in general about various options for matching those characters.
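To make the CR+LF point concrete, here is a minimal sketch. The sample text is made up for illustration; it only assumes Windows-style line endings in the data.

```autohotkey
#Requires AutoHotkey v2.0

; Sample text with Windows-style line endings (CR+LF); the content is made up.
text := "line one`r`nline two`r`nline three"

; Replacing only the LF leaves every CR behind in the result.
lfOnly := StrReplace(text, "`n", " | ")

; \R matches a CR+LF pair as one unit (and also a lone CR or a lone LF).
anyBreak := RegExReplace(text, "\R", " | ")

crLeft := InStr(lfOnly, "`r") ? "yes" : "no"
MsgBox "LF only, CR still present: " crLeft "`nWith \R: " anyBreak
```

The same script behaves the same way in any editor, because the difference lies in the text being searched, not in the tool.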
Re: Creating a RegEx-script
found this page (and bookmarked it)
https://www.autohotkey.com/docs/v2/misc/RegEx-QuickRef.htm
and didn't understand (at least) these lines:
`n Causes a solitary linefeed (`n) to be the only recognized newline marker (see above).
`r Causes a solitary carriage return (`r) to be the only recognized newline marker (see above).
I don't understand why they seem to be 'escaped' with the "backtick" ("`")?
Can I put them in a variable like so:
Code: Select all
myNewLine := '\r\n'
so I don't forget?
Re: Creating a RegEx-script
Reading educates
No variable needed after all...
Code: Select all
\R matches `r`n
So "`r" and "`n" are options which precede the RegEx and are separated from it by a closing parenthesis ")".
Looking up is not the same as reading...
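A small sketch of both points, as far as I understand the v2 documentation: the backtick is AutoHotkey's own escape character, while the backslash forms are read by the RegEx engine, and newline options sit in front of the needle. The sample text and variable names are made up.

```autohotkey
#Requires AutoHotkey v2.0

; "`n" is an AutoHotkey escape: the string holds one linefeed character.
; "\n" is two plain characters that the RegEx engine itself reads as linefeed.
ahkLf := "`n"
regexLf := "\n"
lengths := StrLen(ahkLf) " vs " StrLen(regexLf)

; Options go in front of the pattern, separated from it by ")".
text := "one`r`ntwo"
; Default newline handling: the CR is treated as part of the line break.
defaultNl := RegExReplace(text, "m)(.+)$", "[$1]")
; With the `n option only LF counts as a newline, so the CR is captured too.
lfOnlyNl := RegExReplace(text, "m`n)(.+)$", "[$1]")

crInDefault := InStr(defaultNl, "`r]") ? "yes" : "no"
crInLfOnly := InStr(lfOnlyNl, "`r]") ? "yes" : "no"
MsgBox "String lengths: " lengths
    . "`nCR inside brackets (default): " crInDefault
    . "`nCR inside brackets (``n option): " crInLfOnly
```

Either spelling of the linefeed works as a needle; the difference only matters when you count characters or build option strings.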
Re: Creating a RegEx-script
The quest continues...
My second transformation is in the works, and just when I thought I was almost done, I ran into some major hiccups, so I decided to scale it back. Start from the beginning, so to speak.
This obviously is a basic step which I need to take: Select all the text and copy it to the clipboard. Now, I have two versions to report:
Code: Select all
F4:: {
; A_Clipboard := "NOTHING IN HERE"
send '^a'
send '^c'
MsgBox(A_Clipboard, "In hotkey")
return
}
1. While line 2 is commented out, it takes two runs to change the content of the clipboard (after the first run). The first run gives me the previously selected text, and only pressing F4 again gives me the correct text.
2. So I decided to "clear" the clipboard first (include line 2). Now I only get "NOTHING IN HERE" in my message box (of course I've also tried that with an empty string ("")).
I've stopped and restarted the script time and again, shuffled lines around, put them in a function - to no avail.
Why is the clipboard empty after "^a" and "^c"?
Or why does it remember the selection before the current one?
Am I missing a basic setting?
Re: Creating a RegEx-script
Thank you very much!
And I thought that was just ornament-... :/
Remembered your first script (above), which of course resides safely on my disc, and copied as follows:
Code: Select all
F4:: {
    A_Clipboard := "", Send('^a'), Send('^c')
    If ClipWait(1) {
        ; Send '^v'
        MsgBox A_Clipboard
    } Else MsgBox 'An error occurred while waiting for the clipboard.', 'Error', 48
    SoundBeep 200
    return
}
So I conclude:
- after "Send('^c')" it takes a while for the data to reach the clipboard?
- ClipWait takes care of that while
Will continue tomorrow...
PS: I have questions about "strange behavior" when invoking a certain hotstring. Where do I ask that question? The hotstring
Code: Select all
:*:#++:: (']], [[')
is connected to this "movieTransformationScript", but a hotstring is obviously somewhat different from a RegEx ...
Created a new topic: viewtopic.php?f=82&t=119269
Last edited by ItisI on 10 Jul 2023, 13:00, edited 1 time in total.
Re: Creating a RegEx-script
Yes, clipboard can be slow.
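For what it's worth, the wait can be wrapped in a small helper. This is only a sketch: CopySelection is a hypothetical name, and restoring the previous clipboard contents is an extra that nobody in this thread asked for.

```autohotkey
#Requires AutoHotkey v2.0

; CopySelection is a hypothetical helper, not part of AutoHotkey.
; It copies the current selection and returns it as text,
; or an empty string if nothing arrived within the timeout (seconds).
CopySelection(timeout := 1) {
    saved := ClipboardAll()      ; remember whatever was on the clipboard
    A_Clipboard := ""            ; empty it so ClipWait can detect new data
    Send "^c"
    text := ClipWait(timeout) ? A_Clipboard : ""
    A_Clipboard := saved         ; put the old contents back
    return text
}

F6:: MsgBox CopySelection()
```

Emptying the clipboard first is what makes ClipWait meaningful: it waits until the clipboard contains something, so stale contents would let it return immediately.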
Re: Creating a RegEx-script
"Stuck you are" (Yoda)
"Yes I am!"
Transformations 1 - 5 work perfectly. It's 77 lines of code by now, and I don't suppose you want to read all that ...
- needleRegEx, replacement, HayStack have been declared as STATIC (at the beginning of the script)
- A_Clipboard has been filled safely with "ClipWait"
- HayStack is a copy of A_Clipboard and is subsequently modified through steps 1 - 5
Before Transformation 6 its contents are:
Regie:: Freddie Francis
Buch:: John Sanson, Jon Mills, Edwin Golenbert
Produktion:: Ted Lloyd, Horst Wendlandt Co-Produzent
Musik:: Peter Thomas, Dorn Dollenberg
Kamera:: Denys Coop
What I want them to be after Transformation 6 is:
Code: Select all
Regie:: [[Freddie Francis]]
Buch:: [[John Sanson, Jon Mills, Edwin Golenbert]]
Produktion:: [[Ted Lloyd, Horst Wendlandt Co-Produzent]]
Musik:: [[Peter Thomas, Dorn Dollenberg]]
Kamera:: [[Denys Coop]]
(Step 7 will be to replace ", " with "]], [[" and then the block will finally be ready for Obsidian)
This is step 6 - where I am stuck:
Code: Select all
needleRegEx := '(:: )(.+$)'
replacement := '$1[[$2]]'
HayStack := RegExReplace(HayStack, needleRegEx, replacement)
MsgBox HayStack, "6. Transformation"
I have also (first) tried:
Code: Select all
needleRegEx := '(?<=::\s)(.+$)'
replacement := '[[$1]]'
Both will give me (but work perfectly in VSCode, SublimeText, notepad++, surprisingly not in Brackets):
Code: Select all
Regie:: Freddie Francis
Buch:: John Sanson, Jon Mills, Edwin Golenbert
Produktion:: Ted Lloyd, Horst Wendlandt Co-Produzent
Musik:: Peter Thomas, Dorn Dollenberg
Kamera:: [[Denys Coop]]
I've been looking up your Regular Expressions (RegEx) - Quick Reference: https://www.autohotkey.com/docs/v2/misc/RegEx-QuickRef.htm
- and found an option "\K" which didn't do anything
- scoured the net for information about the "(?<=...) - Lookbehind assertion" for possible syntax variants. No luck
Two steps from my happiness - two 'seemingly' minor transformations and now this.
Please help!
Thanks
Re: Creating a RegEx-script
Code: Select all
#Requires AutoHotkey v2.0
str := '
(
Regie:: Freddie Francis
Buch:: John Sanson, Jon Mills, Edwin Golenbert
Produktion:: Ted Lloyd, Horst Wendlandt Co-Produzent
Musik:: Peter Thomas, Dorn Dollenberg
Kamera:: Denys Coop
)'
MsgBox RegExReplace(str, '::\h*\K.+', '[[$0]]')
Re: Creating a RegEx-script
@mikeyww
Thank you very much, mate! You are a lifesaver, well, at least a time saver. I have just completed the 7th transformation, and that is where I want to be.
I couldn't have done it without you. With your two interventions (first the cast list, now the crew block) you have turned a really tedious and unpleasant job (for hands and fingers) into a still boring but easily manageable task.
Here's a lot of beer waiting for you! Thank you very much.
However: I would never have developed this myself, and I ask you to please give my hurting ego some peace of mind by providing some explanation.
- With AHK we are in the realm of Perl - and I'd better get used to it, right?
- My RegEx testers (VSCode, SublimeText, Brackets, notepad++) are not reliable any more. Google led me to
regex101: https://regex101.com/
which at least accepted your RegEx ('::\h*\K.+') - as did SublimeText
Your time and life saver:
Code: Select all
str := '
(
Regie:: Freddie Francis
Buch:: John Sanson, Jon Mills, Edwin Golenbert
Produktion:: Ted Lloyd, Horst Wendlandt Co-Produzent
Musik:: Peter Thomas, Dorn Dollenberg
Kamera:: Denys Coop
)'
MsgBox RegExReplace(str, '::\h*\K.+', '[[$0]]')
- Line 1 uses only one (1) "'" - isn't that a bit under par, and why doesn't it throw off the interpreter?
- Let me try and explain your needle:
\K - resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match
Found on regex101. Why did my lookbehind not work? Perl?
\h - Any horizontal space. Are there "vertical spaces"? Why didn't my "\s" or " " work? Perl?
$0 - $0 is indeed the entire matched string (found on Google). So what about $1, $2 ...? If something is "partially matched", where does the "entire match" come from? How does RegEx know there could be an "entire match"?
Thanks again. A lot
Re: Creating a RegEx-script
I'm not very good with look-behind syntax. \K is easy because it requires the preceding part to match but then leaves it out of the reported match, so $0 holds only what follows. AHK uses PCRE. \h is horizontal space, \v is vertical space, and \s is either one. Therein lies an important difference in some cases. Try:
Code: Select all
regex := 'm)(::\s)(.+$)'
MsgBox RegExReplace(str, regex, '$1[[$2]]')
The m) option looks at each line individually instead of treating the entire string as one. The documentation explains it.
Read: https://www.autohotkey.com/docs/v2/misc/RegEx-QuickRef.htm
and RegExReplace.
You may also want to find a more expanded tutorial of regular expressions, if you want to dive into the details.
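To tie the questions together, here is a hedged side-by-side sketch. The two-line haystack is a shortened version of the crew block, the "gap" sample with an empty field is made up, and the variable names are only for illustration.

```autohotkey
#Requires AutoHotkey v2.0

str := "Regie:: Freddie Francis`r`nKamera:: Denys Coop"

; Without m), $ only matches at the very end, so only the last line is wrapped.
lastOnly := RegExReplace(str, '(:: )(.+$)', '$1[[$2]]')

; Three needles that wrap every line.
; $0 is the whole reported match, $1 and $2 are the parenthesized groups.
withM      := RegExReplace(str, 'm)(:: )(.+$)', '$1[[$2]]')
withK      := RegExReplace(str, '::\h*\K.+', '[[$0]]')
lookbehind := RegExReplace(str, '(?<=:: ).+', '[[$0]]')

; \s also matches line breaks, so it can run past an empty field
; into the next line; \h stays on the same line.
gap := "Regie::`r`nBuch:: John Sanson"
withH := RegExReplace(gap, '::\h*\K.+', '[[$0]]')
withS := RegExReplace(gap, '::\s*\K.+', '[[$0]]')

MsgBox lastOnly "`n`n" withM "`n`n" withK "`n`n" lookbehind "`n`n" withH "`n`n" withS
```

So the lookbehind itself was not the problem: the trailing $ without m) was. Dropping the $ or adding m) makes either form work on this sample.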
Re: Creating a RegEx-script
This thread is not the appropriate place for you as the person asking for help to be posting (and adding to) a list of RegEx resources. And it should not turn into a thread about learning RegEx in general. Once your question that had you open the thread is satisfactorily answered, the thread should end.
Re: Creating a RegEx-script
@boiler OK, I understand and agree. Perhaps a moderator could move this last post to a more appropriate place? When I first started trying to learn RegEx, I would certainly have appreciated a list of suggestions on where to start and what tools to use. As I cannot contribute directly to AHK matters - I am a bleeding novice - I can still (possibly) help in other ways. One of them is to document my learning and the resources I discover.
There's one last post I'd like to add, and that's a summary (more or less) of the second ("Crew") transformation and the finished, cleaned-up script. (Cleaning up may unfortunately raise some new questions).
Please advise!
Thanks
Re: Creating a RegEx-script
Moved your post here.
I took out this part of the post:
If it has to do with wrapping up this topic, feel free to post it. If it opens up new questions, open new thread(s).
Re: Creating a RegEx-script
@boiler Very good place! Very good move! It'll be a matter of wrapping up this topic and cleaning up the code. At the moment, I don't expect any fundamentally new questions. But I'll need another day - it's much too hot here...
Re: Creating a RegEx-script
@mikeyww and @boiler were instrumental in the creation of this script. The phrase "They helped me create the script" does not reflect the true circumstances. There would be no script without them.
Intro:
As a complete newcomer to AutoHotkey, I am trying to build a data sheet system for movies in Obsidian. I get the info from various sources on the internet and have to unify the format.
I. Cast - the list of actors and their roles
The cast of a movie/TV-show I usually get like this:
Code: Select all
Horst Tappert Horst Tappert ... Stephan Derrick
Fritz Wepper Fritz Wepper ... Harry Klein
Siegfried Lowitz Siegfried Lowitz ... August Bark
- removal of the duplicates
- linking of all actors to their respective data sheets
- linking, of course, means enclosing the actor's name in double square brackets ("Wikilinks")
Code: Select all
[[Horst Tappert]] - Stephan Derrick
[[Fritz Wepper]] - Harry Klein
[[Siegfried Lowitz]] - August Bark
Code: Select all
#Requires AutoHotkey v2.0
F3:: { ; F3 = Transform clipboard text
Static needleRegEx := '(.+?)\t.+\t(.+)'
, replacement := '[[$1]] - $2'
A_Clipboard := '', Send('^c')
If ClipWait(1) {
A_Clipboard := RegExReplace(A_Clipboard, needleRegEx, replacement)
Send '^v'
} Else MsgBox 'An error occurred while waiting for the clipboard.', 'Error', 48
}
To make this procedure more reliable, @boiler recommended using the ClipWait function (as seen in line 6).
Now it works perfectly!
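The needle can also be tried without the clipboard. A minimal sketch, assuming the copied text is tab-separated as: name, TAB, name, TAB, "...", TAB, role (the tabs are not visible in the listing above, so that layout is my assumption):

```autohotkey
#Requires AutoHotkey v2.0

; Assumed input layout: name TAB name TAB "..." TAB role (tabs written as `t).
cast := "Horst Tappert`tHorst Tappert`t...`tStephan Derrick`r`n"
      . "Fritz Wepper`tFritz Wepper`t...`tHarry Klein"

; (.+?) lazily takes everything up to the first TAB (the actor),
; .+ swallows the duplicate and the "...", and (.+) after the last TAB is the role.
linked := RegExReplace(cast, '(.+?)\t.+\t(.+)', '[[$1]] - $2')

MsgBox linked
```

Because the dot does not cross a line break, each line is handled on its own without any extra option.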
II. Crew - Converting the Crew Block
---English - German terms
Director - Regie
Writer - Buch
Composer - Musik
Cinematographer - Kamera
---
The "Crew" I usually get in this format:
Code: Select all
Regie Freddie Francis
Drehbuch John Sanson, Jon Mills,
Edwin Golenbert
Produktion Ted Lloyd,
Horst Wendlandt Co-Produzent
Musik Peter Thomas
Dorn Dollenberg
Kamera Denys Coop
- Precise identification of the "inline data fields"
- linking of the persons
- Minor point: "Drehbuch" -> "Buch"
like this:
Code: Select all
Regie:: [[Harald Reinl]]
Buch:: [[Johannes Kai]], [[Emil Durchwald]], [[Wilhelm Aggerdorn]]
Produktion:: [[Horst Wendlandt]], [[Preben Philipsen]]
Musik:: [[Martin Böttcher]]
Kamera:: [[Karl Löb]]
Now, here's the description of my approach:
Following the strict advice of my very late grandmother...
"It's OK to be stupid, but you must know how to help yourself!"
The transformation takes place in 7 steps.
Preliminary:
The selected text is copied to the clipboard (A_Clipboard, with ClipWait!) and from there to the variable "HayStack", which is transformed step by step through 1 - 7 (see RegExReplace).
Code: Select all
; Replace [space] (possibly) + [TAB] (certainly) with ":: "
; Needed to identify the "inline data fields" for Obsidian.
needleRegEx := '\s?\t' ; variable containing the search string/RegEx
replacement := ':: ' ; variable containing the replacement
Code: Select all
; Drehbuch -> Buch (minor point)
needleRegEx := 'Drehbuch'
replacement := 'Buch'
The challenge now is that a data field may be followed by two or more persons (separated by ", "), or the persons may not all be on one line (e.g. "(Dreh)buch").
Code: Select all
Drehbuch John Sanson, Jon Mills,
Edwin Golenbert
I remembered the olden days when we formatted txt-files imported into MS Word (If anyone wants to know the particulars, let me know...)
Code: Select all
; Replace all the (superfluous) '`r`n' ("Hard Return") with an unlikely character combination "###"
; No Markdown in those days...
needleRegEx := '`r`n'
replacement := '###'
Code: Select all
Regie:: Freddie Francis###Buch:: John Sanson, Jon Mills,###Edwin Golenbert###Produktion:: Ted Lloyd,###Horst Wendlandt Co-Produzent###Musik:: Peter Thomas###Dorn Dollenberg###Kamera:: Denys Coop
Now I need to get back my inline data fields, i.e. I want a Hard Return in front of every inline data field (except the first one).
Code: Select all
; An inline data field is preceded by "###" and followed by ":: "
needleRegEx := '(###)(?=\w*:: )' ; matches only the "###" in front of an inline data field
replacement := '`r`n'
Code: Select all
Regie:: Freddie Francis
Buch:: John Sanson, Jon Mills,###Edwin Golenbert
Produktion:: Ted Lloyd,###Horst Wendlandt Co-Produzent
Musik:: Peter Thomas###Dorn Dollenberg
Kamera:: Denys Coop
Need to turn those remaining ",###" or "###" (without the ",") into ", "
Code: Select all
needleRegEx := ',?###' ; "###" possibly preceded by 1 "," hence "?"
replacement := ', '
Code: Select all
Regie:: Freddie Francis
Buch:: John Sanson, Jon Mills, Edwin Golenbert
Produktion:: Ted Lloyd, Horst Wendlandt Co-Produzent
Musik:: Peter Thomas, Dorn Dollenberg
Kamera:: Denys Coop
Creating the "Wikilinks" for Obsidian. (@mikeyww - thanks again)
Code: Select all
; Enclose everything after ":: " until the end of the line in "[[...]]"
needleRegEx := '::\h*\K.+'
replacement := '[[$0]]'
gives me:
Code: Select all
Regie:: [[Freddie Francis]]
Buch:: [[John Sanson, Jon Mills, Edwin Golenbert]]
Produktion:: [[Ted Lloyd, Horst Wendlandt Co-Produzent]]
Musik:: [[Peter Thomas, Dorn Dollenberg]]
Kamera:: [[Denys Coop]]
And lastly transforming those ", " into closing and opening square brackets, separated by a comma and a space -> "]], [["
Code: Select all
; Transform ", " into "]], [[", thus completing the links in between.
needleRegEx := ', '
replacement := ']], [['
Now the block is ready for Obsidian.
Code: Select all
Regie:: [[Freddie Francis]]
Buch:: [[John Sanson]], [[Jon Mills]], [[Edwin Golenbert]]
Produktion:: [[Ted Lloyd]], [[Horst Wendlandt Co-Produzent]]
Musik:: [[Peter Thomas]], [[Dorn Dollenberg]]
Kamera:: [[Denys Coop]]
2. See if I already have a note (data sheet) on a person (if yes, it'll be highlighted)
3. Create a new sheet by simply clicking on the name
Don't forget: no script without @mikeyww and @boiler .
Here's the code of the script. It is my very first one. I am not a coder. I get along with spreadsheet and audio tag functions and some RegEx. (The script won't be useful to anyone who hasn't got the same pet project as I do. Sorry.)
Code: Select all
F4:: {
; variables
Static HayStack := '' ; current string to be manipulated
Static needleRegEx := '' ; variable containing the search expression
Static replacement := '' ; variable containing the replace expression
; A_Clipboard := "" ; , Send('^a'), Send('^c') moved to SelectAllCopy()
SelectAllCopy()
If ClipWait(1) {
/*--------------------------------------------------------------
1. transformation: [SPACE, possibly?][TAB] -> ":: "
The ":: " are inline-field delimiters for Obsidian
--------------------------------------------------------------*/
HayStack := A_Clipboard
needleRegEx := '\s?\t'
replacement := ':: '
HayStack := RegExReplace(HayStack, needleRegEx, replacement)
/*--------------------------------------------------------------
2. transformation [Drehbuch] -> "Buch"
--------------------------------------------------------------*/
needleRegEx := 'Drehbuch'
replacement := 'Buch'
HayStack := RegExReplace(HayStack, needleRegEx, replacement)
/*--------------------------------------------------------------
3. transformation: replace all ['`r`n'] with "###"
--------------------------------------------------------------*/
needleRegEx := '`r`n'
replacement := '###'
HayStack := RegExReplace(HayStack, needleRegEx, replacement)
/*--------------------------------------------------------------
4. transformation
Insert \R ('`r`n') exactly before the inline-field titles (Regie, Buch,...)
--------------------------------------------------------------*/
needleRegEx := '(###)(?=\w*:: )'
replacement := '`r`n'
HayStack := RegExReplace(HayStack, needleRegEx, replacement)
/*--------------------------------------------------------------
; 5. transformation
; Replace remaining (',###') or ('###') with ', '
--------------------------------------------------------------*/
needleRegEx := ',?###'
replacement := ', '
HayStack := RegExReplace(HayStack, needleRegEx, replacement)
/*--------------------------------------------------------------
6. transformation
Enclose all text from ":: " to the end of the line in "[[...]]"
My own attempt, needleRegEx := '(?<=::\s)(.+$)', did not work
(it needed the m) option) :(
Special thanks to @mikeyww
--------------------------------------------------------------*/
needleRegEx := '::\h*\K.+'
replacement := '[[$0]]'
; SaveInsert(HayStack)
HayStack := RegExReplace(HayStack, needleRegEx, replacement)
/*--------------------------------------------------------------
7. transformation
Replace the remaining ", " with "]], [["
Now all the Wikilinks for Obsidian are set neatly and
the transformed text is also in the Clipboard
--------------------------------------------------------------*/
needleRegEx := ', '
replacement := ']], [['
HayStack := RegExReplace(HayStack, needleRegEx, replacement)
; Safely assign 'HayStack' to A_Clipboard
SaveInsert(HayStack)
;
Send ('^a')
Send ('^v')
} Else MsgBox 'An error occurred while waiting for the clipboard.', 'Error', 48
mySignal()
return
}
mySignal() {
SoundBeep(440, 300)
SoundBeep(220, 600)
return
}
SaveInsert(myText) {
If ClipWait(2) {
A_Clipboard := myText
return
} Else MsgBox 'An error occurred while waiting for the clipboard.', 'Error', 48
mySignal()
return
}
SelectAllCopy() {
A_Clipboard := ''
Send('^a')
Send('^c')
}
When we look at the code, we see a lot of repetition - this is crying out to be moved into functions. Also, the 14 search/replace parameters could easily be moved into one 2-dimensional or two 1-dimensional array(s). Now put the whole thing into a "for - next loop" and the whole routine would be easily adaptable to any number of searches and replacements (within reason). Yes, and then the dreaming starts: a GUI to query the parameters, an option to save and load the same, the correct prediction of all stock prices ...
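A rough sketch of that array-and-loop idea, for anyone who wants to try it. TransformCrew is a hypothetical name, the sample input is a shortened crew block with assumed TABs and CR+LF line breaks, and the seven needle/replacement pairs are simply the ones from the script above in the same order.

```autohotkey
#Requires AutoHotkey v2.0

; TransformCrew is a hypothetical name. Each inner array holds one
; needle/replacement pair, taken from steps 1 - 7 of the script above.
TransformCrew(text) {
    static steps := [
        ['\s?\t',           ':: '],
        ['Drehbuch',        'Buch'],
        ['`r`n',            '###'],
        ['(###)(?=\w*:: )', '`r`n'],
        [',?###',           ', '],
        ['::\h*\K.+',       '[[$0]]'],
        [', ',              ']], [[']
    ]
    ; Apply the pairs one after the other to the same text.
    for step in steps
        text := RegExReplace(text, step[1], step[2])
    return text
}

; Assumed sample input: field name, TAB, persons; CR+LF line breaks.
sample := "Regie`tFreddie Francis`r`n"
        . "Drehbuch`tJohn Sanson, Jon Mills,`r`n"
        . "Edwin Golenbert`r`n"
        . "Kamera`tDenys Coop"

MsgBox TransformCrew(sample)
```

Adding or removing a transformation then means editing one line of the array; the clipboard handling in the hotkey can stay as it is.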
But then reality sets in: I have only the foggiest idea about variables, types, scope, arrays in AHK and will only slowly approach these ideas.
Thank you all for your patience and help!
Here's a growing Collection of RegEx resources
Re: Creating a RegEx-script
Glad it worked out. It looks to me like you are a coder, so you should probably just get over it.
Re: Creating a RegEx-script
Thank you