Extract text after a pattern in a text string

xxx59712
Posts: 4
Joined: 16 May 2022, 06:35

Extract text after a pattern in a text string

Post by xxx59712 » 16 May 2022, 06:47

I have the following sample text:

Code:

01. since the 1500s, when an
02. opposed to using 'Content here
03. and a search for 'lorem ipsum'
20 - randomised words which don't look even slightly believable
19 - going through the cites of the word
01. Finibus Bonorum et Malorum 02. essentially unchanged it was popularised in 03. The first line of Lorem Ipsum 04. predefined chunks as necessary
I need to extract the text without the leading numbers ("02." or "20 -") and store each sentence as an element of an array, so that Variable(0) = "since the 1500s, when an", Variable(1) = "opposed to using 'Content here", and so forth.

I managed to write what I think is the regex for the separating pattern:

Code:

 \d\d(\.| -)
but I can't work out which function and loop to use to achieve that. Can somebody please help me?

Thank you.

mikeyww
Posts: 27107
Joined: 09 Sep 2014, 18:38

Re: Extract text after a pattern in a text string

Post by mikeyww » 16 May 2022, 07:06

AHK simple arrays start at index 1 by default, not 0, so the first extracted line ends up in text[1].

Code:

str =
(
01. since the 1500s, when an
02. opposed to using 'Content here
03. and a search for 'lorem ipsum'
20 - randomised words which don't look even slightly believable
19 - going through the cites of the word
01. Finibus Bonorum et Malorum 02. essentially unchanged it was popularised in 03. The first line of Lorem Ipsum 04. predefined chunks as necessary
)
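text := []
; Split the text into lines (dropping any carriage returns), then remove
; everything up to and including the first "." or "-" plus the spaces after it.
For each, line in StrSplit(str, "`n", "`r")
 text.Push(RegExReplace(line, "^.*?[.-]\h*"))
; Show each item together with its array index, starting at 1
For lineNum, line in text
 MsgBox, 0, Line #%lineNum%, %line%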
Explained: Simple arrays

Re: Extract text after a pattern in a text string

Post by xxx59712 » 16 May 2022, 08:54

Thank you very much, we are almost there.

Which function would I use to:
1. Go through the text
2. Find a number pattern (e.g. "01.")
3. Take the sentence following that number pattern
4. Store it as an item in the array

Thanks.

Re: Extract text after a pattern in a text string

Post by mikeyww » 16 May 2022, 09:23

Does the current script not do that? If it doesn't, please post the specific input and expected output strings, as comprehensive examples.

Re: Extract text after a pattern in a text string

Post by xxx59712 » 16 May 2022, 09:34

My apologies, I did not notice there were more lines than were visible in your post.

I agree it does that.

The remaining challenge is the last line of the example:

Code:

01. Finibus Bonorum et Malorum 02. essentially unchanged it was popularised in 03. The first line of Lorem Ipsum 04. predefined chunks as necessary
Sometimes the text has no line breaks separating 01. from 02. from 03. and so on; it arrives as one block without any line breaks.

I need to extract the data the same way your code does. Is that possible?

The final result would be:

Code:

[1]since the 1500s, when an
[2]opposed to using 'Content here
[3]and a search for 'lorem ipsum'
[4]randomised words which don't look even slightly believable
[5]going through the cites of the word
[6]Finibus Bonorum et Malorum
[7]essentially unchanged it was popularised in
[8]The first line of Lorem Ipsum
[9]predefined chunks as necessary
Thanks.

Re: Extract text after a pattern in a text string

Post by mikeyww » 16 May 2022, 10:15

Replace the first For line of the script with:

Code:

For each, line in StrSplit(RegExReplace(str, " (\d\d(\.| -) )", "`n$1"), "`n", "`r")
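Put together, the first script with that For line swapped in might look like the sketch below. It was not posted in the thread; it assumes the same sample text, and the variable names normalized and out are illustrative choices. It collects all nine items and shows them in a single message box in the "[1]..." format requested above.

```autohotkey
str =
(
01. since the 1500s, when an
02. opposed to using 'Content here
03. and a search for 'lorem ipsum'
20 - randomised words which don't look even slightly believable
19 - going through the cites of the word
01. Finibus Bonorum et Malorum 02. essentially unchanged it was popularised in 03. The first line of Lorem Ipsum 04. predefined chunks as necessary
)
; Replace each " NN. " or " NN - " marker (with its leading space) by a line feed
; followed by the marker, so every item starts on its own line.
normalized := RegExReplace(str, " (\d\d(\.| -) )", "`n$1")
text := []
For each, line in StrSplit(normalized, "`n", "`r")
    text.Push(RegExReplace(line, "^.*?[.-]\h*"))  ; drop the leading "NN." or "NN -"
; Build one string such as "[1]since the 1500s, when an" per item
out := ""
For lineNum, line in text
    out .= "[" lineNum "]" line "`n"
MsgBox, %out%
```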

Re: Extract text after a pattern in a text string

Post by xxx59712 » 16 May 2022, 10:31

Thank you so much.

I admit this is well above what I can do in terms of scripting.

Re: Extract text after a pattern in a text string

Post by mikeyww » 16 May 2022, 10:45

Not by much; my regex is similar to yours. My idea was to "normalize" the text by inserting a line feed before each number marker (" 02. ", " 20 - ", and so on), which puts each desired part on its own line. The loop that follows then strips the leading number from each line and adds the remaining text to the array. There might be simpler approaches overall.
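One alternative, closer to the "function plus loop" the original question asked for, is to call RegExMatch repeatedly and use its StartingPosition parameter to step through the string, capturing the text after each marker up to the next marker or line break. This is a sketch that was not posted in the thread and was written against the sample text only; the needle and the names needle, items and pos are assumptions. The O) option makes RegExMatch return a match object.

```autohotkey
str =
(
01. since the 1500s, when an
02. opposed to using 'Content here
03. and a search for 'lorem ipsum'
20 - randomised words which don't look even slightly believable
19 - going through the cites of the word
01. Finibus Bonorum et Malorum 02. essentially unchanged it was popularised in 03. The first line of Lorem Ipsum 04. predefined chunks as necessary
)
; Match a two-digit marker ("01." or "20 -") at the start of the string or after
; whitespace, then lazily capture the text after it, stopping just before the
; next " NN. " / " NN - " marker, a line break, or the end of the string.
needle := "O)(?:^|\s)\d\d(?:\.| -)\h*([^\r\n]*?)(?=\h\d\d(?:\.| -)\h|[\r\n]|$)"
items := []
pos := 1
while (found := RegExMatch(str, needle, m, pos))
{
    items.Push(m[1])            ; capture group 1 = the text after the marker
    pos := found + m.Len(0)     ; resume searching right after this match
}
out := ""
For index, item in items
    out .= "[" index "]" item "`n"
MsgBox, %out%
```

No normalization pass is needed here, at the cost of a more complex regex. Like the normalization approach, it will also split wherever the text itself contains a space, two digits and ". " (for example " 10. ").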
