Parse based on a word?

kunkel321 · Post by **kunkel321** » 26 Oct 2021, 13:57

It looks like Loop, Parse, ... requires a single character for the delimiter. Does anyone know of a straightforward way to look for a particular word in a text string and parse the string based on that?

Chunjee · Post by **Chunjee** » 26 Oct 2021, 14:03

strSplit may help with that.

Code: Select all

myString := "a long string that I want to cut up by the word string."
myArray := strSplit(myString, "string")
; => ["a long ", " that I want to cut up by the word ", "."]

The delimiter will not be part of the output; you can re-add it if needed.

kunkel321 · Post by **kunkel321** » 26 Oct 2021, 15:27

Thanks Chunjee!

I set it up like this:

Code: Select all

myString := "a long string that I want to cut up by the word string."
myArray := strSplit(myString, "string")
For index, segment in myArray
    MsgBox % segment " string"

and it seems to work well.

garry · Post by **garry** » 26 Oct 2021, 15:41

Also, here is an example with several different delimiters; it then joins the pieces with `r`n:

Code: Select all

;--
var := "Hello One - Hello Two_Hello Three (Hello Four);Hello FIVE; Edit Sales Order - Stop Slow Traffic - SSLO620-81 - Google Chrome"
Arr := StrSplit(var,[" - ","_","(",")",";"])
tot := arr.maxindex()
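loop, %tot%
  {
  xx:=Arr[a_index]
  if (xx="")   ; skip empty pieces, e.g. the one between ")" and ";"
    continue
  xx=%xx%      ; legacy assignment: AutoTrim strips leading/trailing spaces
  e .= xx . "`r`n"
  }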
msgbox,%e%
ExitApp

kunkel321 · Post by **kunkel321** » 26 Oct 2021, 16:26

Thanks Garry! You sort of read my mind. I already needed multiple possible delimiters... The sample text I'm working with is actually this:

MyEntry =
(
They really likes sleep.
They often likes sleep.
They likes sleep.
They is an active student.
They was right afterall.
They has many friends and they was student president last year.
They likes cookies. Then they waltzes across the room.
They often surpasses all expectations.
They usually washes thier hands after using the restroom.
They watches the teacher.
They sits up front and watches the teacher.
They reads well. They fixes the mistakes on their many papers.
They goes to the resource room.
They studies hard and they plays in the band.
)

I'd like my delimiter to be the word "they". Many--but not all--of them are capitalized (and strSplit appears to be case-sensitive). I also need to keep the delimiter with the parsed-up segments, which makes it a bit trickier if the delimiters are different.

What I'm doing is using a series of RegExReplace calls to correct the grammar in this text. Mostly I need to remove the "s" from the verbs. My regexes are "over fixing" the S's though... I plan to make them un-greedy, but then I have to break up the phrases so that all occurrences are fixed.
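
One way to get a case-insensitive split that keeps the pronoun, sketched here for AutoHotkey v1: insert a marker character in front of every "they" with RegExReplace, then split on the marker. The marker `Chr(1)` and the variable names are assumptions for illustration; any character that cannot occur in the text would do.

```autohotkey
; Sketch for AutoHotkey v1. Chr(1) as the marker is an assumption:
; any character that never occurs in the text would work.
text := "They likes cookies. Then they waltzes across the room."
marker := Chr(1)

; "i)" = case-insensitive; \b restricts the match to the whole word.
; $0 is the matched word itself, so its capitalization survives.
marked := RegExReplace(text, "i)\bthey\b", marker . "$0")
segments := StrSplit(marked, marker)

out := ""
For index, segment in segments
{
    if (segment = "")   ; nothing precedes the first "They" here
        continue
    out .= "[" . segment . "]`r`n"
}
MsgBox % out
```

With this input the pieces are "They likes cookies. Then " and "they waltzes across the room.", so each segment still begins with its own "They" or "they".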

kunkel321 · Post by **kunkel321** » 26 Oct 2021, 16:53

Hey can you guys tell me if this is close to working code? I tried to combine both of your suggestions. It's the part in the braces that is throwing an error "variable contains illegal character."

Code: Select all

MyEntry := "They likes this and they dislikes that.  They really likes sleep.  They often likes sleep.  They likes sleep.  They is an active student. They was right afterall. They has many friends and they was student president last year.  They likes cookies.  Then they waltzes across the room. They often surpasses all expectations.  They usually washes thier hands after using the restroom.  They watches the teacher.  They sits up front and watches the teacher.  They reads well. They fixes the mistakes on their many papers.  They goes to the resource room.  They studies hard and they plays in the band."

myArray := strSplit(myEntry, ["They","they"])
For index, segment in myArray
{
	; ... regexMatches will go here.
	segment .= %segment% "they"
}
MsgBox % segment

kunkel321 · Post by **kunkel321** » 26 Oct 2021, 17:07

Actually... I think using this combined = %combined% %segment% inside the loop has fixed it.
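
For reference, the "illegal character" error most likely comes from `%segment%` inside an expression: with `.=` the right-hand side is an expression, where percent signs mean "use the contents of segment as a variable name". A minimal sketch of the loop with a separate accumulator (here called `combined`, as in the post above) and plain variable names, assuming AutoHotkey v1 and a shortened sample string:

```autohotkey
; Sketch for AutoHotkey v1; the sample text is shortened for illustration.
myEntry := "They likes this and they dislikes that. They really likes sleep."
combined := ""
For index, segment in StrSplit(myEntry, ["They", "they"])
{
    ; ... regex fixes on segment would go here.
    ; Re-add the delimiter in front of every piece except the first,
    ; which is the text before the first "they" (empty here).
    combined .= (index > 1 ? "they" : "") . segment
}
MsgBox % combined
```

Note that the loop variable `segment` only holds a value inside the loop, which is why the result is collected in `combined`. The re-added delimiter is always lowercase "they" in this sketch, so the original capitalization is lost.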

Chunjee · Post by **Chunjee** » 26 Oct 2021, 17:58

I do not know what the desired output is.

SOTE · Post by **SOTE** » 26 Oct 2021, 19:26

kunkel321 wrote: ↑
26 Oct 2021, 16:26
I'd like my delimiter to be the word "they". Many--but not all--of them are capitalized (and strSplit appears to be case-sensitive). I also need to keep the delimiter with the parsed-up segments, which makes it a bit trickier if the delimiters are different.

What I'm doing is using a series of RegExReplace calls to correct the grammar in this text. Mostly I need to remove the "s" from the verbs. My regexes are "over fixing" the S's though... I plan to make them un-greedy, but then I have to break up the phrases so that all occurrences are fixed.

I happened to be reading this, and I also don't understand the overall logic of what you are trying to accomplish. It appears you want to remove the "s" from verbs, but then why would you need to split a string by "they"? It appears you may want to put each sentence into an array, but the way it's being done seems strange, because not every sentence starts with "they".

To clear up the confusion, a better explanation of the goal seems to be needed.

garry · Post by **garry** » 27 Oct 2021, 01:14

I don't really know; here is a test script. It doesn't remove the 's' from verbs ...

Code: Select all

#warn
MyEntry := "aaa_They likes this and >2they dislikes that.  >3They really likes sleep.  >4They often likes sleep.  >5They likes sleep.  >6They is an active student. >7They was right afterall. >8They has many friends and >9they was student president last year.  >10They likes cookies.  >11Then they waltzes across the room. >12They often surpasses all expectations.  >13They usually washes thier hands after using the restroom.  >14They watches the teacher.  >15They sits up front and watches the teacher.  >16They reads well. >17They fixes the mistakes on their many papers.  >18They goes to the resource room.  >19They studies hard and >20they plays in the band."
myArray := strSplit(myEntry, ["They","they"])
i:=0
e:=""
For index, segment in myArray
{
    if (a_index=1)    ; skip the text before the first "They"
        continue
    i++
    e .= i . "-They" . segment . "`r`n"
}
MsgBox, %e%
return

/*
1-They likes this and >2
2-They dislikes that.  >3
3-They really likes sleep.  >4
4-They often likes sleep.  >5
5-They likes sleep.  >6
6-They is an active student. >7
7-They was right afterall. >8
8-They has many friends and >9
9-They was student president last year.  >10
10-They likes cookies.  >11Then 
11-They waltzes across the room. >12
12-They often surpasses all expectations.  >13
13-They usually washes thier hands after using the restroom.  >14
14-They watches the teacher.  >15
15-They sits up front and watches the teacher.  >16
16-They reads well. >17
17-They fixes the mistakes on their many papers.  >18
18-They goes to the resource room.  >19
19-They studies hard and >20
20-They plays in the band.
*/

kunkel321 · Post by **kunkel321** » 27 Oct 2021, 11:24

Thanks for the input guys. Sorry about the vague description. It's actually a newer version of this Text-Expansion tool: viewtopic.php?f=6&t=87791&p=386615#p386615. The gender pronouns (he/she/they) get put into the pre-written boilerplate text at runtime. You'll notice with my previous examples that the grammar is all correct if the gender pronoun is 'he' or 'she'.

For the rare times that the chosen gender is "neutral," the above grammar-fixing code gets used. The most-frequently-needed fixes are these:

Code: Select all

segment := StrReplace(segment, "they is", "they are") 	
segment := StrReplace(segment, "they has", "they have")	
segment := StrReplace(segment, "they was", "they were")

And *every* occurrence should be fixed.
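
A hedged alternative for these three fixes, assuming AutoHotkey v1: as far as I know, StrReplace there matches case-insensitively unless StringCaseSense is turned on, so "They is" would become lowercase "they are". Capturing the pronoun with RegExReplace keeps its original capitalization and still fixes every occurrence. The `fixes` table and the sample text are made up for illustration.

```autohotkey
; Sketch for AutoHotkey v1; sample text and variable names are illustrative.
text := "They is an active student. They has many friends and they was president."

; Singular verb -> form that goes with "they" (the three fixes listed above).
fixes := {"is": "are", "has": "have", "was": "were"}

For singular, plural in fixes
{
    ; "i)" = case-insensitive; $1 puts back "They"/"they" exactly as found.
    text := RegExReplace(text, "i)\b(they)\s+" . singular . "\b", "$1 " . plural)
}
MsgBox % text
; They are an active student. They have many friends and they were president.
```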

But there are other possible occurrences. For example:

he runs --> they run (simple 's')
he tries --> they try (ies -->y)
he passes --> they pass (not just simple 's')

These are still fairly easy, because the to-be-fixed verb directly follows "they." The regex replacements have to be more "tolerant" though, because there could be adverbs:

He rarely tries --> they rarely try

This is still doable, but if there are multiple words ending in "s" you get errors:

he washes his hands --> they wash their hand
he cleans the mess --> they clean the mes

That was why I thought about segmenting each occurrence of "they string string string." I can use an ungreedy regex to change only the first occurrence of an 's' word. Still it won't be fool-proof... if only the first occurrence is changed then I'll get

he runs and jumps --> they run and jumps

English is so obtuse that I'll probably never be able to fix every possible error. Still it's fun to try ---

EDIT:
I should point out that part of the problem is that I'm using multiple regexes for the different verbs (ies vs. es vs. s). The ones I currently have are:

Code: Select all

segment := RegExReplace(segment, "s)([Tt]hey\s)(.*?[^\s])ies\b", "$1$2y") 
segment := RegExReplace(segment, "s)([Tt]hey\s)(\w*ly\s?)((s|z|sh|ch|x|o))es\b", "$1$2$3") 
segment := RegExReplace(segment, "s)([Tt]hey\s)(\w*ly\s)?(.*)s\b", "$1$2$3")

(forum members helped create these by the way)

Having three in a row is also problematic because each substring/segment can potentially get changed three times. I noticed that, when there is a modifier for the verb (like he rarely tries), that modifier word often ends with "ly" (though it didn't in the preceding sentence--LOL). So I integrated that into the regexes. Anyway, I suppose I'm off-topic for this forum thread...
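
One possible way around the "changed three times" problem, sketched under assumptions: apply the rules to a single verb in a fixed order and stop at the first rule that matches. `FixVerb` is a hypothetical helper, not something from the thread or from AutoHotkey itself, and it does not solve the harder part of deciding which word in the segment is the verb.

```autohotkey
; Hypothetical helper for AutoHotkey v1 (not from the thread): takes one
; third-person singular verb and returns the form that goes with "they".
; Only the first matching rule is applied, so no word is changed twice.
FixVerb(verb)
{
    if (verb = "is")
        return "are"
    if (verb = "has")
        return "have"
    if (verb = "was")
        return "were"
    if (verb ~= "[^aeiou]ies$")            ; tries -> try
        return RegExReplace(verb, "ies$", "y")
    if (verb ~= "(ss|z|sh|ch|x|o)es$")     ; passes -> pass, washes -> wash
        return RegExReplace(verb, "es$")
    if (verb ~= "[^s]s$")                  ; runs -> run
        return RegExReplace(verb, "s$")
    return verb                            ; anything else is left alone
}

result := ""
For index, verb in ["runs", "tries", "passes", "washes", "has"]
    result .= verb . " -> " . FixVerb(verb) . "`r`n"
MsgBox % result
```

This is still not fool-proof (for example, "dies" would come out wrong), but each word passes through exactly one rule.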

AutoHotkey Community
