Parse based on a word? Topic is solved
Parse based on a word?
It looks like Loop, Parse, ... requires a single character for the delimiter. Does anyone know if a straightforward way exists to look for a particular word in a text string, and parse the string based on that?
ste(phen|ve) kunkel
Re: Parse based on a word? Topic is solved
strSplit may help with that.
The delimiter will not be part of the output, you could re-add it if needed.
Code: Select all
myString := "a long string that I want to cut up by the word string."
myArray := strSplit(myString, "string")
; => ["a long ", " that I want to cut up by the word ", "."]
Re: Parse based on a word?
Thanks Chunjee!
I set it up like this, and it seems to work well:
Code: Select all
myString := "a long string that I want to cut up by the word string."
myArray := strSplit(myString, "string")
For index, segment in myArray
MsgBox % segment " string"
ste(phen|ve) kunkel
Re: Parse based on a word?
Also, here is an example with several different delimiters; each piece is then joined with `r`n.
Code: Select all
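;-- Split on several delimiters at once, then rebuild the text one piece per line
var := "Hello One - Hello Two_Hello Three (Hello Four);Hello FIVE; Edit Sales Order - Stop Slow Traffic - SSLO620-81 - Google Chrome"
Arr := StrSplit(var,[" - ","_","(",")",";"])
tot := arr.maxindex()
loop, %tot%
{
xx:=Arr[a_index]
if (xx="") ; skip empty pieces (two delimiters in a row)
continue
xx=%xx% ; legacy assignment: trims surrounding spaces while AutoTrim is on (the default)
e .= xx . "`r`n"
}
msgbox,%e%
ExitApp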
Re: Parse based on a word?
Thanks Garry! You sort of read my mind. I already needed to have multiple possible delimiters... The sample text I'm working with is actually this:
MyEntry =
(
They really likes sleep.
They often likes sleep.
They likes sleep.
They is an active student.
They was right afterall.
They has many friends and they was student president last year.
They likes cookies. Then they waltzes across the room.
They often surpasses all expectations.
They usually washes thier hands after using the restroom.
They watches the teacher.
They sits up front and watches the teacher.
They reads well. They fixes the mistakes on their many papers.
They goes to the resource room.
They studies hard and they plays in the band.
)
I'd like my delimiter to be the word "they". Many, but not all, of them are capitalized (and strSplit appears to be case-sensitive). I also need to keep the delimiter with the parsed-up segments, which makes it a bit trickier if the delimiters are different.
What I'm doing is using a series of regexReplaces to correct the grammar in this text. Mostly I need to remove the "s" from the verbs. My regexes are "over fixing" the S's though... I plan to make them un-greedy, but then I have to break up the phrases so that all occurrences are fixed.
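One possible way to split on "they" regardless of case, and still keep each "they" attached to its segment, is to mark the split points with a regex first and then split on the marker. This is only a minimal sketch, assuming AutoHotkey v1.1 and assuming the character Chr(1) never occurs in the text; the names marker, marked, and out are made up for illustration.

```autohotkey
; Sketch only (AutoHotkey v1.1 assumed).
MyEntry := "They likes cookies. Then they waltzes across the room."
marker := Chr(1)   ; assumption: this character never appears in the text

; Put the marker in front of every whole-word "they", in any letter case.
; $0 is the matched text itself, so the original capitalization is kept.
marked := RegExReplace(MyEntry, "i)\bthey\b", marker . "$0")

out := ""
for index, segment in StrSplit(marked, marker)
{
    if (segment = "")   ; the text began with "They", so the first piece is empty
        continue
    out .= segment . "`n"
}
MsgBox % out
```

Each line shown by the MsgBox starts with "They" or "they" exactly as it was written in the input, so the per-segment regexes can be applied without re-adding the delimiter afterwards.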
ste(phen|ve) kunkel
Re: Parse based on a word?
Hey can you guys tell me if this is close to working code? I tried to combine both of your suggestions. It's the part in the braces that is throwing an error "variable contains illegal character."
Code: Select all
MyEntry := "They likes this and they dislikes that. They really likes sleep. They often likes sleep. They likes sleep. They is an active student. They was right afterall. They has many friends and they was student president last year. They likes cookies. Then they waltzes across the room. They often surpasses all expectations. They usually washes thier hands after using the restroom. They watches the teacher. They sits up front and watches the teacher. They reads well. They fixes the mistakes on their many papers. They goes to the resource room. They studies hard and they plays in the band."
myArray := strSplit(myEntry, ["They","they"])
For index, segment in myArray
{
; ... regexMatches will go here.
segment .= %segment% "they"
}
MsgBox % segment
ste(phen|ve) kunkel
Re: Parse based on a word?
Actually... I think using this combined = %combined% %segment% inside the loop has fixed it.
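For reference, the "illegal character" error comes from using %segment% inside an expression: after := or .= the percent signs make AutoHotkey v1 treat the contents of segment as a variable name. A minimal sketch of the same loop in expression syntax, assuming AutoHotkey v1.1 (combined is the variable name from the post above; the shortened MyEntry is only for illustration):

```autohotkey
; Sketch only (AutoHotkey v1.1 assumed).
MyEntry := "They likes this and they dislikes that. They really likes sleep."
combined := ""
for index, segment in StrSplit(MyEntry, ["They", "they"])
{
    if (index = 1)   ; text before the first "they" (empty for this input)
        continue
    ; ... per-segment regex fixes would go here ...
    combined .= "they" . segment   ; plain variable name, no percent signs
}
MsgBox % combined
```

Note that re-adding a fixed "they" loses the original capitalization of each occurrence.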
ste(phen|ve) kunkel
Re: Parse based on a word?
I do not know what the desired output is.
Re: Parse based on a word?
kunkel321 wrote: 26 Oct 2021, 16:26
I'd like my delimiter to be the word "they". Many, but not all, of them are capitalized (and strSplit appears to be case-sensitive). I also need to keep the delimiter with the parsed-up segments, which makes it a bit trickier if the delimiters are different.
What I'm doing is using a series of regexReplaces to correct the grammar in this text. Mostly I need to remove the "s" from the verbs. My regexes are "over fixing" the S's though... I plan to make them un-greedy, but then I have to break up the phrases so that all occurrences are fixed.
I happened to be reading this, and I also don't understand the overall logic of what you are trying to accomplish. It appears you want to remove the "s" from verbs, but then why would you need to split a string by "they"? It appears you may want to put each sentence into an array, but the way it's being done seems strange, because not every sentence starts with "they".
To help with the confusion, it seems a better explanation of the goal is needed.
Re: Parse based on a word?
I don't really know. Here is a test script; it doesn't remove the 's' from verbs ...
Code: Select all
#warn
MyEntry := "aaa_They likes this and >2they dislikes that. >3They really likes sleep. >4They often likes sleep. >5They likes sleep. >6They is an active student. >7They was right afterall. >8They has many friends and >9they was student president last year. >10They likes cookies. >11Then they waltzes across the room. >12They often surpasses all expectations. >13They usually washes thier hands after using the restroom. >14They watches the teacher. >15They sits up front and watches the teacher. >16They reads well. >17They fixes the mistakes on their many papers. >18They goes to the resource room. >19They studies hard and >20they plays in the band."
myArray := strSplit(myEntry, ["They","they"])
i:=0
e:=""
For index, segment in myArray
{
if (a_index=1)
continue
i++
e .= i . "-They" . segment . "`r`n"
}
MsgBox, %e%
return
/*
1-They likes this and >2
2-They dislikes that. >3
3-They really likes sleep. >4
4-They often likes sleep. >5
5-They likes sleep. >6
6-They is an active student. >7
7-They was right afterall. >8
8-They has many friends and >9
9-They was student president last year. >10
10-They likes cookies. >11Then
11-They waltzes across the room. >12
12-They often surpasses all expectations. >13
13-They usually washes thier hands after using the restroom. >14
14-They watches the teacher. >15
15-They sits up front and watches the teacher. >16
16-They reads well. >17
17-They fixes the mistakes on their many papers. >18
18-They goes to the resource room. >19
19-They studies hard and >20
20-They plays in the band.
*/
Re: Parse based on a word?
Thanks for the input, guys. Sorry about the vague description. It's actually a newer version of this Text-Expansion tool:
viewtopic.php?f=6&t=87791&p=386615#p386615
The gender pronouns (he/she/they) get put into the pre-written boilerplate text at runtime. You'll notice with my previous examples that the grammar is all correct if the gender pronoun is 'he' or 'she'.
For the rare times that the chosen gender is "neutral," the above grammar-fixing code gets used. The most-frequently-needed fixes are these:
Code: Select all
segment := StrReplace(segment, "they is", "they are")
segment := StrReplace(segment, "they has", "they have")
segment := StrReplace(segment, "they was", "they were")
And *every* occurrence should be fixed.
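A note on these three replacements: in AutoHotkey v1, StrReplace replaces every occurrence and ignores case by default, so a sentence-initial "They is" comes back as lowercase "they are". One hedged alternative, assuming AutoHotkey v1.1, is to capture the pronoun with RegExReplace so its original capitalization is kept. The fixes object and the sample text are made up for illustration.

```autohotkey
; Sketch only (AutoHotkey v1.1 assumed).
text := "They is an active student. They has many friends and they was student president last year."
fixes := {"is": "are", "has": "have", "was": "were"}   ; hypothetical lookup table

for singular, plural in fixes
{
    ; i) = ignore case; $1 puts back "They" or "they" exactly as it was matched.
    text := RegExReplace(text, "i)\b(they) " . singular . "\b", "$1 " . plural)
}
MsgBox % text
; shows: They are an active student. They have many friends and they were student president last year.
```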
But there are other possible occurrences. For example:
he runs --> they run (simple 's')
he tries --> they try (ies --> y)
he passes --> they pass (not just simple 's')
These are still fairly easy, because the to-be-fixed verb directly follows "they." The regex replacements have to be more "tolerant" though, because there could be adverbs:
He rarely tries --> they rarely try
This is still doable, but if there are multiple words ending in "s" you get errors:
he washes his hands --> they wash their hand
he cleans the mess --> they clean the mes
That was why I thought about segmenting each occurrence of "they string string string." I can use a greedy (ungreedy?) regex to only change the first occurrence of an 's' word. Still it won't be fool-proof... if only the first occurrence is changed then I'll get
he runs and jumps --> they run and jumps
English is so obtuse that I'll probably never be able to fix every possible error. Still, it's fun to try.
EDIT:
I should point out that part of the problem is that I'm using multiple regexes for the different verbs (ies vs. es vs. s). The ones I currently have are:
Code: Select all
segment := RegExReplace(segment, "s)([Tt]hey\s)(.*?[^\s])ies\b", "$1$2y")
segment := RegExReplace(segment, "s)([Tt]hey\s)(\w*ly\s?)((s|z|sh|ch|x|o))es\b", "$1$2$3")
segment := RegExReplace(segment, "s)([Tt]hey\s)(\w*ly\s)?(.*)s\b", "$1$2$3")
(forum members helped create these by the way)
Having three in a row is also problematic because each substring/segment can potentially get changed three times. I noticed that, when there is a modifier for the verb (like he rarely tries), that modifier word often ends with "ly" (though it didn't in the preceding sentence--LOL). So I integrated that into the regexes. Anyway, I suppose I'm off-topic for this forum thread...
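One way to avoid a segment being changed three times is to pick out a single word after each "they" and apply exactly one rule to it. The following is only a sketch of that idea, assuming AutoHotkey v1.1; BaseVerb and FixThey are hypothetical names, not something from this thread or from AutoHotkey itself. It only looks at the first word after "they" (skipping one adverb ending in "ly"), so second verbs ("runs and jumps"), other adverbs ("often"), and words such as "uses" or "always" are still handled wrongly or not at all.

```autohotkey
; Sketch only (AutoHotkey v1.1 assumed). BaseVerb and FixThey are hypothetical helpers.
BaseVerb(verb)
{
    irregular := {"is": "are", "has": "have", "was": "were"}
    if (irregular.HasKey(verb))
        return irregular[verb]
    ; Only one of the branches below runs, so a word is never changed twice.
    if (RegExMatch(verb, "i)[^aeiou]ies$"))         ; tries -> try
        return RegExReplace(verb, "i)ies$", "y")
    if (RegExMatch(verb, "i)(s|z|sh|ch|x|o)es$"))   ; passes -> pass, goes -> go
        return RegExReplace(verb, "i)es$", "")
    if (RegExMatch(verb, "i)[^s]s$"))               ; runs -> run ("pass" is left alone)
        return RegExReplace(verb, "i)s$", "")
    return verb
}

FixThey(text)
{
    pos := 1
    ; "they", an optional adverb ending in "ly", then the word treated as the verb.
    ; The O option returns a match object with Pos(), Len() and Value().
    while (RegExMatch(text, "iO)\bthey\s+(?:\w+ly\s+)?(\w+)", m, pos))
    {
        fixed := BaseVerb(m.Value(1))
        text := SubStr(text, 1, m.Pos(1) - 1) . fixed . SubStr(text, m.Pos(1) + m.Len(1))
        pos := m.Pos(1) + StrLen(fixed)   ; resume searching after the word just handled
    }
    return text
}

MsgBox % FixThey("They studies hard and they plays in the band.")
; shows: They study hard and they play in the band.
```

Because the search resumes after each handled word, every "they" in the text is visited once without splitting the string into segments first.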
ste(phen|ve) kunkel