
Regex Stop if encountered

Posted: 05 Dec 2022, 03:34
by Rikk03

I want to add a stop point to this regex: if a specific piece of text or a specific tag is encountered, the regex should match only before that point, and nothing after it should be matched.

If I can do this, it would work for my purposes.

Another question: how could I define a start point, such as the very first h1?



Re: Regex Stop if encountered

Posted: 05 Dec 2022, 05:50
by Rohwedder

RegExMatch(Text, "NeedleRegEx", M)  ; M1 holds the first capturing group
MsgBox, % M1

Just give a list of input texts (Text) and the desired outputs (M1).

Re: Regex Stop if encountered

Posted: 05 Dec 2022, 12:20
by adrianh
I'm not sure I fully understand. It might help if you described what you want rather than just posting a regex. From what I gather, though, it sounds like you want to go through a run of XML-style tags with no whitespace between them, matching until you reach a particular tag. Is that right? If so, the regex you want is something like this:


  1. I replaced your lazy [^>]*? with the possessive [^>]*+. A lazy quantifier checks whether the next character is > after every single character, which is unnecessary: the [^>] character class already accepts only characters that are not >.
  2. An extra + after * or + makes the quantifier possessive: it never gives characters back if the rest of the pattern fails. This speeds up matching when you know for a fact that backtracking won't help find a match.
  3. [^<]++ matches as many non-< characters as possible.
  4. If it matches a <, that < must not be followed by stop_tag.
  5. *+ matches zero or more times, without backtracking.
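The regex this list describes did not survive in the thread, so here is a hypothetical Python reconstruction built only from the five points above. `stop_tag` is the post's own placeholder, and `RUN_UNTIL_STOP_TAG` and `runs_before_stop_tag` are illustrative names. Possessive quantifiers are written as plain greedy ones because Python's `re` only accepts them from 3.11 on; since nothing follows the repeated group, the matched text should be the same, and only the backtracking work differs.

```python
import re

# Hypothetical reconstruction of the pattern described in the list.
# In AutoHotkey's PCRE you would write [^>]*+, [^<]++ and (?:...)*+ instead.
RUN_UNTIL_STOP_TAG = re.compile(
    r"<(h1|h2|p)[^>]*>"          # opening h1/h2/p tag, attributes included
    r"(?:[^<]+|<(?!stop_tag))*"  # runs of non-< chars, or a < not followed by stop_tag
)

def runs_before_stop_tag(html):
    """Return each matched run; a run ends right before '<stop_tag'."""
    return [m.group(0) for m in RUN_UNTIL_STOP_TAG.finditer(html)]

sample = "<h1>Title</h1><p>Intro</p><stop_tag><p>Later</p>"
print(runs_before_stop_tag(sample))
# ['<h1>Title</h1><p>Intro</p>', '<p>Later</p>']
```

The second result shows a limitation of this approach: each run stops at the stop tag, but scanning resumes after it, so elements that follow the stop tag are still matched.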
Stopping at a particular string would be slower, especially since you didn't give any context about where that string can occur. However, slower is relative; this will probably still be plenty fast. So, something like:


"<(h1|h2|p)[^>]*+>(?:(?!stop string|<stop_tag>).)*+"
This reads as: if the text from this point on does not begin with stop string or <stop_tag>, match the next character, and repeat that zero or more times without backtracking. The "slowness" comes from attempting those string comparisons at every single character, but as I said, it's probably still pretty fast, and really fast if the compared strings are small enough to fit in a CPU cache line. Note that . does not match a newline unless you add the s) option, so without it the match also stops at the end of a line.
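A rough Python translation of the pattern above, assuming the same placeholders (`stop string` and `<stop_tag>`); `TEMPERED` and `runs_before_stop` are illustrative names. As before, `*+` becomes a plain `*` for compatibility with Python versions before 3.11, and `re.DOTALL` stands in for AutoHotkey's `s)` option so that `.` also crosses newlines.

```python
import re

# Each character is consumed only if neither placeholder starts at that position.
TEMPERED = re.compile(
    r"<(h1|h2|p)[^>]*>"
    r"(?:(?!stop string|<stop_tag>).)*",
    re.DOTALL,  # like AutoHotkey's s) option: . also matches newlines
)

def runs_before_stop(html):
    """Return each match; a match ends right before either placeholder."""
    return [m.group(0) for m in TEMPERED.finditer(html)]

sample = "<h1>Intro</h1><p>Keep this stop string drop this</p><p>After</p>"
print(runs_before_stop(sample))
# ['<h1>Intro</h1><p>Keep this ', '<p>After</p>']
```

The lookahead runs once per character, which is the cost described above; the trade-off is that the stop point can be any string, not just something that begins with <.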

I'm not sure what you were doing with \K[^<]*. \K resets the start of the reported match, but I don't see why you need that here. What was your reason for it?

Let me know if that's what you want and if it helps.


Re: Regex Stop if encountered

Posted: 06 Dec 2022, 02:01
by Rikk03
I understand your confusion. I am after the text between the h1, h2, and p tags, which any HTML page uses. I've since improved my regex.


Thanks for your help, AdrianH. Your pattern doesn't match the stop string/tag itself, but it still returns matches after it. The idea was to define a clear stop point: once it is encountered, there should be NO matches after it. I was thinking a negative lookbehind might work:

(?<!whatever), but I can't get it to work.
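One likely reason the lookbehind fails: Python's `re`, like the PCRE engine AutoHotkey uses, only allows lookbehinds of fixed length, so `(?<!whatever)` can inspect only a few characters right before the current position. It cannot ask whether the stop string appeared anywhere earlier. A simpler technique, swapped in here instead of a single regex, is to cut the text down to the region between the first `<h1` and the first stop marker, then run the extraction regex on that slice only. This is a sketch; `text_between`, `TAG_TEXT`, and the `<stop_tag>` default are hypothetical.

```python
import re

# Captures the text directly inside an h1, h2 or p element.
# \b keeps tags such as <pre> from being taken for <p>.
TAG_TEXT = re.compile(r"<(h1|h2|p)\b[^>]*>([^<]*)", re.IGNORECASE)

def text_between(html, start_marker="<h1", stop_marker="<stop_tag>"):
    """Return tag text after the first start_marker and before the first
    stop_marker that follows it. Both defaults are hypothetical placeholders."""
    start = html.find(start_marker)
    if start == -1:
        return []  # no start point, so nothing qualifies
    stop = html.find(stop_marker, start)
    region = html[start:] if stop == -1 else html[start:stop]
    return [m.group(2) for m in TAG_TEXT.finditer(region)]

sample = "<p>Nav</p><h1>Title</h1><p>Intro</p><stop_tag><p>Later</p>"
print(text_between(sample))  # ['Title', 'Intro']
```

In AutoHotkey, the same slicing could presumably be done with InStr and SubStr before looping over RegExMatch. Note that [^<]* only captures text up to the first nested tag (such as <b>); for messy real-world HTML, a proper HTML parser such as Python's html.parser is more robust than a regex.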

Re: Regex Stop if encountered

Posted: 12 Jan 2023, 14:46
by adrianh
Sorry, I've not been on the board for a while.

If you are still having difficulty with this, could you post a sample string you want to parse and show where you would like the matching to stop?