Converting regex recursion pattern from Perl to PCRE


4 replies to this topic
haichen
  • Members
  • 200 posts
  • Last active: Oct 20 2013 01:14 PM
  • Joined: 05 Feb 2007
I came across a Perl script with this regular expression:
use strict;
use warnings;
my $textInner =  '(outer(inner(most "this (shouldn\'t match)" inner)))';
my $innerRe;
my $idx=0;
my(@match);

$innerRe = qr/
                \(
                (
                   (?:
                      [^()"]+
                   |
                      "[^"]*"
                   |
                      (??{$innerRe})
                   )*
                )
                \)(?{$match[$idx++]=$1;})
             /sx;

$textInner =~ /^$innerRe/g;
print "$match[0]\n";

The regular expression uses a recursive pattern to match nested parentheses. Text inside double quotes is skipped as a unit, so parentheses in quoted text are not treated as group delimiters.

So in (outer(inner(most "this (shouldn't match)" inner))) the script finds:

most "this (shouldn't match)" inner
I then got interested in recursive patterns and searched for more information.
I found these pages, which explain recursive patterns in PCRE:
http://regexkit.sour...yntax.html#TOC1
http://regexkit.sour...tern.html#SEC21

Of course, I want to build this regex in AHK.

This is my attempt for now:

str=(outer(inner(most "this (shouldn\'t match)"(hallo) inner)))
reg=x)(\(  (  (?:[^()"]+   |  "[^"]*"  | (?R)  *) ) \))
r := RegExMatch(str, reg, s)
msgbox,Pos=%r%`ns=%s%`ns1=%s1%`ns2=%s2%`ns3=%s3%
But this doesn't stop at the quotation marks.
It finds: shouldn\'t match
Maybe somebody has an idea how to make this regex work.
I'm trying this to learn how to work with recursion, so please show me solutions that use the recursive regex item (?R).

Hopefully someone can shed some light on this. :D
Thanks.
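
One likely reason the attempt above never gets past the quotation marks is that the * sits inside the alternation, right after (?R), so the (?: ... ) group as a whole runs only once. The following is a minimal sketch in AutoHotkey v1 syntax, assuming the intent was to repeat all three alternatives. It uses the example string from the Perl script, and the variable names are illustrative only.

```autohotkey
; Sketch only: assumes AutoHotkey v1 (PCRE) and that the * was meant to
; apply to the whole (?: ... ) group rather than to (?R) alone.
str = (outer(inner(most "this (shouldn\'t match)" inner)))

; x)       ignore whitespace inside the pattern
; [^()"]++ a run of characters that are neither parentheses nor quotes
; "[^"]*"  a quoted string, consumed as a unit (parentheses inside are ignored)
; (?R)     recurse into the whole pattern, i.e. a nested ( ... ) group
reg = x)\(  (  (?: [^()"]++  |  "[^"]*"  |  (?R)  )*  )  \)

pos := RegExMatch(str, reg, s)
MsgBox, Pos=%pos%`ns=%s%`ns1=%s1%
```

With this pattern the match should start at position 1 and cover the whole string, and s1 should hold the content of the outermost pair of parentheses, because the group is last set when the top level closes. That differs from the Perl script, which records every nesting level through its embedded code block.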

haichen
I'll try again. Does anybody have an idea about the question above?
Thanks,
haichen

polyethene
  • Administrators
  • 5511 posts
  • Last active: Yesterday, 11:38 PM
  • Joined: 26 Oct 2012
Sorry for the late reply. DerRaphael came to me with this problem the day you posted it, so I was giving him a chance to reply to you. I told him to look at the section of the PCRE man page titled RECURSIVE PATTERNS.

In short, I don't think PCRE can handle nested recursive groups. I ran into the same problem with JSON, which meant I had to use an ugly Loop... "^(?:\s*((\[(?:[^[\]]++|(?-1))*\])|(\{(?:[^{\}]++|(?-1))*\})|[^,]*?)\s*(?:,|$)){n}"

In your particular case you can use a lookahead, though I know this is undesirable for most cases:

s = (outer(inner(most "this (shouldn\'t match)"(hallo) inner)))
r = (\(((?:[^()]++|(?-2)))\)(?!"))
RegExMatch(s, r, v)
MsgBox, %v%


haichen
Thanks for your kind reply. Now I will explore the mystery of recursive patterns more deeply. Hopefully.
:D

Edit:
Oops, I just saw that my own attempt used the wrong example (one of my endless tests...).

The right example is the one from the Perl script:

(outer(inner(most "this (shouldn\'t match)" inner)))
and the result should be

most "this (shouldn\'t match)" inner

Sorry. Nevertheless, I'll have a look at your regex.
haichen
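
For the expected result stated above (the innermost group only), recursion may not be needed at all. This is a minimal sketch in AutoHotkey v1 syntax, resting on the assumption that an innermost group is simply a pair of parentheses whose body contains no unquoted parentheses. It swaps the recursive approach for a plain pattern, and the variable names are illustrative.

```autohotkey
; Sketch only: assumes AutoHotkey v1 (PCRE). No (?R) is used here.
str = (outer(inner(most "this (shouldn\'t match)" inner)))

; The body may contain runs of ordinary characters or whole quoted strings,
; but no unquoted parenthesis, so only an innermost group can match.
reg = x)\(  (  (?: [^()"]++  |  "[^"]*"  )*  )  \)

RegExMatch(str, reg, s)
MsgBox, %s1%
```

The search fails at the first two opening parentheses, because each body runs into another unquoted ( before a closing ). It should succeed at the third, leaving most "this (shouldn\'t match)" inner in s1.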

polyethene
I had thought about alternations with full use of (?R):

s = (outer(inner(most "this (shouldn\'t match)" inner)))
r = \(((?:[^()]++|(?R)))\)|"((?:[^"]++|(?R)))"
RegExMatch(s, r, v)
MsgBox, %v%
This gives "this (shouldn\'t match)", i.e. " is treated as a boundary just like ( and ).

I then tried to make the recursion bounds from my previous regex conditional (("|\()((?:[^(?1)")]++|(?-2)))(?:"|\))) but for obvious reasons this doesn't work well. If we had callbacks it would be much easier.