
regex {x,Y} bugs out over 1986: {1,1987} = seen as illegal

 
Joy2DWorld
Joined: 04 Dec 2006
Posts: 411
Location: Galil, Israel

Posted: Sun Sep 23, 2007 12:17 am

Code:
msgbox %  regexmatch("25-2","^(\s*\d+(-|$)){1,1987}",items)

bugs out,

Code:
msgbox %  regexmatch("25-2","^(\s*\d+(-|$)){1,1986}",items)
works.

A similar bug occurs with {3999} etc., although {1987} happens to work if there is no comma.
_________________
Joyce Jamce

Joy2DWorld

Posted: Sun Sep 23, 2007 12:25 am

It bugs out in other ways as well, for example:

Code:
msgbox %   regexmatch("25-2","^((\s*\d+(-|$)){1,1000})((\s*\d+(-|$)){1,1000})",items)

Titan
Joined: 11 Aug 2004
Posts: 5009
Location: imaginationland

Posted: Sun Sep 23, 2007 1:28 am

If you think you've found a bug in PCRE, shouldn't you report it to the developer instead?
_________________

RegExReplace("irc.freenode.net/autohotkey", "^(?=(.(?=[\0-r\[]*((?<=\.).))))(?:[c-\x73]{2,8}(\S))+((2)|\b[^\2-]){2}\D++$", "$u3$1$3$4$2")

Joy2DWorld

Posted: Sun Sep 23, 2007 3:02 am

When I run it directly in the current PCRE, it seems to work fine...

(and I did not see it listed as a bug in the PCRE changelog.)

But maybe I've missed something (?)

It sure looks like a memory allocation issue...

i.e.
Code:
msgbox %  regexmatch("25-2","^(\s*\d+(-|$)){1,1987}",items)  "`n"  ErrorLevel





Variables in AHK can hold megabytes, even hundreds of megabytes...

Hopefully we should be able to work with them using regex....

Chris
Site Admin
Joined: 02 Mar 2004
Posts: 10465

Posted: Sun Nov 11, 2007 11:45 pm

Via ErrorLevel, I see that the PCRE error is "Compile error 20 at offset 47: regular expression too large". When AutoHotkey is upgraded to the latest PCRE version, I'll see if this problem has been resolved.

Thanks.

Joy2DWorld

Posted: Mon Nov 12, 2007 3:11 am

likely you've seen this also:

http://www.autohotkey.com/forum/viewtopic.php?t=23481

Chris

Posted: Wed Nov 21, 2007 1:36 am

As of the latest version of PCRE (7.4), it seems this issue is not yet fixed. If you feel strongly that something should be done, it's probably best to report it as a bug (hopefully there are procedures at www.pcre.org).

Joy2DWorld

Posted: Wed Nov 21, 2007 2:13 am

Run directly with PCRE, it all seems to work fine.

Maybe this is way off, but could it be related to the way PCRE is being called and how memory is allocated going in?

It's been a while since I looked at the PCRE man page, but it seems like there were some memory-use options...

Chris

Posted: Wed Nov 21, 2007 3:23 am

There are some settings that can increase the amount of memory available for compiling a pattern. The most likely one in this case is:
Quote:
/* The value of LINK_SIZE determines the number of bytes used to store links
as offsets within the compiled regex. The default is 2, which allows for
compiled patterns up to 64K long. This covers the vast majority of cases.
However, PCRE can also be compiled to use 3 or 4 bytes instead. This allows
for longer patterns in extreme cases. On systems that support it,
"configure" can be used to override this default. */
#ifndef LINK_SIZE
#define LINK_SIZE 2
#endif
AutoHotkey uses the default of 2.

You could try recompiling with a higher limit to see if it helps.
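
For scale, the limit described in the quoted comment can be worked out directly. The Python sketch below is only arithmetic, not PCRE code. The helper name max_link_offset is made up for illustration, and it assumes a link is an unsigned offset stored in LINK_SIZE bytes, so the real ceiling on a compiled pattern may be somewhat lower.

```python
def max_link_offset(link_size: int) -> int:
    """Largest value that fits in an unsigned field of link_size bytes.

    Assumption: a link is a plain unsigned offset, so this is an upper
    bound on how far apart two linked points in a compiled pattern can be.
    """
    if link_size not in (2, 3, 4):
        raise ValueError("the quoted comment only mentions sizes 2, 3 and 4")
    return 2 ** (8 * link_size) - 1


# Print the ceiling for each link size mentioned in the quoted comment.
for size in (2, 3, 4):
    print(f"LINK_SIZE {size}: offsets up to {max_link_offset(size)} bytes")
```

With 2 bytes the largest offset is 65535, which is the "64K" in the comment. 3 or 4 bytes raise it to roughly 16M and 4G respectively.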

Chris

Posted: Wed Nov 21, 2007 11:05 am

Yes, increasing LINK_SIZE to 3 or 4 solves it. However, that also increases the memory used by all regular expressions in the cache. Because of this -- and because languages like PHP seem to stay with the default of 2 -- it seems best to keep the default of 2.

Perhaps there is a way to redesign regular expressions like these so that the compiled pattern doesn't become so large. For example, putting a + in place of {1,1987} seems to fix the topmost one.
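
If the upper bound matters, one possible way to combine the + suggestion with an explicit count is to match the run with + and then count the repetitions in code. The sketch below uses Python's re module purely as a stand-in for PCRE. match_counted and its parameters are hypothetical names, and keeping only the first max_count items of a longer run is an assumption about what the caller wants, not something taken from this thread.

```python
import re

# One repetition of the group used in the first example of this thread.
ITEM = r"\s*\d+(?:-|$)"


def match_counted(text, min_count=1, max_count=1987):
    """Emulate ^(ITEM){min_count,max_count} with an unbounded + and a count.

    Returns the list of matched items (at most max_count of them),
    or None when fewer than min_count items are found at the start.
    """
    # Step 1: grab the whole run with +, which keeps the pattern small.
    run = re.match(rf"(?:{ITEM})+", text)
    if run is None:
        return None
    # Step 2: count the repetitions inside the run.
    items = re.findall(ITEM, run.group(0))
    if len(items) < min_count:
        return None
    # Step 3: enforce the upper bound in code instead of in the pattern.
    return items[:max_count]


print(match_counted("25-2"))
```

The same two-step idea (match with +, then count the items separately) should carry over to other regex engines, but the details of counting would need to be checked there.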

Joy2DWorld

Posted: Wed Nov 21, 2007 1:08 pm

1. Basic regex, and why these are fundamental structures:
Quote:
putting a + in place of {1,1987} seems to fix the topmost one.

+ obviously does not count, whereas {x,y} is a flexible count that lets you bound the number of matches.

i.e. + is NOT a workaround for {x,y}.

2. If I am remembering correctly, there are other PCRE memory-use options and alternatives.

3. 64K is tiny. What exactly is the memory cost of a compile with 4-byte offset links? [I am also not certain about timing issues, i.e. whether there is a processing cost.]

4. In my view, RegEx is a *huge* engine of what AHK can do. It gives you DEEP PROCESSING CAPABILITIES, which, combined with the easy 'hotkey' access, open up a huge range of uses.

One of the (in my view) *very* nice (or amazing) things about AHK is that a non-programmer beginner can use it (a simple file with shortcuts), and a much more advanced level of coding is also possible.

Keeping the capabilities "DEEP" expands the range of uses and the appeal, which keeps it fun for users at all levels.

In my view, getting RegEx fully powered is critical.

But hey, what do I know....

PS: Unilateral 'fixes' are not really helpful for those who want to SHARE, i.e. it's critical that my code runs perfectly on everyone else's AHK, or it's not something that can be shared.

This is why I sure hope this is an issue that can be fixed...

Joy2DWorld

Posted: Wed Nov 21, 2007 8:51 pm

From my own preliminary testing,
Code:
--with-link-size=3
does not have a noticeable impact on the memory actually used.

[edit]
i.e.
Quote:
increases the memory used by all regular expressions in the cache.
but only by one byte per LINK, so on the most massive pattern (currently) allowed it would be only about 1K of overhead...
[/edit]

There seems to be a tiny speed degradation, maybe less than tiny on other patterns?? (i.e. it seems an issue of concern worth testing.)

i.e. it seems (unless I have missed something in my testing, which is *very* possible) that the only issue is one of processor usage/speed.

I certainly defer to your testing/experience with the issue...

What do you think?

edit: I looked at the PCRE man pages and noticed (a) an option to limit recursion [maybe relevant] and (b) language saying that increasing the link pointer width will impact speed....

But with *much* testing [and maybe I am doing something wrong in my approach to the test], I am not finding much speed cost.

If it can be warmly received, my humble suggestion:

Perhaps try some benchmarks with the 3-byte (or even 4-byte) link pointer...

(PS: I'm curious what size pointer Perl is using, as testing in Perl also shows no problem with the expressions.)

Joy2DWorld

Posted: Wed Nov 21, 2007 9:17 pm

Gosh, also, I'm not sure it's the link pointer. Even something simple like this:
Code:
msgbox 32,, %  regexmatch("25-2","(\d,){5987}",items) "`n" errorlevel
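
A rough sanity check, assuming (as the PCRE performance notes describe, to the best of my recollection) that a counted repeat of a group is compiled as repeated copies of that group: dividing the 64K ceiling by the repeat count gives the per-copy size at which the limit would be reached. The Python sketch below is only that division. bytes_per_copy_at_limit is a made-up name, and 65536 is the approximate ceiling for LINK_SIZE 2, not a measured value.

```python
# Approximate ceiling on a compiled pattern when LINK_SIZE is 2 (assumption).
LIMIT_BYTES = 2 ** 16


def bytes_per_copy_at_limit(copies: int, limit: int = LIMIT_BYTES) -> float:
    """Per-copy size at which `copies` repetitions would fill `limit` bytes."""
    if copies <= 0:
        raise ValueError("copies must be positive")
    return limit / copies


# Repeat counts taken from the patterns posted in this thread.
for copies in (1987, 5987):
    print(copies, round(bytes_per_copy_at_limit(copies), 1))
```

For 1987 copies this comes to about 33 bytes per copy, and for 5987 copies about 11, so a small group repeated thousands of times could plausibly run into the same limit as the first example. This is an estimate, not a measurement of what PCRE actually emits.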

All times are GMT