 |
AutoHotkey Community Let's help each other out
Joy2DWorld
Joined: 04 Dec 2006 Posts: 411 Location: Galil, Israel
|
Posted: Sun Sep 23, 2007 12:17 am Post subject: regex {x,Y} bugs out over 1986: {1,1987} = seen as illegal |
|
|
Code:
msgbox % regexmatch("25-2","^(\s*\d+(-|$)){1,1987}",items)
fails, while
Code:
msgbox % regexmatch("25-2","^(\s*\d+(-|$)){1,1986}",items)
works.
There is a similar bug with {3999} etc., although 1987 happens to work if there is no comma (i.e. {1987}). _________________ Joyce Jamce
|
|
 |
Joy2DWorld
Joined: 04 Dec 2006 Posts: 411 Location: Galil, Israel
|
Posted: Sun Sep 23, 2007 12:25 am Post subject: |
|
|
It fails in other ways as well, e.g.
Code:
msgbox % regexmatch("25-2","^((\s*\d+(-|$)){1,1000})((\s*\d+(-|$)){1,1000})",items)
|
|
 |
Titan
Joined: 11 Aug 2004 Posts: 5009 Location: imaginationland
|
Posted: Sun Sep 23, 2007 1:28 am Post subject: |
|
|
If you think you've found a bug in PCRE, shouldn't you report it to the developer instead? _________________
RegExReplace("irc.freenode.net/autohotkey", "^(?=(.(?=[\0-r\[]*((?<=\.).))))(?:[c-\x73]{2,8}(\S))+((2)|\b[^\2-]){2}\D++$", "$u3$1$3$4$2") |
|
|
 |
Joy2DWorld
Joined: 04 Dec 2006 Posts: 411 Location: Galil, Israel
|
Posted: Sun Sep 23, 2007 3:02 am Post subject: |
|
|
When I run it directly in the current PCRE, it seems to work fine
(and I did not see it listed as a bug in the PCRE changelog),
but maybe I've missed something (?)
It sure looks like a memory allocation issue,
i.e.
Code:
msgbox % regexmatch("25-2","^(\s*\d+(-|$)){1,1987}",items) "`n" ErrorLevel
Variables in AHK can contain megabytes, even hundreds of megabytes...
Hopefully it should be possible to work with them using regex....
|
|
 |
Chris Site Admin
Joined: 02 Mar 2004 Posts: 10465
|
Posted: Sun Nov 11, 2007 11:45 pm Post subject: |
|
|
Via ErrorLevel, I see that the PCRE error is "Compile error 20 at offset 47: regular expression too large". When AutoHotkey is upgraded to the latest PCRE version, I'll see if this problem has been resolved.
Thanks. |
|
|
 |
Joy2DWorld
Joined: 04 Dec 2006 Posts: 411 Location: Galil, Israel
|
|
|
 |
Chris Site Admin
Joined: 02 Mar 2004 Posts: 10465
|
Posted: Wed Nov 21, 2007 1:36 am Post subject: |
|
|
| As of the latest version of PCRE (7.4), it seems this issue is not yet fixed. If you feel strongly that something should be done, it's probably best to report it as a bug (hopefully there are procedures at www.pcre.org). |
|
|
 |
Joy2DWorld
Joined: 04 Dec 2006 Posts: 411 Location: Galil, Israel
|
Posted: Wed Nov 21, 2007 2:13 am Post subject: |
|
|
Run directly with PCRE, it all seems to work fine.
Maybe this is way off, but
could it be related to the way PCRE is being called and how memory is allocated going in?
It's been a while since I looked at the PCRE man page, but it seems there were some memory-use options...
|
|
 |
Chris Site Admin
Joined: 02 Mar 2004 Posts: 10465
|
Posted: Wed Nov 21, 2007 3:23 am Post subject: |
|
|
There are some settings that can increase the amount of memory available for compiling a pattern. The most likely one in this case is:
Quote:
/* The value of LINK_SIZE determines the number of bytes used to store links
as offsets within the compiled regex. The default is 2, which allows for
compiled patterns up to 64K long. This covers the vast majority of cases.
However, PCRE can also be compiled to use 3 or 4 bytes instead. This allows
for longer patterns in extreme cases. On systems that support it,
"configure" can be used to override this default. */
#ifndef LINK_SIZE
#define LINK_SIZE 2
#endif
AutoHotkey uses the default of 2.
You could try recompiling with a higher limit to see if it helps. |
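As a rough illustration of the quoted comment, the sketch below works out how large an offset a link of each width can hold, which is where the 64K figure for LINK_SIZE 2 comes from. It is plain arithmetic in Python, not PCRE code. The names `max_compiled_size` and `fits` are made up for the example, and the estimate assumes (as PCRE's performance notes appear to describe) that a counted group such as (...){1,1987} is compiled by copying the group, so the compiled size grows roughly linearly with the count.

```python
def max_compiled_size(link_size):
    """Number of distinct byte offsets a link of link_size bytes can hold."""
    return 256 ** link_size


def fits(copies, bytes_per_copy, link_size=2):
    """Estimate whether a group repeated `copies` times stays under the limit.

    Assumes compiled size grows linearly with the number of copies;
    bytes_per_copy is a hypothetical figure, not one measured from PCRE.
    """
    return copies * bytes_per_copy < max_compiled_size(link_size)


if __name__ == "__main__":
    for size in (2, 3, 4):
        print(size, max_compiled_size(size))
```

With a made-up 33 bytes per copy, `fits(2000, 33)` is false for 2-byte links, while `fits(2000, 33, link_size=3)` is true, which matches the general shape of the problem reported in this thread.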
|
|
 |
Chris Site Admin
Joined: 02 Mar 2004 Posts: 10465
|
Posted: Wed Nov 21, 2007 11:05 am Post subject: |
|
|
Yes, increasing LINK_SIZE to 3 or 4 solves it. However, that also increases the memory used by all regular expressions in the cache. Because of this -- and because languages like PHP seem to stay with the default of 2 -- it seems best to keep the default of 2.
Perhaps there is a way to redesign regular expressions like these so that the compiled pattern doesn't become so large. For example, putting a + in place of {1,1987} seems to fix the topmost one. |
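Replacing {1,1987} with + drops the upper bound, so one possible redesign is to keep the pattern for a single item small and enforce the count in a loop. A minimal sketch of that idea follows, written in Python purely for illustration: its `re` module is not PCRE, and `match_bounded` is a hypothetical helper, not an AutoHotkey function. It matches one item at a time from the start of the string, which mimics a greedy ^(item){lo,hi} without backtracking into earlier items.

```python
import re

# One "item": optional whitespace, digits, then a dash or end of string.
ITEM = re.compile(r"\s*\d+(?:-|$)")


def match_bounded(text, lo=1, hi=1987):
    """Return the prefix made of lo..hi items, or None if fewer than lo match."""
    pos = 0
    count = 0
    while count < hi:
        m = ITEM.match(text, pos)
        if m is None:
            break
        pos = m.end()  # every item consumes at least one digit, so pos advances
        count += 1
    return text[:pos] if count >= lo else None


if __name__ == "__main__":
    print(match_bounded("25-2"))
```

Because the count lives in the loop rather than in the pattern, the compiled pattern stays the same size whatever bounds are used.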
|
|
 |
Joy2DWorld
Joined: 04 Dec 2006 Posts: 411 Location: Galil, Israel
|
Posted: Wed Nov 21, 2007 1:08 pm Post subject: |
|
|
1. Basic regex: these are fundamental structures.
Quote:
putting a + in place of {1,1987} seems to fix the topmost one.
+ obviously does not count; {x,y} is a flexible count that also allows requiring a fixed number of matches.
I.e. + is NOT a workaround for {x,y}.
2. If I am remembering correctly, there are other PCRE memory-use options and alternatives.
3. 64K is tiny. What exactly is the memory cost of a compile with 4-byte offset links? [I am also not certain about timing issues, i.e. whether there is a processing cost.]
4. At least in my view, regex is a *huge* engine of what AHK can do. It gives you DEEP PROCESSING CAPABILITIES which, combined with the easy 'hotkey' access, open up a huge range of uses.
One of the (to my view) *very* nice (or amazing) things about AHK is that a non-programmer beginner can use it (a simple file with shortcuts), while a much more advanced level of coding is also possible.
Keeping the capabilities "DEEP" expands the range of use and the appeal, which keeps it fun for users at all levels.
At least in my view,
getting the regex fully powered is critical.
But, hey,
what do I know....
PS: unilateral 'fixes' are not really helpful for those who want to SHARE. I.e. it's critical that my code runs perfectly on everyone else's AHK, or it's not something that can be shared.
This is why I sure hope this is an issue that can be fixed...
|
|
 |
Joy2DWorld
Joined: 04 Dec 2006 Posts: 411 Location: Galil, Israel
|
Posted: Wed Nov 21, 2007 8:51 pm Post subject: |
|
|
From my own preliminary testing, it does not have a noticeable impact on memory actually used.
[edit]
I.e.
Quote:
increases the memory used by all regular expressions in the cache.
but only by one byte per link, so on the most massive pattern allowed (currently) it would be only about 1K of overhead...
[/edit]
There seems to be a tiny speed degradation, maybe less than tiny on other patterns?? (I.e. it seems an issue of concern worth testing.)
I.e. it seems (unless I have missed something in my testing, which is *very* possible) the only issue is one of processor usage/speed.
I certainly defer to your testing/experience with the issue...
What do you think?
edit: I looked at the PCRE man pages and noticed (a) a limit-on-recursion option [maybe relevant] and (b) language saying that increasing the link pointer width will impact speed....
but with *much* testing [and maybe I am doing something wrong in my approach to testing], I am not finding much speed cost.
If it can be warmly received,
my humble suggestion:
Perhaps try some benchmarks with the 3-byte (or even 4-byte) link pointer...
(PS: I'm curious what size pointer Perl is using, as testing in Perl also shows no problem with the expressions.)
|
|
 |
Joy2DWorld
Joined: 04 Dec 2006 Posts: 411 Location: Galil, Israel
|
Posted: Wed Nov 21, 2007 9:17 pm Post subject: |
|
|
Gosh, also, I'm not sure it's the link pointer. Even something simple like this:
Code:
msgbox 32,, % regexmatch("25-2","(\d,){5987}",items) "`n" errorlevel
|
|
 |