On 11/12/06, Ross Bamford <rosco / roscopeco.remove.co.uk> wrote:
> On Sun, 12 Nov 2006 15:01:56 -0000, Peter Schrammel
> <peter.schrammel / gmx.de> wrote:
> > Why is there a limitation at all? I implemented the same thing in Perl
> > and it doesn't complain ...
> > Is Perl's regexp engine that much better?
> >
>
> Irrespective of whether a regex is the best solution for your needs, it seems
> Oniguruma will improve the situation somewhat with respect to large
> regular expressions.
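
If the underlying task is only checking whether words belong to a fixed list (an assumption, since the original requirement is not quoted here), a hash-based Set avoids building one huge pattern at all. A minimal sketch. The word list and the helper names known_word? and known_words_in are hypothetical:

```ruby
require "set"

# Hypothetical word list; in practice this might be read from a dictionary file.
WORD_LIST = %w[apple banana cherry]

# Set membership is a hash lookup, so each check takes roughly constant time
# instead of running one giant alternation over the input.
WORD_SET = WORD_LIST.to_set

# True if the whole word is in the list.
def known_word?(word)
  WORD_SET.include?(word)
end

# Split free text into \w+ tokens and keep only those in the list.
def known_words_in(text)
  text.scan(/\w+/).select { |w| WORD_SET.include?(w) }
end
```

For example, known_words_in("an apple and a cherry") keeps only "apple" and "cherry". The trade-off is that the tokenization rule (here \w+) has to be chosen explicitly instead of being implied by the regex.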

I built a local copy of Ruby 1.8.5 with the Oniguruma engine:
    http://raa.ruby-lang.org/project/oniguruma/

And re-ran (a slight variation of) my test program:

[~]$ ruby foo
Using the <undefined> regex engine.
Converted a list of 1 words into a regex 8 bytes long.
Converted a list of 2 words into a regex 36 bytes long.
Converted a list of 4 words into a regex 48 bytes long.
Converted a list of 8 words into a regex 73 bytes long.
Converted a list of 16 words into a regex 173 bytes long.
Converted a list of 32 words into a regex 352 bytes long.
Converted a list of 64 words into a regex 718 bytes long.
Converted a list of 128 words into a regex 1415 bytes long.
Converted a list of 256 words into a regex 2656 bytes long.
Converted a list of 512 words into a regex 5210 bytes long.
Converted a list of 1024 words into a regex 10105 bytes long.
Converted a list of 2048 words into a regex 19432 bytes long.
Converted a list of 4096 words into a regex 37509 bytes long.
@_@
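
The stock run stops after 4096 words. One engine-independent way to shrink a word-list pattern, which may help stay under a size limit, is to factor shared prefixes through a trie, so that "car", "cart" and "cat" become ca(?:r(?:t)?|t). This is a minimal sketch, not necessarily what the test program does; build_trie and trie_to_source are hypothetical helpers:

```ruby
# Hypothetical helper: insert each word into a nested-hash trie.
# String keys are characters; the symbol :end marks that a word ends here.
def build_trie(words)
  root = {}
  words.each do |w|
    node = root
    w.each_char { |c| node = (node[c] ||= {}) }
    node[:end] = true
  end
  root
end

# Hypothetical helper: turn the trie into regex source. Sibling branches
# become a (?:a|b) group; if a word also ends at this node, the remainder
# is made optional with (?:...)?.
def trie_to_source(node)
  branches = node.reject { |k, _| k == :end }.map do |char, child|
    Regexp.escape(char) + trie_to_source(child)
  end
  return "" if branches.empty?
  alt = branches.size == 1 ? branches.first : "(?:#{branches.join('|')})"
  node.key?(:end) ? "(?:#{alt})?" : alt
end

src = trie_to_source(build_trie(%w[car cart cat dog]))
puts src
# Anchor it to match whole words only.
re = Regexp.new("\\A(?:#{src})\\z")
```

For a tiny list like this one the factored pattern is actually longer than car|cart|cat|dog. It only pays off when many words share prefixes, as in a large dictionary.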

[~]$ /usr/local/bin/ruby foo
Using the Oniguruma regex engine.
Converted a list of 1 words into a regex 11 bytes long.
Converted a list of 2 words into a regex 16 bytes long.
Converted a list of 4 words into a regex 38 bytes long.
Converted a list of 8 words into a regex 97 bytes long.
Converted a list of 16 words into a regex 185 bytes long.
Converted a list of 32 words into a regex 359 bytes long.
Converted a list of 64 words into a regex 686 bytes long.
Converted a list of 128 words into a regex 1387 bytes long.
Converted a list of 256 words into a regex 2715 bytes long.
Converted a list of 512 words into a regex 5264 bytes long.
Converted a list of 1024 words into a regex 10074 bytes long.
Converted a list of 2048 words into a regex 19439 bytes long.
Converted a list of 4096 words into a regex 37452 bytes long.
Converted a list of 8192 words into a regex 71931 bytes long.
Converted a list of 16384 words into a regex 135572 bytes long.
Converted a list of 32768 words into a regex 253027 bytes long.
Converted a list of 65536 words into a regex 461607 bytes long.
Converted a list of 131072 words into a regex 808171 bytes long.
Converted a list of 262144 words into a regex 1326345 bytes long.
Converted a list of 479625 words into a regex 1873539 bytes long.
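
The test program itself was not posted. A hypothetical harness producing output in the same shape might look like this: it doubles the list each round, finishes with the whole list, and stops if the engine rejects a pattern with RegexpError. The synthetic word list and the plain Regexp.union construction are assumptions, so the byte counts will not match the runs above:

```ruby
# Synthetic stand-in for a dictionary: "aaaa", "aaab", ... (10,000 words).
WORDS = ("aaaa".."zzzz").first(10_000)

# Assumption: one flat alternation. Regexp.union escapes each string
# and joins them with "|".
def words_to_regex(words)
  Regexp.union(words)
end

def run(words)
  sizes = []
  n = 1
  loop do
    begin
      re = words_to_regex(words.first(n))
    rescue RegexpError => e
      # An engine that cannot compile the pattern raises RegexpError.
      puts "Failed at #{n} words: #{e.message}"
      break
    end
    bytes = re.source.bytesize
    puts "Converted a list of #{n} words into a regex #{bytes} bytes long."
    sizes << [n, bytes]
    break if n == words.size
    n = [n * 2, words.size].min # last round uses the full list
  end
  sizes
end

SIZES = run(WORDS)
```

Because every synthetic word is four letters long, each pattern here is exactly 5n - 1 bytes. Real dictionary words vary in length, which is why the sizes above grow less regularly.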