On 11/17/06, brabuhr / gmail.com <brabuhr / gmail.com> wrote:
> On 11/12/06, Ross Bamford <rosco / roscopeco.remove.co.uk> wrote:
> > On Sun, 12 Nov 2006 15:01:56 -0000, Peter Schrammel
> > <peter.schrammel / gmx.de> wrote:
> > > Why is there a limitation at all? I implemented the same thing in perl
> > > and it doesn't complain ...
> > > Is the regexp engine of perl that much better?
> > >
> >
> > Irrespective of whether regex is the best solution for your needs, it seems
> > Oniguruma will improve the situation somewhat with respect to large
> > regular expressions.
>
> I built a local version of 1.8.5 with the oniguruma engine:
>     http://raa.ruby-lang.org/project/oniguruma/
>
> And re-ran (a slight variation of) my test program:

I thought I'd try running it under JRuby too:

$ ruby long_regex_test.rb
Took 0.000153 seconds to convert 1 words into a regex 17 bytes long.
Took 0.000381 seconds to convert 2 words into a regex 20 bytes long.
Took 0.000393 seconds to convert 4 words into a regex 36 bytes long.
Took 0.000629 seconds to convert 8 words into a regex 93 bytes long.
Took 0.001359 seconds to convert 16 words into a regex 180 bytes long.
Took 0.002261 seconds to convert 32 words into a regex 360 bytes long.
Took 0.007304 seconds to convert 64 words into a regex 741 bytes long.
Took 0.013601 seconds to convert 128 words into a regex 1348 bytes long.
Took 0.028273 seconds to convert 256 words into a regex 2746 bytes long.
Took 0.066228 seconds to convert 512 words into a regex 5345 bytes long.
Took 0.177105 seconds to convert 1024 words into a regex 10017 bytes long.
Took 0.330573 seconds to convert 2048 words into a regex 19597 bytes long.
Took 1.390542 seconds to convert 4096 words into a regex 37345 bytes long.
long_regex_test.rb:26:in `match': regular expression too big:
/(?:A(?:cr(?:edula|opora)|d(?:ar|elochorda|ventis[mt])|frogaean|hepatokla|ileen|l(?:adinist|l(?:a(?:manda|sch)|otheria)|ticamelus)|m(?:bystomidae|ericanly|ioidei|phioxidae)|n(?:chisaurus|d(?:aman|romache)|olympiad|t(?:echinomys|h(?:ophila|ropozoic)))|patornis|r(?:ab|chelenis|istarch)|s(?:caridia|elli|hantee|ilidae|terias)|tropa|u(?:riculidae|stroasiatic))|B(?:a(?:cchus|eria|haism|iera|k(?:shaish|wiri)|re|sili(?:ca|scus))|e(?:atrice|l(?:g(?:ae|ic)|shazzaresque)|mbex|rn(?:inesque|oullian))|i(?:elid|lati|smarck|tis)|lackfoot|o(?:hemia|llandist|rrovian)|ra(?:m|nchiopulmonata)|u(?:nga|phthalmum)|yroni(?:cs|te))|C(?:a(?:ctales|l(?:edonia|li(?:carpa|stephus)|ochortaceae|vados|ycophorae)|m(?:bodian|orra)|ntabri|p(?:ito(?:line)?|sidae)|r(?:olan|tist)|s(?:sandra|tanospermum)|thari)|e(?:ntrarchidae|strian)|h(?:arontas|e(?:lura|makuan)|rist(?:ianomastix|li(?:keness|ness)|mas))|lathrus|o(?:bleskill|fane|l(?:letidae|ossian)|m(?:melinaceae|us)|rybantic)|rocus|u(?:cumariidae|thbert)|y(?:clos
 pondy
(RegexpError)
        from long_regex_test.rb:26
        from long_regex_test.rb:15:in `times'
        from long_regex_test.rb:15

$ /opt/ruby/v1.8.5-oniguruma/bin/ruby long_regex_test.rb
Took 0.000211 seconds to convert 1 words into a regex 5 bytes long.
Took 0.000334 seconds to convert 2 words into a regex 24 bytes long.
Took 0.000215 seconds to convert 4 words into a regex 52 bytes long.
Took 0.000836 seconds to convert 8 words into a regex 92 bytes long.
Took 0.000885 seconds to convert 16 words into a regex 173 bytes long.
Took 0.002779 seconds to convert 32 words into a regex 345 bytes long.
Took 0.004934 seconds to convert 64 words into a regex 725 bytes long.
Took 0.009765 seconds to convert 128 words into a regex 1369 bytes long.
Took 0.020761 seconds to convert 256 words into a regex 2737 bytes long.
Took 0.088759 seconds to convert 512 words into a regex 5408 bytes long.
Took 0.144276 seconds to convert 1024 words into a regex 10131 bytes long.
Took 0.246762 seconds to convert 2048 words into a regex 19531 bytes long.
Took 0.667575 seconds to convert 4096 words into a regex 37498 bytes long.
Took 1.677037 seconds to convert 8192 words into a regex 71352 bytes long.
Took 2.971277 seconds to convert 16384 words into a regex 133499 bytes long.
Took 6.078681 seconds to convert 32768 words into a regex 245318 bytes long.
Took 13.001538 seconds to convert 65536 words into a regex 433611 bytes long.
Took 26.791838 seconds to convert 131072 words into a regex 713229 bytes long.
Took 47.691109 seconds to convert 262144 words into a regex 1061186 bytes long.
Took 71.050324 seconds to convert 524288 words into a regex 1354567 bytes long.

$ export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Home
$ ~/Desktop/jruby-0.9.1/bin/jruby long_regex_test.rb
Took 0.032 seconds to convert 1 words into a regex 9 bytes long.
Took 0.012 seconds to convert 2 words into a regex 18 bytes long.
Took 0.624 seconds to convert 4 words into a regex 40 bytes long.
Took 0.033 seconds to convert 8 words into a regex 95 bytes long.
Took 0.095 seconds to convert 16 words into a regex 156 bytes long.
Took 0.057 seconds to convert 32 words into a regex 358 bytes long.
Took 0.171 seconds to convert 64 words into a regex 743 bytes long.
Took 0.309 seconds to convert 128 words into a regex 1402 bytes long.
Took 0.40900000000000003 seconds to convert 256 words into a regex 2692 bytes long.
Took 1.863 seconds to convert 512 words into a regex 5341 bytes long.
Took 0.838 seconds to convert 1024 words into a regex 10328 bytes long.
Took 1.504 seconds to convert 2048 words into a regex 19733 bytes long.
Took 2.814 seconds to convert 4096 words into a regex 37334 bytes long.
Took 8.177 seconds to convert 8192 words into a regex 71593 bytes long.
Took 15.181000000000001 seconds to convert 16384 words into a regex 133779 bytes long.
Took 30.695 seconds to convert 32768 words into a regex 244280 bytes long.
Took 61.555 seconds to convert 65536 words into a regex 432751 bytes long.
Took 155.94400000000002 seconds to convert 131072 words into a regex 713573 bytes long.
Took 224.93 seconds to convert 262144 words into a regex 1060079 bytes long.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
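
The test script itself isn't included in this message. For anyone who wants to reproduce runs like the ones above, here is a minimal sketch of the kind of program that prints output in this shape. It is a guess, not the original long_regex_test.rb: it assumes each round doubles a random sample of words and folds them into a prefix-grouped alternation like the one in the error message. The helper names (build_trie, trie_to_source, words_to_regex, run_benchmark) are hypothetical, the \A and \z anchors are my own addition, and Array#sample needs a Ruby newer than the 1.8.5 used above.

```ruby
require 'benchmark'

# Hypothetical helper: insert every word into a nested-hash trie.
# The :end key marks a node where a complete word terminates.
def build_trie(words)
  trie = {}
  words.each do |word|
    node = trie
    word.each_char { |ch| node = (node[ch] ||= {}) }
    node[:end] = true
  end
  trie
end

# Hypothetical helper: turn a trie node into regex source, grouping
# shared prefixes as (?:a|b) and marking optional tails with "?".
def trie_to_source(node)
  optional = node.key?(:end)
  branches = node.reject { |key, _| key == :end }
                 .sort_by { |ch, _| ch }
                 .map { |ch, child| Regexp.escape(ch) + trie_to_source(child) }
  return '' if branches.empty?

  body = if branches.size == 1 && !optional
           branches.first
         else
           "(?:#{branches.join('|')})"
         end
  optional ? "#{body}?" : body
end

# Anchored so the regex matches whole words only (an assumption;
# the regex in the error output above is not anchored).
def words_to_regex(words)
  Regexp.new("\\A(?:#{trie_to_source(build_trie(words))})\\z")
end

# Doubles the sample size each round, times the conversion, then
# checks that every sampled word matches the resulting regex.
def run_benchmark(words, rounds)
  rounds.times do |i|
    sample = words.sample(2**i)
    regex = nil
    seconds = Benchmark.realtime { regex = words_to_regex(sample) }
    puts "Took #{seconds} seconds to convert #{sample.size} words " \
         "into a regex #{regex.source.bytesize} bytes long."
    sample.each { |w| raise "no match for #{w}" unless regex =~ w }
  end
end
```

To try it against a system word list (the path is an assumption and varies by platform), something like `run_benchmark(File.readlines('/usr/share/dict/words').map(&:chomp).reject(&:empty?), 20)` should do. The samples are random, so regex sizes will differ from run to run, as they do between the three runs above.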