On 11/17/06, brabuhr / gmail.com <brabuhr / gmail.com> wrote:
> On 11/12/06, Ross Bamford <rosco / roscopeco.remove.co.uk> wrote:
> > On Sun, 12 Nov 2006 15:01:56 -0000, Peter Schrammel
> > <peter.schrammel / gmx.de> wrote:
> > > Why is there a limitation at all? I implemented the same thing in perl
> > > and it no complains ...
> > > Is the regexp engine of perl that much better?
> > >
> >
> > Irrespective of whether regex the best solution for your needs, it seems
> > Oniguruma will improve the situation somewhat with respect to large
> > regular expressions.
>
> I built a local version of 1.8.5 with the oniguruma engine:
> http://raa.ruby-lang.org/project/oniguruma/
>
> And re-ran (a slight variation of) my test program:

I thought I'd try running under jruby too:

$ ruby long_regex_test.rb
Took 0.000153 seconds to convert 1 words into a regex 17 bytes long.
Took 0.000381 seconds to convert 2 words into a regex 20 bytes long.
Took 0.000393 seconds to convert 4 words into a regex 36 bytes long.
Took 0.000629 seconds to convert 8 words into a regex 93 bytes long.
Took 0.001359 seconds to convert 16 words into a regex 180 bytes long.
Took 0.002261 seconds to convert 32 words into a regex 360 bytes long.
Took 0.007304 seconds to convert 64 words into a regex 741 bytes long.
Took 0.013601 seconds to convert 128 words into a regex 1348 bytes long.
Took 0.028273 seconds to convert 256 words into a regex 2746 bytes long.
Took 0.066228 seconds to convert 512 words into a regex 5345 bytes long.
Took 0.177105 seconds to convert 1024 words into a regex 10017 bytes long.
Took 0.330573 seconds to convert 2048 words into a regex 19597 bytes long.
Took 1.390542 seconds to convert 4096 words into a regex 37345 bytes long.
long_regex_test.rb:26:in `match': regular expression too big: /(?:A(?:cr(?:edula|opora)|d(?:ar|elochorda|ventis[mt])|frogaean|hepatokla|ileen|l(?:adinist|l(?:a(?:manda|sch)|otheria)|ticamelus)|m(?:bystomidae|ericanly|ioidei|phioxidae)|n(?:chisaurus|d(?:aman|romache)|olympiad|t(?:echinomys|h(?:ophila|ropozoic)))|patornis|r(?:ab|chelenis|istarch)|s(?:caridia|elli|hantee|ilidae|terias)|tropa|u(?:riculidae|stroasiatic))|B(?:a(?:cchus|eria|haism|iera|k(?:shaish|wiri)|re|sili(?:ca|scus))|e(?:atrice|l(?:g(?:ae|ic)|shazzaresque)|mbex|rn(?:inesque|oullian))|i(?:elid|lati|smarck|tis)|lackfoot|o(?:hemia|llandist|rrovian)|ra(?:m|nchiopulmonata)|u(?:nga|phthalmum)|yroni(?:cs|te))|C(?:a(?:ctales|l(?:edonia|li(?:carpa|stephus)|ochortaceae|vados|ycophorae)|m(?:bodian|orra)|ntabri|p(?:ito(?:line)?|sidae)|r(?:olan|tist)|s(?:sandra|tanospermum)|thari)|e(?:ntrarchidae|strian)|h(?:arontas|e(?:lura|makuan)|rist(?:ianomastix|li(?:keness|ness)|mas))|lathrus|o(?:bleskill|fane|l(?:letidae|ossian)|m(?:melinaceae|us)|rybantic)|rocus|u(?:cumariidae|thbert)|y(?:clos pondy (RegexpError)
        from long_regex_test.rb:26
        from long_regex_test.rb:15:in `times'
        from long_regex_test.rb:15

$ /opt/ruby/v1.8.5-oniguruma/bin/ruby long_regex_test.rb
Took 0.000211 seconds to convert 1 words into a regex 5 bytes long.
Took 0.000334 seconds to convert 2 words into a regex 24 bytes long.
Took 0.000215 seconds to convert 4 words into a regex 52 bytes long.
Took 0.000836 seconds to convert 8 words into a regex 92 bytes long.
Took 0.000885 seconds to convert 16 words into a regex 173 bytes long.
Took 0.002779 seconds to convert 32 words into a regex 345 bytes long.
Took 0.004934 seconds to convert 64 words into a regex 725 bytes long.
Took 0.009765 seconds to convert 128 words into a regex 1369 bytes long.
Took 0.020761 seconds to convert 256 words into a regex 2737 bytes long.
Took 0.088759 seconds to convert 512 words into a regex 5408 bytes long.
Took 0.144276 seconds to convert 1024 words into a regex 10131 bytes long.
Took 0.246762 seconds to convert 2048 words into a regex 19531 bytes long.
Took 0.667575 seconds to convert 4096 words into a regex 37498 bytes long.
Took 1.677037 seconds to convert 8192 words into a regex 71352 bytes long.
Took 2.971277 seconds to convert 16384 words into a regex 133499 bytes long.
Took 6.078681 seconds to convert 32768 words into a regex 245318 bytes long.
Took 13.001538 seconds to convert 65536 words into a regex 433611 bytes long.
Took 26.791838 seconds to convert 131072 words into a regex 713229 bytes long.
Took 47.691109 seconds to convert 262144 words into a regex 1061186 bytes long.
Took 71.050324 seconds to convert 524288 words into a regex 1354567 bytes long.

$ export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Home
$ ~/Desktop/jruby-0.9.1/bin/jruby long_regex_test.rb
Took 0.032 seconds to convert 1 words into a regex 9 bytes long.
Took 0.012 seconds to convert 2 words into a regex 18 bytes long.
Took 0.624 seconds to convert 4 words into a regex 40 bytes long.
Took 0.033 seconds to convert 8 words into a regex 95 bytes long.
Took 0.095 seconds to convert 16 words into a regex 156 bytes long.
Took 0.057 seconds to convert 32 words into a regex 358 bytes long.
Took 0.171 seconds to convert 64 words into a regex 743 bytes long.
Took 0.309 seconds to convert 128 words into a regex 1402 bytes long.
Took 0.40900000000000003 seconds to convert 256 words into a regex 2692 bytes long.
Took 1.863 seconds to convert 512 words into a regex 5341 bytes long.
Took 0.838 seconds to convert 1024 words into a regex 10328 bytes long.
Took 1.504 seconds to convert 2048 words into a regex 19733 bytes long.
Took 2.814 seconds to convert 4096 words into a regex 37334 bytes long.
Took 8.177 seconds to convert 8192 words into a regex 71593 bytes long.
Took 15.181000000000001 seconds to convert 16384 words into a regex 133779 bytes long.
Took 30.695 seconds to convert 32768 words into a regex 244280 bytes long.
Took 61.555 seconds to convert 65536 words into a regex 432751 bytes long.
Took 155.94400000000002 seconds to convert 131072 words into a regex 713573 bytes long.
Took 224.93 seconds to convert 262144 words into a regex 1060079 bytes long.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
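
For anyone who wants to reproduce numbers like these, here is a minimal sketch of the kind of conversion being timed. It is not the actual long_regex_test.rb (which is not shown in this thread). It assumes the words are folded into a prefix trie and emitted as nested (?:...) groups, which is roughly the shape of the regex in the error message above. The method names build_trie, trie_to_source, words_to_regexp and time_conversion are made up for illustration, and the sketch targets a current Ruby rather than 1.8.5.

```ruby
require 'benchmark'

# Hypothetical helper: fold a word list into a nested-hash prefix trie.
# The :end key marks that a complete word stops at this node.
def build_trie(words)
  trie = {}
  words.each do |word|
    node = trie
    word.each_char { |ch| node = (node[ch] ||= {}) }
    node[:end] = true
  end
  trie
end

# Hypothetical helper: emit regex source for everything below a trie node.
def trie_to_source(node)
  branches = node.keys.reject { |key| key == :end }.sort.map do |ch|
    Regexp.escape(ch) + trie_to_source(node[ch])
  end
  return '' if branches.empty?

  if node[:end]
    # A word ends here, so whatever follows is optional.
    "(?:#{branches.join('|')})?"
  elsif branches.length == 1
    # A single continuation needs no grouping.
    branches.first
  else
    "(?:#{branches.join('|')})"
  end
end

# Build one anchored regex that matches exactly the given words.
def words_to_regexp(words)
  Regexp.new("\\A#{trie_to_source(build_trie(words))}\\z")
end

# Time one conversion; returns [seconds, regex source size in bytes].
def time_conversion(words)
  regexp = nil
  seconds = Benchmark.realtime { regexp = words_to_regexp(words) }
  [seconds, regexp.source.bytesize]
end
```

With the words cat, cats, car and dog, words_to_regexp produces a regex whose source is \A(?:ca(?:r|t(?:s)?)|dog)\z. To get a table like the ones above, call time_conversion in a loop on word lists of doubling size (drawn from whatever dictionary file you have handy) and rescue RegexpError around the step that fails on your interpreter.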