Bug #2644: memory over-allocation with regexp
http://redmine.ruby-lang.org/issues/show/2644

Author: Greg Hazel
Status: Open, Priority: Normal
ruby -v: ruby 1.8.7 (2009-06-12 patchlevel 174) [x86_64-linux]

With a simple regular expression, Ruby allocates far too much memory and can overflow the stack.

Code:
p 1
s = "2" + (" " * 84149170)
p 2
s.match(/(\d) (.*)/)
p 3

Output:
1
2
hmm.rb:4:in `match': Stack overflow in regexp matcher: /(\d) (.*)/ (RegexpError)
        from hmm.rb:4

The stack overflow is not the worst of it: Ruby is actually trying to allocate very large amounts of memory. Here is the output from REE (Ruby Enterprise Edition), whose tcmalloc prints a message whenever malloc requests a very large block:

1
2
tcmalloc: large alloc 1090519040 bytes == 0x49867000 @
tcmalloc: large alloc 2181038080 bytes == 0x8aa67000 @
tcmalloc: large alloc 18446744072140881920 bytes == (nil) @
tcmalloc: large alloc 4362076160 bytes == (nil) @
hmm.rb:4:in `match': Stack overflow in regexp matcher: /(\d) (.*)/ (RegexpError)
        from hmm.rb:4
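Reading the sizes in that output: the first two requests are 1040 MiB and 2080 MiB, and the fourth (4160 MiB) doubles again. The third, 18446744072140881920, equals 2**64 - 1568669696, which is the bit pattern a negative size of -1568669696 bytes (-1496 MiB) would have after conversion to an unsigned 64-bit size_t. That suggests, but does not prove, that a size computation went negative somewhere. The short script below only redoes that arithmetic; `as_signed64` and `MIB` are names introduced here for illustration.

```ruby
MIB = 1024 * 1024

# Reinterpret an unsigned 64-bit size (as printed by tcmalloc) as a signed
# two's-complement value, so a number just below 2**64 shows up as negative.
def as_signed64(bytes)
  bytes >= 2**63 ? bytes - 2**64 : bytes
end

# Sizes copied from the REE/tcmalloc output above.
[1090519040, 2181038080, 18446744072140881920, 4362076160].each do |bytes|
  signed = as_signed64(bytes)
  puts "#{bytes} bytes = #{signed} as signed 64-bit = #{signed / MIB} MiB"
end
```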

Observing the process externally shows that this memory over-allocation also occurs with normal builds of 1.8.6, 1.8.7, and even 1.9.1.

(Before you say "this is just a problem with regexps in general!": I tested the same thing in Python and Perl, and both handle even larger strings without trouble.)
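Until this is fixed, one possible workaround is to keep the unbounded `.*` out of the regexp when the input may be huge. This sketch assumes the caller only needs the two captures of `/(\d) (.*)/`; it is not from the report, and it sidesteps the interpreter bug rather than fixing it. `digit_and_rest` is a hypothetical helper name. It uses a short pattern to find the leftmost digit followed by a space, then takes the rest of the line with plain string operations, stopping at a newline because `.` does not match `\n` without the `m` flag.

```ruby
# Hypothetical workaround (not part of the original report): returns the same
# two captures as str.match(/(\d) (.*)/), or nil when there is no match, but
# never asks the regexp engine to run `.*` over the long tail of the string.
def digit_and_rest(str)
  m = str.match(/(\d) /)      # short pattern with no unbounded repetition
  return nil unless m
  rest = str[m.end(0)..-1]    # everything after the "<digit><space>" pair
  nl = rest.index("\n")       # `.` stops at a newline, so cut the rest there
  rest = rest[0, nl] if nl
  [m[1], rest]
end

digit, rest = digit_and_rest("2" + (" " * 1000))
p [digit, rest.length]        # prints ["2", 999]
```

On the report's 84149171-character string this should return "2" and an 84149169-character rest, assuming the short `/(\d) /` search does not itself hit the same problem on 1.8.x, which the report does not establish.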

