On 11/30/06, Bob Hutchison <hutch / recursive.ca> wrote:
> A little more on this...
>
> On 30-Nov-06, at 10:36 AM, Bob Hutchison wrote:
>
> > Hi,
> >
> > I'm getting a 'Segmentation fault' in ruby 1.8.5 running on debian
> > in a Xen VPS. The same code running on OS X and a different version
> > of linux has no problems.
> >
> > The process to get this is maybe a little strange.
> >
> > 1) read a large file into a string (1.3MB)
> > 2) eval the string (the string is a single ruby proc definition
> > that when called will build an object structure in memory)
> > 3) call the proc --> Segmentation fault *very* soon after
> >
> > The file was generated by the same program, but running on a
> > different machine: the other Linux box I mentioned above.
> >
> > Knowing full well that there can be all kinds of differences
> > between the Linuxes, I'll claim that the only interesting
> > difference I can find is in the architecture reported by
> > ruby --version: the machine that works reports i686-linux, the
> > machine that doesn't reports i386-linux. So I rebuilt a version
> > that was also i686 and, of course, this made no difference. All
> > that means is that I can't find the truly interesting difference.
> >
> > If I edit the file the string is read from and remove the
> > assignments of one particular type of object (about 6000 of them;
> > the objects are still created), the problem disappears. There's
> > nothing special about the objects I picked; it was just easy to
> > use regular expressions to identify them and get rid of their
> > assignments.
> >
> > If I try running ruby through gdb there is a SIGSEGV signal at
> > eval.c:2890 -- which is the unknown_node method but I can't get a
> > more complete stacktrace (until I figure out how to build ruby with
> > the debug information not stripped out). Manually poking around
> > though, method_call calls rb_call0 calls unknown_node so I'm
> > betting on this. And so? Well maybe the eval of the string produced
> > an invalid proc object? What's the cause of this? Too long a
> > string? Too many objects in the eval? Too big a proc object? But
> > why does it work on one Linux box and fail on the other?
>
> So I put some printf calls into eval.c, and it turns out that
> rb_eval is called recursively 5301 times before segfaulting, while
> trying to handle a NODE_DASGN_CURR node. Once this begins, no other
> node types are evaluated; every node is a NODE_DASGN_CURR.
>
> Nothing in the script I am evaluating is nested anywhere near that
> deep. So it looks as though the proc object is corrupt??
>
> So maybe this is reproducible?? Well, so it is. If I run this script:
>
> module SomeModule
>    def initialize
>      @@proc = nil
>    end
>
>    def SomeModule.build
>      if @@proc then
>        result = @@proc.call
>        @@proc = nil
>        return result
>      end
>    end
> end
>
> N = 5000
>
> the_string = ""
>
> the_string << "module SomeModule\n"
> the_string << "  @@proc = Proc.new {\n"
> the_string << "    thing = []\n"
>
> N.times do | i |
>    the_string <<  "    v#{i} = [#{i}]\n"
> end
>
> N.times do | i |
>    the_string <<  "    thing << v#{i}\n"
> end
>
> the_string << "    thing\n"
> the_string << "  } #proc\n"
> the_string << "end\n"
>
> puts("the_string length: #{the_string.length}")
> eval(the_string, nil, "ruby_definition", 0)
> SomeModule.build
>
>
> It will fail on the one linux box, run on the other, and run on OS X.
> With a little binary search, the smallest N that causes the segfault
> is 3024 (3023 works).
>
> Does this help?
>
>

Segfaults for me on my Debian box with ruby 1.8.4 (2005-12-24) [i386-linux]
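
For anyone who wants to repeat the binary search for the smallest
failing N on their own box, here is a rough sketch of how it could be
automated. It is not the script Bob used. The helper names
(build_source, crashes?, smallest_failing) and the RUBY_UNDER_TEST
environment variable are made up for illustration. The driver assumes
a reasonably recent ruby to run the search itself, and treats any
unsuccessful exit of the child interpreter as a failure, so it cannot
tell a segfault from an ordinary exception. On a current interpreter
the generated script will most likely just run, in which case the
search reports that nothing failed.

```ruby
require "rbconfig"
require "tempfile"

# Build the same kind of source text as the script above, with n variables,
# and call the proc at the end so the child process exercises it.
def build_source(n)
  src = +"module SomeModule\n  @@proc = Proc.new {\n    thing = []\n"
  n.times { |i| src << "    v#{i} = [#{i}]\n" }
  n.times { |i| src << "    thing << v#{i}\n" }
  src << "    thing\n  } #proc\n"
  src << "  def SomeModule.build\n    @@proc.call\n  end\nend\n"
  src << "SomeModule.build\n"
  src
end

# Run the generated script in a separate interpreter so a crash cannot take
# the driver down with it. RUBY_UNDER_TEST is a hypothetical variable used
# here to point at the interpreter being investigated.
def crashes?(n, ruby = ENV["RUBY_UNDER_TEST"] || RbConfig.ruby)
  Tempfile.create(["repro", ".rb"]) do |f|
    f.write(build_source(n))
    f.flush
    # system returns true only when the child exits with status 0.
    system(ruby, f.path, out: File::NULL, err: File::NULL) != true
  end
end

# Smallest n in lo..hi for which the block is true, assuming the block is
# false below some threshold and true from there on. Returns nil when even
# hi does not fail.
def smallest_failing(lo, hi)
  return nil unless yield(hi)
  while lo < hi
    mid = (lo + hi) / 2
    if yield(mid)
      hi = mid
    else
      lo = mid + 1
    end
  end
  lo
end

# Only search when an upper bound is given on the command line.
if __FILE__ == $0 && ARGV[0]
  n = smallest_failing(1, ARGV[0].to_i) { |k| crashes?(k) }
  puts(n ? "smallest failing N: #{n}" : "no failure up to #{ARGV[0]}")
end
```

Saved under any file name and run with an upper bound such as 5000 as
its only argument, this needs about a dozen child runs to narrow the
range down.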