Bug #3840: Ruby 1.9.2p0 crashes using Rails 3.0 on repeated browser refresh
http://redmine.ruby-lang.org/issues/show/3840

Author: Sean O'Neil
Status: Open, Priority: Normal
Category: core
ruby -v: ruby 1.9.2p0 (2010-08-18 revision 29036) [i386-mswin32_90]

I'm not certain whether this is a Ruby issue or a Rails issue (or a bit of both). The problem is intermittent, and it appears to be a threading issue surrounding the reading of the Rails log file. It is fairly easy for me to reproduce the problem:

1) Install a clean 1.9.2p0 and run "gem install rails" to get Rails 3.0.
2) Use the Rails 3.0 generator script to create an empty project.
3) Launch the web app using default options.
4) Point a browser to the home page for the new Rails app, and keep pressing F5 (or Refresh) at varying speeds until Ruby crashes. It seems to happen most often if I hold F5 for a few seconds and then release for a few seconds, varying how long I hold and release it.

It crashes eventually no matter how I run it, but it seems to crash more often in production mode (which logs a bunch of errors using the default generated app). The crash appears to be dependent on timing, but I can usually make it crash in less than 60 seconds.

I'm currently using Windows 7 (x64) and compiling Ruby with Visual Studio 2008. However, the crash also occurs with the pre-packaged RubyInstaller for Windows. Given the debug information I've provided below, it does not appear to be platform-specific, but a different OS can have different IO and thread timing issues, so it may be harder to reproduce on other platforms.

I rebuilt Ruby with debugging options and got this call stack when it crashed:
rb_econv_convert() in transcode.c line 1449 (the ec parameter was set to NULL)
fill_cbuf() in io.c line 1691
read_all() in io.c line 1755
io_read() in io.c line 2163

I added some additional checks to the code to find out how fptr->readconv was getting cleared inside fill_cbuf(), and I traced it to io_fillbuf() and then rb_read_internal(). Now, rb_read_internal() does not have a pointer to fptr or any of its members. It could be a buffer overrun, but the buffer is much larger than the file, and the changes seem much too clean for that. In addition to readconv being cleared, other struct members are being reset to an earlier state (not necessarily back to 0), which seems to imply that another thread is accessing the same IO object while the current thread is inside rb_read_internal(). I can't "catch it in the act" because setting breakpoints affects the timing, and it typically requires hundreds of passes through this code to break. However, I can copy the struct fptr is pointing to and have it break as soon as those members change. The info being read right before the IO object is reset is:

rbuf    0x05700408 "
Started GET "/javascripts/dragdrop.js?1283353508" for 127.0.0.1 at 2010-09-08 16:22:14 -0400
ActionController::RoutingError (No route matches "/javascripts/dragdrop.js"):

Rendered d:/Projects/NetAuditor/v3.00/Code/Build/lib/ruby/gems/1.9.1/gems/actionpack-3.0.0/lib/action_dispatch/middleware/templates/rescues/routing_error.erb within rescues/layout (2.0ms)

"

I can eliminate this crash by changing the Ruby source to check for NULL before referencing the pointer. However, it may only hide the issue, which may come back to bite me later. So, does this sound like a problem that needs to be fixed in the Ruby code, or a thread-safety issue in Rails?

Thank you,
Sean O'Neil


----------------------------------------
http://redmine.ruby-lang.org