Issue #5306 has been updated by Charlie Savage.

File strace_hangs.log added
File strace_completes.log added
File strace_pure.log added
File pmap.log added

OK, the first test gave strange results. Running this command:

strace -f -v ruby -I.:lib:tests tests/test_epoll.rb -n test_datagrams

Hangs the test as expected.  But running this command:

strace -f -v ruby -I.:lib:tests tests/test_epoll.rb -n test_datagrams &> /tmp/strace1.log

Causes the test to run to completion. Then, annoyingly enough, that one particular test works from that point on. If I reboot the machine, the test hangs again.

I have attached 2 logs, strace_completes.log and strace_hangs.log. strace_hangs.log contains only the last few hundred lines (the rest scrolled off the top), but what I saw matches strace_completes.log up to line 2,271. After that, the two diverge.
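When both logs are complete (strace_hangs.log here is truncated, so its line numbers would not line up), a small Ruby sketch could locate the first differing line automatically. It is hypothetical: `normalize` and `first_divergence` are illustrative names, and masking PIDs and hex addresses (which change from run to run) is an assumption about which noise to ignore. Other run-specific values, such as timestamps or file descriptor numbers, may still produce spurious differences.

```ruby
# Hypothetical helper: mask values that differ between otherwise identical
# strace runs. PIDs and hex addresses (e.g. affected by ASLR) are replaced
# with fixed placeholders; everything else is compared verbatim.
def normalize(line)
  line.gsub(/\[pid\s+\d+\]/, "[pid N]").gsub(/0x[0-9a-f]+/i, "0xADDR")
end

# Returns the 1-based number of the first line where the two logs differ
# after normalization, or nil if they match completely.
def first_divergence(lines_a, lines_b)
  max = [lines_a.size, lines_b.size].max
  (0...max).each do |i|
    a = lines_a[i]
    b = lines_b[i]
    # One log ended before the other, so they diverge here.
    return i + 1 if a.nil? || b.nil?
    return i + 1 if normalize(a) != normalize(b)
  end
  nil
end

if __FILE__ == $0 && ARGV.size == 2
  a, b = ARGV.map { |path| File.readlines(path) }
  line = first_divergence(a, b)
  puts line ? "logs diverge at line #{line}" : "logs match"
end
```

Usage would be `ruby first_divergence.rb strace_completes.log other_run.log` (the script name is hypothetical).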

The story is different for the second test:

strace -v -v ruby -I.:lib:tests tests/test_pure.rb -n test_connrefused 2>&1 | tee /tmp/strace_pure.log

That log is attached.

As for your other questions:

> Also, can you extract these tests and run with a hand-picked port? 

Sure. The connection refused test intentionally picks the first unused port, which turns out to be 9001.
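For extracting the connection refused test with a hand-picked port, a minimal sketch using only Ruby's standard socket library might look like this. `first_unused_port` and `connection_refused?` are hypothetical helpers, not the eventmachine test's actual code, and the starting port of 9000 is an assumption. With plain sockets, connecting to a port with no listener should fail immediately with `Errno::ECONNREFUSED`.

```ruby
require "socket"

# Hypothetical helper: return the first TCP port >= base on host that can
# be bound right now. The port is released before returning, so another
# process could grab it in the meantime (an unavoidable race).
def first_unused_port(base = 9000, host = "127.0.0.1")
  (base..65_535).each do |port|
    begin
      TCPServer.new(host, port).close
      return port
    rescue Errno::EADDRINUSE, Errno::EACCES
      next # in use, or privileged port without root
    end
  end
  raise "no free port at or above #{base}"
end

# True if connecting raises ECONNREFUSED, false if something accepted.
def connection_refused?(port, host = "127.0.0.1")
  TCPSocket.new(host, port).close
  false
rescue Errno::ECONNREFUSED
  true
end

if __FILE__ == $0
  # Pass a port (e.g. 9001) to test it directly instead of searching.
  port = ARGV[0] ? Integer(ARGV[0]) : first_unused_port
  puts "port #{port}: #{connection_refused?(port) ? 'refused' : 'accepted'}"
end
```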

> I assume you tried a clean build/install of Ruby to make sure all objects got rebuilt and reinstalled?

Yes.

$ cd /usr/src/ruby
$ git pull    (on the Ruby 1.9.3 branch)
$ git clean -fx
$ autoconf
$ ./configure --prefix=/usr --enable-shared=true
$ make
$ make install

> Can you also try running `pmap $PID` on the hung processes to make sure it's loading the correct libs + versions?

$ ps -ef | grep ruby
cfis     16185 15381  4 01:51 pts/1    00:00:00 ruby -I.:lib:tests 

$ pmap 16185
(see attached log)
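On Linux, `pmap` reports the mappings listed in `/proc/<pid>/maps`. To narrow a long pmap listing down to just the shared libraries a hung process loaded (for example, to confirm which `libruby.so` it picked up after the `--enable-shared` install), a short sketch could filter that file directly. `mapped_shared_objects` is a hypothetical helper; it is Linux-only and avoids newer Ruby syntax so it should also run under 1.9.3.

```ruby
# Hypothetical helper (Linux only): unique, sorted paths of shared objects
# mapped into the given process, read from /proc/<pid>/maps.
def mapped_shared_objects(pid)
  paths = File.readlines("/proc/#{pid}/maps").map do |line|
    # Fields: address perms offset dev inode [pathname]
    path = line.split(nil, 6)[5]
    path && path.strip
  end
  # Keep only shared objects, e.g. libruby.so.1.9.1 or libc-2.5.so.
  paths.compact.select { |path| path =~ /\.so\b/ }.uniq.sort
end

if __FILE__ == $0
  pid = ARGV[0] ? Integer(ARGV[0]) : Process.pid
  puts mapped_shared_objects(pid)
end
```

Run against the hung process, this would be `ruby mapped_libs.rb 16185` (the script name is hypothetical).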

Hope this info helps.
----------------------------------------
Bug #5306: Application Hangs Due to Recent rb_thread_select Changes
http://redmine.ruby-lang.org/issues/5306

Author: Charlie Savage
Status: Open
Priority: High
Assignee: 
Category: core
Target version: 1.9.3
ruby -v: ruby 1.9.3dev (2011-09-09 revision 33236) [x86_64-linux]


This commit:

4e9438bc9153f7a1f4ea0af85c8dbe359e1a55d8

Changed the implementation of rb_thread_select.  

It causes eventmachine to hang on CentOS 5.5. Not sure what the issue is, but it's easily reproduced by running the test eventmachine/tests/test_epoll.rb.
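To help separate an eventmachine problem from a general socket problem, a minimal standalone check could send one UDP datagram over loopback and wait for it with `IO.select` and a timeout. This is a sketch under stated assumptions: `udp_round_trip` is a hypothetical helper, it does not use eventmachine at all, and Ruby's `IO.select` is not necessarily implemented via `rb_thread_select`, so a pass here only narrows the search rather than ruling out the commit.

```ruby
require "socket"

# Hypothetical sanity check, independent of eventmachine: send one datagram
# to ourselves over loopback and wait for it with a timeout. Returns the
# received payload, or nil if nothing arrived within the timeout.
def udp_round_trip(payload = "ping", timeout = 2.0)
  receiver = UDPSocket.new
  sender = UDPSocket.new
  receiver.bind("127.0.0.1", 0) # port 0: let the kernel choose a free port
  port = receiver.addr[1]
  sender.send(payload, 0, "127.0.0.1", port)
  ready = IO.select([receiver], nil, nil, timeout)
  return nil unless ready # timed out instead of hanging forever
  data, _addr = receiver.recvfrom(1024)
  data
ensure
  receiver.close if receiver
  sender.close if sender
end

if __FILE__ == $0
  result = udp_round_trip
  puts result ? "received #{result.inspect}" : "timed out"
end
```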

We noticed this because it also causes the tweetstream gem to hang.

The same setup works on Fedora 14 and an up-to-date Arch Linux. Specific version information is included below.

We temporarily fixed this by reverting the commit.

Since CentOS is a common production environment (and the one we are using), this seems to us to be a blocker for 1.9.3.

We are happy to provide any additional information or test fixes.  

Thanks - Charlie

--------------
We are running this version of CentOS:

Linux app1.zerista.com 2.6.18-238.19.1.el5.centos.plus #1 SMP Mon Jul 18 10:05:09 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

And this version of Fedora:

Linux ammonite.internal.zerista.com 2.6.35.14-95.fc14.x86_64 #1 SMP Tue Aug 16 21:01:58 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

And this version of eventmachine:

eventmachine (1.0.0.beta.3)

And this version of tweetstream:

tweetstream (1.0.4)


-- 
http://redmine.ruby-lang.org