Issue #5306 has been updated by Charlie Savage.


Hi Eric,

>> It causes eventmachine to hang on CentOS 5.5. 

Sorry, these machines are actually CentOS 5.6.  The latest patches were applied via yum update about a week ago, so they are pretty up to date.

> I have CentOS 5.4, x86_64, kernel 2.6.18-164.11.1.el5
> 
>    rake compile
>    ruby -I .:lib:tests/ tests/test_epoll.rb
> 
> Works for me on an unpacked eventmachine-1.0.0.beta.3 tree with
> ruby_1_9_3 branch.  However, only 2 tests appeared enabled.

So what we see is this test hanging:

def test_datagrams
  $in = $out = ""
  EM.run {
    EM.open_datagram_socket "127.0.0.1", @port, TestDatagramServer
    EM.open_datagram_socket "127.0.0.1", 0, TestDatagramClient, @port
  }
  assert_equal( "1234567890", $in )
  assert_equal( "abcdefghij", $out )
end

It hangs on the first EM.open_datagram_socket call.
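
To help narrow down whether the problem lies in select on a loopback UDP socket or in the extension code path, here is a minimal stdlib-only sketch. It is an assumption-laden check, not part of eventmachine or its test suite: the helper name udp_select_roundtrip is hypothetical, and it goes through IO.select rather than calling rb_thread_select (which, as far as I know, is a C-API entry point used by extensions), so a pass here does not rule out the regression. If the interpreter blocks inside select itself, the timeout may not fire either.

```ruby
require "socket"
require "timeout"

# Hypothetical helper (not from eventmachine): sends one datagram over
# loopback and waits for it with IO.select. Returns the received payload,
# :not_ready if select timed out, or :hung if the outer timeout fired.
def udp_select_roundtrip(payload = "1234567890", limit = 5)
  server = UDPSocket.new
  server.bind("127.0.0.1", 0)   # port 0: let the kernel pick a free port
  port = server.addr[1]

  client = UDPSocket.new
  client.send(payload, 0, "127.0.0.1", port)

  Timeout.timeout(limit) do
    ready, = IO.select([server], nil, nil, limit)
    return :not_ready if ready.nil?
    data, = server.recvfrom(payload.bytesize)
    data
  end
rescue Timeout::Error
  :hung
ensure
  server.close if server
  client.close if client
end
```

Running `p udp_select_roundtrip` on both an affected and an unaffected machine would show whether plain select on a datagram socket behaves differently between them.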

Here is another one, this time from test_pure_ruby.rb (which in fact seems misnamed, since it is using the C code):

def test_connrefused
  assert_nothing_raised do
    EM.run {
      setup_timeout(2)
      EM.connect "127.0.0.1", @port, TestConnrefused
    }
  end
end

In this one, it's the EM.connect call that hangs.
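
A similar stdlib-only sketch for the refused-connection case, again only as a rough cross-check and not a substitute for the eventmachine test. The helper name probe_refused_connect is hypothetical; it assumes that a port which was just bound and then closed has no listener, and it uses IO.select rather than the rb_thread_select C API.

```ruby
require "socket"

# Hypothetical helper (not from eventmachine): starts a non-blocking TCP
# connect to a loopback port with no listener and waits for the outcome
# with IO.select. Returns :refused, :connected, :timed_out, or :error_<errno>.
def probe_refused_connect(limit = 2)
  # Find a port that is very likely closed: bind to port 0, note it, close it.
  probe = TCPServer.new("127.0.0.1", 0)
  port = probe.addr[1]
  probe.close

  sock = Socket.new(:INET, :STREAM)
  addr = Socket.sockaddr_in(port, "127.0.0.1")
  begin
    sock.connect_nonblock(addr)
    return :connected
  rescue Errno::ECONNREFUSED
    return :refused               # the kernel refused immediately
  rescue IO::WaitWritable
    # Connect is in progress; fall through and wait for the result.
  end

  return :timed_out unless IO.select(nil, [sock], nil, limit)

  # The socket became writable; SO_ERROR holds the result of the connect.
  errno = sock.getsockopt(Socket::SOL_SOCKET, Socket::SO_ERROR).int
  case errno
  when 0 then :connected
  when Errno::ECONNREFUSED::Errno then :refused
  else :"error_#{errno}"
  end
ensure
  sock.close if sock
end
```

A result of :timed_out on the affected machines, where the other machines report :refused, would suggest the problem is not specific to eventmachine.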

> I'll try to find a machine closer to the above.

Probably a yum update will get you there...

Let me know if there is anything we can do to help debug this.  It happens across 8 servers (all of which are at the same CentOS release, although they did all start from the same VM image a while back).

Charlie
----------------------------------------
Bug #5306: Application Hangs Due to Recent rb_thread_select Changes
http://redmine.ruby-lang.org/issues/5306

Author: Charlie Savage
Status: Open
Priority: High
Assignee: 
Category: core
Target version: 1.9.3
ruby -v: ruby 1.9.3dev (2011-09-09 revision 33236) [x86_64-linux]


This commit:

4e9438bc9153f7a1f4ea0af85c8dbe359e1a55d8

Changed the implementation of rb_thread_select.  

It causes eventmachine to hang on CentOS 5.5.  Not sure what the issue is, but it's easily reproduced by running the test eventmachine/tests/test_epoll.rb.

We noticed this because it also causes the tweetstream gem to hang.

The same setup works on Fedora 14 and an up-to-date Arch Linux.  Specific version information is included below.

We temporarily fixed this by reverting the commit.

Since CentOS is a common production environment (and the one we are using), this seems to us a blocker for 1.9.3.

We are happy to provide any additional information or test fixes.  

Thanks - Charlie

--------------
We are running this version of CentOS:

Linux app1.zerista.com 2.6.18-238.19.1.el5.centos.plus #1 SMP Mon Jul 18 10:05:09 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

And this version of Fedora:

Linux ammonite.internal.zerista.com 2.6.35.14-95.fc14.x86_64 #1 SMP Tue Aug 16 21:01:58 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

And this version of eventmachine:

eventmachine (1.0.0.beta.3)

And this version of tweetstream:

tweetstream (1.0.4)


-- 
http://redmine.ruby-lang.org