Charlie Savage <cfis / savagexi.com> wrote:
> Sorry, these machines are actually CentOS 5.6. The latest patches were
> applied via yum update about a week ago, so it's pretty up-to-date.

OK, I'm closer with 2.6.18-238.9.1.el5xen but still can't reproduce it.
I don't have permission to upgrade kernels on CentOS images,
unfortunately. It's the weekend so the folks that do have permission
aren't around...

> So what we see is this test hanging:
>
>   def test_datagrams
>     $in = $out = ""
>     EM.run {
>       EM.open_datagram_socket "127.0.0.1", @port, TestDatagramServer
>       EM.open_datagram_socket "127.0.0.1", 0, TestDatagramClient, @port
>     }
>     assert_equal( "1234567890", $in )
>     assert_equal( "abcdefghij", $out )
>   end
>
> It hangs on the first EM.open_datagram_socket call.

Can you show us "strace -f -v" output from that test? Maybe sprinkle
some `fprintf(stderr, "%s:%d\n", __FILE__, __LINE__);' or similar inside
EventMachine_t::OpenDatagramSocket and see where it gets to? It
shouldn't hit gethostbyname()...

> Here is another one, this time from test_pure_ruby.rb (which in fact
> seems misnamed, it is using the C code):
>
>   def test_connrefused
>     assert_nothing_raised do
>       EM.run {
>         setup_timeout(2)
>         EM.connect "127.0.0.1", @port, TestConnrefused
>       }
>     end
>   end
>
> In this one, it's the EM connect call that hangs.

I can't reproduce this, either... Also, can you extract these tests and
run them with a hand-picked port?

> Let me know if there is anything we can do to help debug this. It
> happens across 8 servers (all of which are at the same CentOS release,
> albeit they did start as the same VM image a while back).

I assume you tried a clean build/install of Ruby to make sure all
objects got rebuilt and reinstalled? Can you also try running
`pmap $PID' on the hung processes to make sure it's loading the correct
libs + versions?
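
To help separate a kernel/loopback problem from an EventMachine one, a plain-socket check along these lines might also be worth running on one of the affected boxes. This is only a rough sketch under my own assumptions: `udp_loopback_roundtrip` is a made-up helper name, it uses Ruby's stdlib sockets and does not touch EventMachine at all, and the default payload is just borrowed from the test quoted above.

```ruby
require "socket"
require "timeout"

# Hypothetical helper (not part of EventMachine): sends one datagram to
# 127.0.0.1 and reads it back using only Ruby's stdlib sockets.
#   port    - hand-picked UDP port to bind; 0 lets the kernel choose one
#   payload - data to send (default borrowed from test_datagrams)
#   wait    - seconds to wait before giving up instead of hanging
def udp_loopback_roundtrip(port, payload = "1234567890", wait = 2)
  server = UDPSocket.new
  client = UDPSocket.new
  server.bind("127.0.0.1", port)
  bound_port = server.addr[1] # actual port, which matters when port == 0
  client.send(payload, 0, "127.0.0.1", bound_port)
  # Raises Timeout::Error if nothing arrives within `wait` seconds.
  Timeout.timeout(wait) { server.recvfrom(payload.bytesize).first }
ensure
  server.close if server
  client.close if client
end
```

Calling it with a port you pick yourself, e.g. `puts udp_loopback_roundtrip(12345)` (12345 is an arbitrary choice), should print the payload back. If this also hangs or times out, the problem is probably below EventMachine; if it works, that points back at the extension or the libs it loads.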