> Bill Paulson ڧѧ 05.06.2012 22:52:
>> I work on a fairly large Ruby application that runs as a daemon,
>> accepting requests for test jobs and distributing them. It runs on Ruby
>> 1.9.1-p429 currently on Linux Redhat 5. The job scheduler ("foreman")
>> occasionally locks up in one site or another with a stack trace for the
>> main thread that looks like the following (many levels of stack elided):
> 
> 1.9.1 is something of a danger sign for me: it never worked good enough for
> production, at least for me.
I'll look into changing to a later release.

> 
>> 
>> Thread 4 (process 5488):
>> #0  0x00d4d410 in __kernel_vsyscall ()
>> #1  0x00d1a0b9 in __lll_lock_wait () from /lib/libpthread.so.0
>> #2  0x00d15834 in _L_lock_92 () from /lib/libpthread.so.0
>> #3  0x00d1534a in pthread_mutex_lock () from /lib/libpthread.so.0
>> #4  0x0811d82e in native_mutex_lock (lock=0xfffffffc) at
>> thread_pthread.c:36
>> #5  0x081226a6 in rb_thread_call_with_gvl (func=0x8062b40 <gc_with_gvl>,
>> data1=0x81954e0) at thread.c:984
>> #6  0x0806282e in garbage_collect_with_gvl (objspace=0x81954e0) at
>> gc.c:598
>> #7  0x080628a6 in vm_xmalloc (objspace=0x81954e0, size=160796684) at
>> gc.c:624
>> #8  0x080de09e in rb_str_buf_new (capa=128) at string.c:722
>> #9  0x080d29bb in rb_enc_vsprintf (enc=0x0, fmt=0x81552ac "%s:%d:in
>> `%s'", ap=0xbf8f95e4 "X!?\t\"") at sprintf.c:1125
>> #10 0x080d2a50 in rb_vsprintf (fmt=0x81552ac "%s:%d:in `%s'",
>> ap=0xbf8f95e4 "X!?\t\"") at sprintf.c:1154
>> #11 0x080d2a7b in rb_sprintf (format=0x81552ac "%s:%d:in `%s'") at
>> sprintf.c:1164
>> #12 0x0810ef01 in vm_backtrace_each (th=<value optimized out>,
>> limit_cfp=0xb7fbea98, cfp=0xb7fbed98, file=0x9a42158
>> "/local/pete/current/com.rb", line_no=34, ary=174937756) at vm.c:714
>> #13 0x0810fd32 in rb_f_caller (argc=0, argv=0xb7f3f1f8) at vm.c:746
>> #14 0x0810dea6 in call_cfunc (func=0x810fca0 <rb_f_caller>,
>> recv=165378108, len=128, argc=-4, argv=0xb7f3f1f8) at
>> vm_insnhelper.c:286
>> 
>> The foreman's apparently trying to lock the Global Vm Lock to do a
>> malloc().
>> Several questions:
>> 1) How to tell which thread owns the global_vm_lock? In gdb, printing
>> th->blocking_region_buffer for the main thread returns a pointer to
>> something, while another thread returns a zero value for
>> blocking_region_buffer. This suggests that the other thread owns the
>> global_vm_lock. On the other hand, if I dump the actual global_vm_lock,
>> the __data.__owner is 5488, the thread id for the main thread. I'm
>> thinking that this thread is trying to lock the GVL when it already has
>> it, but I'm not sure.
>> 2) in frame 8, the rb_str_buf_new should be requesting 129 bytes, yet
>> the size parameter in frame 7 is more than 160 Megabytes! This huge
>> allocation appears to be what triggers garbage collection with gvl. How
>> this happens is a complete mystery.
>> 3) within gdb, is it possible to figure out what the content of the
>> backtrace would be if it didn't hang? I know that ruby's backtrace stack
>> is somewhat complex - any pointers to simple-ish ways to find out where
>> the ruby interpreter is?
> 
> Take a look at ruby's gdbinit:
>  http://rxr.whitequark.org/mri/source/.gdbinit
> 
> For example, try rb_backtrace or rb_ps at gdb's prompt.
> 
The ruby we're running doesn't seem to have debug.c compiled into it. The .gdbinit file fails with a message 
No symbol "ruby_dummy_gdb_enums" in current context.
Do you know what compile flags cause debug.c to be included?
Thanks 
Bill Paulson