Bill Paulson ڧѧ 05.06.2012 22:52:
> I work on a fairly large Ruby application that runs as a daemon,
> accepting requests for test jobs and distributing them. It runs on 
> Ruby
> 1.9.1-p429 currently on Linux Redhat 5. The job scheduler ("foreman")
> occasionally locks up in one site or another with a stack trace for 
> the
> main thread that looks like the following (many levels of stack 
> elided):

1.9.1 is something of a danger sign for me: it never worked good enough 
for
production, at least for me.

>
> Thread 4 (process 5488):
> #0  0x00d4d410 in __kernel_vsyscall ()
> #1  0x00d1a0b9 in __lll_lock_wait () from /lib/libpthread.so.0
> #2  0x00d15834 in _L_lock_92 () from /lib/libpthread.so.0
> #3  0x00d1534a in pthread_mutex_lock () from /lib/libpthread.so.0
> #4  0x0811d82e in native_mutex_lock (lock=0xfffffffc) at
> thread_pthread.c:36
> #5  0x081226a6 in rb_thread_call_with_gvl (func=0x8062b40 
> <gc_with_gvl>,
> data1=0x81954e0) at thread.c:984
> #6  0x0806282e in garbage_collect_with_gvl (objspace=0x81954e0) at
> gc.c:598
> #7  0x080628a6 in vm_xmalloc (objspace=0x81954e0, size=160796684) at
> gc.c:624
> #8  0x080de09e in rb_str_buf_new (capa=128) at string.c:722
> #9  0x080d29bb in rb_enc_vsprintf (enc=0x0, fmt=0x81552ac "%s:%d:in
> `%s'", ap=0xbf8f95e4 "X!?\t\"") at sprintf.c:1125
> #10 0x080d2a50 in rb_vsprintf (fmt=0x81552ac "%s:%d:in `%s'",
> ap=0xbf8f95e4 "X!?\t\"") at sprintf.c:1154
> #11 0x080d2a7b in rb_sprintf (format=0x81552ac "%s:%d:in `%s'") at
> sprintf.c:1164
> #12 0x0810ef01 in vm_backtrace_each (th=<value optimized out>,
> limit_cfp=0xb7fbea98, cfp=0xb7fbed98, file=0x9a42158
> "/local/pete/current/com.rb", line_no=34, ary=174937756) at vm.c:714
> #13 0x0810fd32 in rb_f_caller (argc=0, argv=0xb7f3f1f8) at vm.c:746
> #14 0x0810dea6 in call_cfunc (func=0x810fca0 <rb_f_caller>,
> recv=165378108, len=128, argc=-4, argv=0xb7f3f1f8) at
> vm_insnhelper.c:286
>
> The foreman's apparently trying to lock the Global Vm Lock to do a
> malloc().
> Several questions:
> 1) How to tell which thread owns the global_vm_lock? In gdb, printing
> th->blocking_region_buffer for the main thread returns a pointer to
> something, while another thread returns a zero value for
> blocking_region_buffer. This suggests that the other thread owns the
> global_vm_lock. On the other hand, if I dump the actual 
> global_vm_lock,
> the __data.__owner is 5488, the thread id for the main thread. I'm
> thinking that this thread is trying to lock the GVL when it already 
> has
> it, but I'm not sure.
> 2) in frame 8, the rb_str_buf_new should be requesting 129 bytes, yet
> the size parameter in frame 7 is more than 160 Megabytes! This huge
> allocation appears to be what triggers garbage collection with gvl. 
> How
> this happens is a complete mystery.
> 3) within gdb, is it possible to figure out what the content of the
> backtrace would be if it didn't hang? I know that ruby's backtrace 
> stack
> is somewhat complex - any pointers to simple-ish ways to find out 
> where
> the ruby interpreter is?

Take a look at ruby's gdbinit:
   http://rxr.whitequark.org/mri/source/.gdbinit

For example, try rb_backtrace or rb_ps at gdb's prompt.

>
> Thanks,
> Bill Paulson

-- 
   WBR, Peter Zotov.