Bill Paulson §á§Ú§ã§Ñ§Ý 05.06.2012 22:52: > I work on a fairly large Ruby application that runs as a daemon, > accepting requests for test jobs and distributing them. It runs on > Ruby > 1.9.1-p429 currently on Linux Redhat 5. The job scheduler ("foreman") > occasionally locks up in one site or another with a stack trace for > the > main thread that looks like the following (many levels of stack > elided): 1.9.1 is something of a danger sign for me: it never worked good enough for production, at least for me. > > Thread 4 (process 5488): > #0 0x00d4d410 in __kernel_vsyscall () > #1 0x00d1a0b9 in __lll_lock_wait () from /lib/libpthread.so.0 > #2 0x00d15834 in _L_lock_92 () from /lib/libpthread.so.0 > #3 0x00d1534a in pthread_mutex_lock () from /lib/libpthread.so.0 > #4 0x0811d82e in native_mutex_lock (lock=0xfffffffc) at > thread_pthread.c:36 > #5 0x081226a6 in rb_thread_call_with_gvl (func=0x8062b40 > <gc_with_gvl>, > data1=0x81954e0) at thread.c:984 > #6 0x0806282e in garbage_collect_with_gvl (objspace=0x81954e0) at > gc.c:598 > #7 0x080628a6 in vm_xmalloc (objspace=0x81954e0, size=160796684) at > gc.c:624 > #8 0x080de09e in rb_str_buf_new (capa=128) at string.c:722 > #9 0x080d29bb in rb_enc_vsprintf (enc=0x0, fmt=0x81552ac "%s:%d:in > `%s'", ap=0xbf8f95e4 "X!?\t\"") at sprintf.c:1125 > #10 0x080d2a50 in rb_vsprintf (fmt=0x81552ac "%s:%d:in `%s'", > ap=0xbf8f95e4 "X!?\t\"") at sprintf.c:1154 > #11 0x080d2a7b in rb_sprintf (format=0x81552ac "%s:%d:in `%s'") at > sprintf.c:1164 > #12 0x0810ef01 in vm_backtrace_each (th=<value optimized out>, > limit_cfp=0xb7fbea98, cfp=0xb7fbed98, file=0x9a42158 > "/local/pete/current/com.rb", line_no=34, ary=174937756) at vm.c:714 > #13 0x0810fd32 in rb_f_caller (argc=0, argv=0xb7f3f1f8) at vm.c:746 > #14 0x0810dea6 in call_cfunc (func=0x810fca0 <rb_f_caller>, > recv=165378108, len=128, argc=-4, argv=0xb7f3f1f8) at > vm_insnhelper.c:286 > > The foreman's apparently trying to lock the Global Vm Lock to do a > malloc(). > Several questions: > 1) How to tell which thread owns the global_vm_lock? In gdb, printing > th->blocking_region_buffer for the main thread returns a pointer to > something, while another thread returns a zero value for > blocking_region_buffer. This suggests that the other thread owns the > global_vm_lock. On the other hand, if I dump the actual > global_vm_lock, > the __data.__owner is 5488, the thread id for the main thread. I'm > thinking that this thread is trying to lock the GVL when it already > has > it, but I'm not sure. > 2) in frame 8, the rb_str_buf_new should be requesting 129 bytes, yet > the size parameter in frame 7 is more than 160 Megabytes! This huge > allocation appears to be what triggers garbage collection with gvl. > How > this happens is a complete mystery. > 3) within gdb, is it possible to figure out what the content of the > backtrace would be if it didn't hang? I know that ruby's backtrace > stack > is somewhat complex - any pointers to simple-ish ways to find out > where > the ruby interpreter is? Take a look at ruby's gdbinit: http://rxr.whitequark.org/mri/source/.gdbinit For example, try rb_backtrace or rb_ps at gdb's prompt. > > Thanks, > Bill Paulson -- WBR, Peter Zotov.