Bug #2025: problem with pthread handling on non-NPTL platform
http://redmine.ruby-lang.org/issues/show/2025

Author: Petr Salinger
Status: Open, Priority: Normal
Target version: 1.9.x
ruby -v: 1.9.1.243

I tried to fix some test suite failures on GNU/kFreeBSD
(http://bugs.debian.org//cgi-bin/bugreport.cgi?bug=542927)
and observed some problems in the pthread-related code.
The hang in the first test in
http://redmine.ruby-lang.org/issues/show/1525
affects us too.

IMO, Ruby should try to work under any POSIX-conforming
pthread implementation, not only NPTL.
A code audit in this area seems needed.


There are some problems with the handling of fork()/exec().
Locks really should be reinitialized in the child, and the timer
thread should be started using pthread_once(); the current approach
is fragile and might start more than one timer thread.
http://www.opengroup.org/onlinepubs/9699919799/functions/pthread_once.html
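
A minimal sketch of starting the timer thread through pthread_once(). The names (start_timer_thread_once(), create_timer_thread(), timer_thread_starts, race_to_start_timer()) are hypothetical and this is not Ruby's thread_pthread.c; it only shows that however many threads race into the start function, the init routine runs once per process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical names; not Ruby's thread_pthread.c. */
static pthread_once_t timer_once = PTHREAD_ONCE_INIT;
static pthread_t timer_thread_id;
static int timer_thread_starts = 0; /* number of timer threads created */

static void *timer_thread_func(void *arg)
{
    struct timespec tick = {0, 10 * 1000 * 1000}; /* 10 ms */
    (void)arg;
    for (;;)
        nanosleep(&tick, NULL); /* a real timer would post an interrupt here */
}

/* Init routine: pthread_once() guarantees it runs at most once per process. */
static void create_timer_thread(void)
{
    pthread_attr_t attr;

    pthread_attr_init(&attr);
    /* Detached: this thread is never passed to pthread_join(). */
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (pthread_create(&timer_thread_id, &attr, timer_thread_func, NULL) != 0) {
        fprintf(stderr, "cannot create timer thread\n");
        abort();
    }
    pthread_attr_destroy(&attr);
    timer_thread_starts++;
    assert(timer_thread_starts == 1);
}

void start_timer_thread_once(void)
{
    if (pthread_once(&timer_once, create_timer_thread) != 0)
        abort();
}

static void *caller(void *arg)
{
    (void)arg;
    start_timer_thread_once();
    return NULL;
}

/* Demo: eight threads race to start the timer; returns how many were created. */
int race_to_start_timer(void)
{
    pthread_t t[8];

    for (int i = 0; i < 8; i++)
        if (pthread_create(&t[i], NULL, caller, NULL) != 0)
            return -1;
    for (int i = 0; i < 8; i++)
        pthread_join(t[i], NULL);
    return timer_thread_starts;
}
```

Note that pthread_once() alone does not cover fork(): the child inherits a once control that is already marked as done, but not the timer thread itself, so the child still needs its own restart path.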

In general, I do not understand how the following code in
thread_pthread.c could correctly survive fork():

    static pthread_t timer_thread_id;
    static pthread_cond_t timer_thread_cond = PTHREAD_COND_INITIALIZER;
    static pthread_mutex_t timer_thread_lock = PTHREAD_MUTEX_INITIALIZER;
    rb_thread_create_timer_thread()
    thread_timer()

See also
http://www.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html
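
One way to make such statics fork-safe is the pattern described in the pthread_atfork() rationale: take the lock in the prepare handler and release it in both the parent and the child handler, so the child never inherits a mutex held by a thread that no longer exists. The sketch below also clears the child's timer bookkeeping, since only the forking thread survives fork(). timer_thread_running, install_timer_atfork() and child_sees_clean_timer_state() are hypothetical; this is not Ruby's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for the statics in thread_pthread.c. */
static pthread_mutex_t timer_thread_lock = PTHREAD_MUTEX_INITIALIZER;
static int timer_thread_running = 0; /* protected by timer_thread_lock */

/* prepare: runs in the forking thread; waits until no other thread holds the lock */
static void timer_atfork_prepare(void)
{
    pthread_mutex_lock(&timer_thread_lock);
}

/* parent: the timer thread still exists, so only release the lock */
static void timer_atfork_parent(void)
{
    pthread_mutex_unlock(&timer_thread_lock);
}

/* child: the timer thread did not survive fork(); clear its bookkeeping,
   then release the lock that the prepare handler took */
static void timer_atfork_child(void)
{
    timer_thread_running = 0;
    pthread_mutex_unlock(&timer_thread_lock);
}

int install_timer_atfork(void)
{
    return pthread_atfork(timer_atfork_prepare, timer_atfork_parent,
                          timer_atfork_child);
}

/* Demo: returns 1 if the forked child could take the lock and saw the
   timer marked as not running, 0 if not, -1 on error. */
int child_sees_clean_timer_state(void)
{
    int status, running;
    pid_t pid;

    pthread_mutex_lock(&timer_thread_lock);
    timer_thread_running = 1; /* pretend the parent has a timer thread */
    pthread_mutex_unlock(&timer_thread_lock);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* trylock first, so the flag is read while holding the lock */
        int ok = pthread_mutex_trylock(&timer_thread_lock) == 0
                 && timer_thread_running == 0;
        _exit(ok ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    pthread_mutex_lock(&timer_thread_lock);
    running = timer_thread_running;
    pthread_mutex_unlock(&timer_thread_lock);
    assert(running == 1); /* the parent's state is untouched */

    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The handlers have to be registered once, before any fork(); after that, every fork() in the process runs them automatically.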



I doubt that the following code in rb_f_fork(VALUE obj)
in process.c is correct:

     switch (pid = rb_fork(0, 0, 0, Qnil)) {
       case 0:
 #ifdef linux
         after_exec();
 #endif
         rb_thread_atfork();
         if (rb_block_given_p()) {
             int status;

             rb_protect(rb_yield, Qundef, &status);
             ruby_stop(status);
         }

The conditional after_exec() call should not be here.
There is already an after_fork() call at line 2331 of process.c,
which is executed in both the parent and the child.
The exception is when chfunc is not NULL: then after_fork()
is not executed at all.

The bug is timing dependent, i.e. it is a race condition.
Sometimes the child process ends up with two timer threads,
sometimes with the expected one.

Two timer threads are merely more likely with linuxthreads than with NPTL;
it can happen under any pthread implementation.
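
A sketch of the idea that the child should restart its timer from one idempotent place: ensure_timer_thread() checks a running flag under a mutex, so even if both an after_exec()-style and an after_fork()-style path call it, the child ends up with a single timer thread. All names are hypothetical, and lock handling at fork() time is left out (in this demo no other thread touches timer_lock). Strictly speaking, POSIX only guarantees async-signal-safe calls in the child of a multi-threaded process until it calls exec; continuing to run and create threads in the child, as Ruby's fork does, relies on the implementation allowing more.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names; not Ruby's process.c or thread_pthread.c. */
static pthread_mutex_t timer_lock = PTHREAD_MUTEX_INITIALIZER;
static int timer_running = 0; /* protected by timer_lock */
static int timer_starts = 0;  /* protected by timer_lock */

static void *timer_func(void *arg)
{
    struct timespec tick = {0, 10 * 1000 * 1000}; /* 10 ms */
    (void)arg;
    for (;;)
        nanosleep(&tick, NULL);
}

/* Idempotent: a no-op while a timer thread is running, so two restart
   paths cannot produce two timer threads. */
static void ensure_timer_thread(void)
{
    pthread_t tid;

    pthread_mutex_lock(&timer_lock);
    if (!timer_running && pthread_create(&tid, NULL, timer_func, NULL) == 0) {
        pthread_detach(tid); /* never joined */
        timer_running = 1;
        timer_starts++;
    }
    assert(timer_starts <= 1); /* demo invariant: one timer per process */
    pthread_mutex_unlock(&timer_lock);
}

/* Returns how many timer threads the forked child started, or -1 on error. */
int timer_starts_in_child(void)
{
    int status;
    pid_t pid;

    ensure_timer_thread(); /* the parent's timer thread */
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Reset once: the timer thread did not survive fork(). */
        timer_running = 0;
        timer_starts = 0;
        ensure_timer_thread(); /* e.g. an after_exec()-style restart */
        ensure_timer_thread(); /* e.g. an after_fork()-style restart: no-op */
        _exit(timer_starts);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Without the running flag, the two restart calls in the child would create two timer threads, which is the symptom described above.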



Ruby should not create a thread with PTHREAD_CREATE_DETACHED and later call pthread_join() on it.
http://www.opengroup.org/onlinepubs/9699919799/functions/pthread_join.html:
"The behavior is undefined if the value specified by the thread argument
to pthread_join() does not refer to a joinable thread."
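
To illustrate the rule, a minimal sketch that requests a joinable thread explicitly and only then calls pthread_join(); worker() and run_joinable() are hypothetical helpers. A thread created with PTHREAD_CREATE_DETACHED (or passed to pthread_detach()) must never be joined; its completion would have to be reported some other way, e.g. through a condition variable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical worker: doubles its argument. */
static void *worker(void *arg)
{
    return (void *)((intptr_t)arg * 2);
}

/* Runs worker(input) in a thread that is explicitly joinable, so the
   pthread_join() below is well defined. Returns -1 on error. */
intptr_t run_joinable(intptr_t input)
{
    pthread_attr_t attr;
    pthread_t tid;
    void *result = NULL;
    int state, rc;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    /* Joinable is the default; being explicit documents the intent. */
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    pthread_attr_getdetachstate(&attr, &state);
    assert(state == PTHREAD_CREATE_JOINABLE);

    rc = pthread_create(&tid, &attr, worker, (void *)input);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;
    /* The thread must not be detached anywhere before this join. */
    if (pthread_join(tid, &result) != 0)
        return -1;
    return (intptr_t)result;
}
```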



Ruby should use pthread_sigmask() instead of sigprocmask()
when it is available, and so on.
http://www.opengroup.org/onlinepubs/9699919799/functions/pthread_sigmask.html:
"The use of the sigprocmask() function is unspecified in a
multi-threaded process."



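A minimal sketch of per-thread signal masking with pthread_sigmask(); the helper names are hypothetical. Unlike sigprocmask(), which returns -1 and sets errno on failure, pthread_sigmask() returns the error number directly, so call sites cannot simply be renamed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>

/* Blocks sig in the calling thread only; stores the previous mask in *old.
   Returns 0 or an error number (not -1/errno). */
int block_signal_in_this_thread(int sig, sigset_t *old)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, sig);
    return pthread_sigmask(SIG_BLOCK, &set, old);
}

/* Returns 1 if sig is blocked in the calling thread, 0 if not, -1 on error. */
int signal_blocked_in_this_thread(int sig)
{
    sigset_t cur;

    /* With a NULL set, the mask is only queried; the how argument is ignored. */
    if (pthread_sigmask(SIG_BLOCK, NULL, &cur) != 0)
        return -1;
    return sigismember(&cur, sig);
}

/* Restores a mask previously saved by block_signal_in_this_thread(). */
int restore_thread_mask(const sigset_t *old)
{
    return pthread_sigmask(SIG_SETMASK, old, NULL);
}
```

Threads created afterwards inherit the creating thread's mask, so blocking signals before creating a helper thread (such as a timer) keeps them away from it.
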
These changes would work correctly on both linuxthreads and NPTL, and
should work on any POSIX-conforming pthread implementation.
Ideally, Ruby would not require full conformance, but would also
tolerate some known deviations, such as our getpid() difference.


----------------------------------------
http://redmine.ruby-lang.org