normalperson / yhbt.net wrote:
> I doubt I can noticeably improve performance with futexes vs mutex/condvar.

Here is a futex-based lock/condvar implementation, not at all
optimized for speed yet:

	git://bogomips.org/ruby.git (futex branch)
	http://bogomips.org/ruby.git/patch?id=ae93c50c8de

I am not sure my implementation is correct, but "make check" passes
with both 8 cores and 1 core active (8-core Vishera).  I will probably
write an independent (C-only) test to exercise more parallelism, and
maybe steal some tests from glibc (I also plan on using this futex-based
lock implementation outside of Ruby).
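
For readers who have not seen one, a futex-based lock plus a small
C-only parallel test could look roughly like the sketch below.  This is
NOT the code from the futex branch; it is the common three-state design
(0 = unlocked, 1 = locked, 2 = locked with possible waiters), and every
sketch_* name is made up for illustration.  It is Linux-only and
assumes a C11 compiler with <stdatomic.h>.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock type: 0 = unlocked, 1 = locked, 2 = locked + waiters */
typedef struct { _Atomic uint32_t state; } sketch_lock_t;

static_assert(sizeof(sketch_lock_t) == 4, "lock is one 32-bit futex word");

/* glibc has no futex() wrapper, so go through syscall(2) */
static long sketch_futex(_Atomic uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void sketch_lock(sketch_lock_t *l)
{
    uint32_t c = 0;

    /* fast path: uncontended 0 -> 1, no syscall */
    if (atomic_compare_exchange_strong(&l->state, &c, 1))
        return;

    /* slow path: mark the lock contended, then sleep until it is free */
    if (c != 2)
        c = atomic_exchange(&l->state, 2);
    while (c != 0) {
        /* returns immediately if state is no longer 2 */
        sketch_futex(&l->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&l->state, 2);
    }
}

static void sketch_unlock(sketch_lock_t *l)
{
    /* only enter the kernel if somebody may be sleeping */
    if (atomic_exchange(&l->state, 0) == 2)
        sketch_futex(&l->state, FUTEX_WAKE_PRIVATE, 1);
}

struct sketch_arg { sketch_lock_t *lock; long *counter; int iters; };

static void *sketch_worker(void *p)
{
    struct sketch_arg *a = p;

    for (int i = 0; i < a->iters; i++) {
        sketch_lock(a->lock);
        (*a->counter)++;        /* protected by the lock */
        sketch_unlock(a->lock);
    }
    return NULL;
}

/* Returns the final counter value, or -1 if a thread could not start. */
static long sketch_stress(int nthreads, int iters)
{
    sketch_lock_t lock = { 0 };
    long counter = 0;
    struct sketch_arg arg = { &lock, &counter, iters };
    pthread_t th[64];
    int started = 0, failed = 0;

    if (nthreads > 64)
        nthreads = 64;
    for (; started < nthreads; started++) {
        if (pthread_create(&th[started], NULL, sketch_worker, &arg)) {
            failed = 1;
            break;
        }
    }
    for (int i = 0; i < started; i++)
        pthread_join(th[i], NULL);
    return failed ? -1 : counter;
}
```

If the lock is sound, sketch_stress(n, iters) returns n * iters (for
n <= 64); a lost update shows up as a smaller number.  A passing run
does not prove correctness, of course; it only fails to find a bug.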

Benchmarks don't seem to show much (if any) improvement, yet.  A speed
improvement may be possible by reimplementing the GVL directly on top of
the bare futex interface (w/o a separate condvar/mutex layer).

On amd64 GNU/Linux, pthread_mutex_t is 40 bytes, but these futex-based
locks only need 4 bytes.  Similarly, pthread_cond_t is 48 bytes, making
rb_nativethread_cond_t 56 bytes with pthreads; this futex implementation
currently requires only 16 bytes for a condvar.
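
The pthreads side of these numbers is easy to check on your own
platform with a few sizeof calls.  In the sketch below,
sketch_futex_lock_t is a made-up stand-in for a lock that is just one
32-bit futex word; it is not the type from the futex branch, and the
16-byte condvar is not modeled here since its layout is not shown in
this message.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in: a lock that is nothing but one futex word */
typedef struct { uint32_t state; } sketch_futex_lock_t;

static_assert(sizeof(sketch_futex_lock_t) == 4, "futex word is 32 bits");

/* Print the sizes being compared; pthread sizes vary by platform/libc. */
static void sketch_print_sizes(void)
{
    printf("pthread_mutex_t:     %zu bytes\n", sizeof(pthread_mutex_t));
    printf("pthread_cond_t:      %zu bytes\n", sizeof(pthread_cond_t));
    printf("sketch_futex_lock_t: %zu bytes\n", sizeof(sketch_futex_lock_t));
}
```

Calling sketch_print_sizes() on amd64 GNU/Linux should report the 40
and 48 byte figures quoted above for the pthread types; other
architectures and libcs may differ.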

The size improvement may be noticeable for some apps with many Mutexes:
with the lock/cond reductions, rb_mutex_struct is now 48 bytes instead
of 128 bytes.