On Wed, Oct 19, 2011 at 2:11 AM, Josh Cheek <josh.cheek / gmail.com> wrote:
> A great followup to this post, explains why the GIL exists
> http://merbist.com/2011/10/18/data-safety-and-gil-removal/
>
> When I ran the code Matt provides under MRI 1.9.3 (has GIL) and Rubinius,
> JRuby, MacRuby (native threads, no GIL):

Ok, I can't let this one sit.

To my eyes, the only one broken there is MRI. It's not actually doing
anything in parallel, so you get the synchronous result. Perhaps I
should file a bug against MRI that its threads...aren't?

In all seriousness, though, this is flawed reasoning. Spinning up
threads is asking the runtime to do something in parallel, and MRI is
the only example here not delivering. The result you get under JRuby,
Rubinius, and MacRuby is exactly what you asked for, since you don't
synchronize any access to the shared array, and the shared array does
not (according to Matz himself) have thread safety as part of its
contract. The only reason you get the other result under MRI is
because it isn't actually doing what you've asked of it.
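For comparison, here is a minimal sketch of the same kind of test with every append to the shared array guarded by a plain Mutex. It should produce the full count on any implementation, GIL or not. The method name and the thread/element counts are illustrative assumptions, not Matt's original code.

```ruby
# Sketch: several threads append to one shared Array, and each append
# happens while holding a single Mutex, so no writes can be lost.
def synchronized_fill(thread_count = 4, per_thread = 100_000)
  array = []
  lock  = Mutex.new

  threads = Array.new(thread_count) do
    Thread.new do
      (1..per_thread).each { |n| lock.synchronize { array << n } }
    end
  end
  threads.each(&:join)

  array
end

puts synchronized_fill.size # => 400000
```

The cost is one lock acquisition per append, which is the price of keeping the array consistent when threads really do run in parallel.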

Saying that the GIL is useful based on this example is a bit like
saying "JRuby not supporting C extensions is useful because they'll
never crash due to C extensions." You can't compare lack of
parallelism with parallelism when you're trying to demonstrate
parallelism.

> * Other times I ran it under JRuby, it detected the corrupt data with
> 'ConcurrencyError: Detected invalid array contents due to unsynchronized
> modifications with concurrent users'

We do our best to detect this for Array, and at some point we'll try
to do it for Hash (Hash will currently raise errors from Java like
ArrayIndexOutOfBoundsException...still rescuable, but not as nice). It
would be cool if Ruby incorporated some explicitly thread-safe
collections by default, but there are gems that provide such things
right now.

FWIW, it's almost impossible to have thread-safe data structures that
perform as well as non-thread-safe data structures, which is why we've
always opted to keep Array and Hash the way they are. Hopefully people
are starting to learn that the alternatives aren't that bad, such as
using external thread-safe libs or simply mutexing around all accesses.

> * I ran this a whole bunch of times, sometimes MRI was fastest, sometimes
> MacRuby, sometimes JRuby (MRI was fastest most consistently, though)

For a run that short, I'm not surprised. JRuby would be faster if it
ran for more than...what...0.07 seconds? I ran a longer version
without threads (so it wouldn't error out) and JRuby was clearly the
fastest. I also wrote a version that uses a JRuby-specific module for
thread-safety, and it only slowed down by about 2x...but it completes
successfully every time:

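# JRuby-only: including JRuby::Synchronized synchronizes method calls
# on each instance against that object's own lock.
require 'jruby/synchronized'

# Show which implementation is running (portable, unlike the
# RVM-specific ENV['RUBY_VERSION']).
puts '', RUBY_DESCRIPTION

class SafeArray < Array
  include JRuby::Synchronized
end

10.times do
  @array, threads = SafeArray.new, []
  start = Time.now

  # Four threads each append 100,000 integers to the shared array.
  4.times do
    threads << Thread.new { (1..100_000).each { |n| @array << n } }
  end
  threads.each(&:join)
  stop = Time.now

  # Elapsed time, then the final size (400000 on every run).
  puts "%0.3f seconds" % (stop - start), @array.size
end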
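The longer single-threaded run mentioned above isn't shown, but a sketch of that kind of measurement might look like the following. The round count and element count are assumptions chosen for illustration, not the ones behind my results; running several rounds lets JRuby's JIT warm up before the later timings.

```ruby
# Sketch: a longer, single-threaded variant of the test above, timed over
# several rounds so JIT warm-up is amortized. Counts are illustrative.
def fill_once(count)
  array = []
  (1..count).each { |n| array << n }
  array
end

# Returns the elapsed seconds for each round.
def time_rounds(rounds, count)
  Array.new(rounds) do
    start = Time.now
    fill_once(count)
    Time.now - start
  end
end

puts '', RUBY_DESCRIPTION
time_rounds(10, 1_000_000).each { |secs| puts "%0.3f seconds" % secs }
```

Comparing only the last few rounds gives a fairer picture of steady-state speed than a single 0.07-second run.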

> Thoughts:
> * MRI has a GIL, thus keeping the data safe, and still performs equivalently
> with other implementations (for this admittedly limited test), so do
> benchmarks to decide if this will be worthwhile. It's not a fluke that Matz
> wants to keep the GIL.

False safety (you can still easily have threads step on each other) at
the expense of parallelism. I'm not sure that's a win.
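To illustrate why the GIL's safety is false safety: it only keeps the interpreter's own internals consistent, not your multi-step operations. In this sketch, `Thread.pass` stands in for any point where the scheduler may switch threads between a read and a write; the method names and counts are hypothetical.

```ruby
# Sketch: a read-modify-write race that a GIL does not prevent.
def unsafe_count(thread_count = 4, iterations = 1_000)
  counter = 0
  threads = Array.new(thread_count) do
    Thread.new do
      iterations.times do
        value = counter       # read
        Thread.pass           # another thread may run here, GIL or not
        counter = value + 1   # write back a possibly stale value
      end
    end
  end
  threads.each(&:join)
  counter
end

# Same loop, but the read and write happen together under a Mutex.
def safe_count(thread_count = 4, iterations = 1_000)
  counter = 0
  lock = Mutex.new
  threads = Array.new(thread_count) do
    Thread.new do
      iterations.times { lock.synchronize { counter += 1 } }
    end
  end
  threads.each(&:join)
  counter
end

puts unsafe_count # may be less than 4000; varies from run to run
puts safe_count   # => 4000
```

Even under MRI, the unsynchronized version can lose updates, which is exactly the kind of "threads stepping on each other" the GIL does nothing to stop.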

Also, I don't think Matz has ever said he really "wants" to keep the
GIL. It's just a massively difficult thing to retrofit MRI for
parallel threading without a very large rework. If they could drop the
GIL without destabilizing MRI itself, I'm sure they'd do it.

> * I'm glad JRuby notices the corrupt data (though not always) I'm a big fan
> of fail-fast

It only fails fast if it actually fails, of course. Some of your runs
manage to succeed without the threads stepping on each other. And by
"failure" here, I mean actually corrupting the array's internal state.
The array contents may end up wrong (lost or missing elements) because
you don't synchronize writes, but that's expected behavior in a
concurrent environment, not a failure. Or at least, it's not JRuby's
failure...it's yours.

> * Has JRuby fixed their startup time issue? I ran this a lot of times and
> didn't notice any of the lag I used to.

That's good to hear! Every release includes more startup-time tweaks.
Perhaps we're finally "getting there".

- Charlie