On Thu, May 31, 2007 at 05:31:53PM +0900, Erwin Abbott wrote:
> >Well, that's the general locking problem in a nutshell, and many books have
> >been written about this :-)
> 
> I'm very interested in a book on programming with threads, any
> recommendations?  I'll probably browse the bookstore this weekend. I'm
> afraid everything will be focused on specifics like language/OS rather
> than general concepts. It's also hard to know what's good from reading
> reviews on Amazon.

I have a good book on principles of operating systems which talks a lot
about synchronisation; unfortunately it's in a different location right now
and I can't remember the exact title. It's pretty old.

> The reason for using synchronization in #search is because we might
> start in the middle of a #write call from another thread, right? I had
> something in mind like an SQL database which would search using the
> "not-yet-written" version of the data... but now I think I see that's
> quite complicated.

No, that's quite doable as well. What you need to do is to protect the part
where you replace @data with new_data, to ensure there are no readers of
@data while you're changing it.

To do this you need a shared lock for the readers, and for the writer to
gain an exclusive lock at the point where it wants to swap the data. While
any shared locks are held, the exclusive lock will not be granted; and while
an exclusive lock is held no shared or exclusive locks will be granted.

Have a look at sync.rb in the standard Ruby distribution, although I've not
used it in anger myself. Something like this might do the trick:

require 'sync'

class DatabaseServer

  def initialize
    @sync = Sync.new
    @data = {}
  end

  # Readers take the shared lock: any number of threads may hold it at
  # once. The sleep just simulates a slow read so the locking is visible.
  def search(k)
    @sync.synchronize(Sync::SH) {
      puts "Reading..."
      sleep 1
      puts "Read done"
      @data[k]
    }
  end

  # The writer takes the exclusive lock: it waits until no shared locks
  # are held, and blocks new readers until the update is finished.
  def update(k,v)
    @sync.synchronize(Sync::EX) {
      puts "Updating..."
      @data[k] = v
      sleep 2
      puts "Update done"
    }
  end
end

d = DatabaseServer.new

# t1 tries to update half a second in, while t2's first read is still
# holding the shared lock, so the update has to wait for it.
t1 = Thread.new {
  sleep 0.5
  d.update("foo","bar")
}
t2 = Thread.new {
  puts "t2(a): #{d.search("foo")}"
  puts "t2(b): #{d.search("foo")}"
}
t1.join
t2.join

Note that this is an artificial example, since reading an element from a
hash or writing an element to a hash are 'atomic' operations as far as the
Ruby thread scheduler is concerned. That is, in this simplistic case,
everything will work fine without explicit synchronisation. But as soon as
your searches or updates become multi-step operations (e.g. a search
involves a lookup in index I followed by a read of record R) then you need
this, so that for any particular operation, the index and the records are in
a consistent state throughout. The same applies to write-after-read
operations, such as @id = @id + 1.
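
For instance, reopening the DatabaseServer class above, a multi-step read
and a write-after-read might look something like this (the @index hash and
@id counter are invented for the example, and would have to be set up in
initialize):

class DatabaseServer
  # Multi-step read: the index lookup and the record fetch happen under
  # one shared lock, so no update can slip in between the two steps.
  def find_by_name(name)
    @sync.synchronize(Sync::SH) {
      k = @index[name]
      k && @data[k]
    }
  end

  # Write-after-read: the whole read-increment-write sits inside the
  # exclusive lock, otherwise two threads could both read the same @id
  # and hand out the same value twice.
  def next_id
    @sync.synchronize(Sync::EX) {
      @id = @id + 1
    }
  end
end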

You'll also need to be careful about what you return to your clients. Don't
return a reference to some object which may be mutated later; use 'dup' to
make a copy for the client if necessary.
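
And going back to the idea of replacing @data with new_data wholesale:
something along these lines (the class is invented for illustration, and it
assumes a single writer thread) lets readers see either the old snapshot or
the new one, never a half-written state:

require 'sync'

class SnapshotStore
  def initialize
    @sync = Sync.new
    @data = {}
  end

  # Readers hold the shared lock only long enough to grab a reference to
  # the current snapshot, then read from that snapshot at leisure.
  def search(k)
    snapshot = nil
    @sync.synchronize(Sync::SH) { snapshot = @data }
    v = snapshot[k]
    v && v.dup    # copy for the client (values assumed to be Strings here)
  end

  # The writer builds a complete new hash without holding any lock, then
  # takes the exclusive lock just for the moment of the swap.
  def update(k, v)
    new_data = @data.dup
    new_data[k] = v
    @sync.synchronize(Sync::EX) { @data = new_data }
  end
end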

> I also think I just came to a realization. In the threading examples
> I've seen, there's typically a "sleep 5" as a placeholder for some big
> computation. But this simulates an I/O bound process instead of
> something computationally intensive. So processing 10 items each with
> their own thread (which just sleeps for 5 seconds) will finish in
> close to 5 seconds... a huge improvement over processing them
> serially. But if processing the items was actually CPU intensive, would
> it be more like 50 seconds?

Yes: Ruby's threads are "green" threads, all scheduled inside a single
native thread, so CPU-bound work in several threads doesn't run in
parallel, and ten 5-second computations still take around 50 seconds.
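
You can see the difference with something like this (the numbers are rough,
and the size of the CPU loop will need tuning for your machine):

require 'benchmark'

# Ten sleeping threads all block at the same time, so this finishes in
# about 5 seconds.
elapsed = Benchmark.realtime do
  threads = (1..10).map { Thread.new { sleep 5 } }
  threads.each { |t| t.join }
end
puts "10 sleeping threads: #{elapsed}s"

# Ten CPU-bound threads get interleaved on the one native thread that runs
# all Ruby threads, so this takes roughly ten times one computation.
elapsed = Benchmark.realtime do
  threads = (1..10).map { Thread.new { 500_000.times { |i| i * i } } }
  threads.each { |t| t.join }
end
puts "10 busy threads: #{elapsed}s"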

However, using sleep also helps you simulate race conditions, where thread 1
does X followed by Y, and you want to see what happens if thread 2 does Z in
between X and Y. That's what I'm using it for above. You can use this to
demonstrate that two or more threads can successfully obtain a shared lock
at the same time, for instance.
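
For example, tacking this onto the end of the script above should print
both "Reading..." lines before either "Read done", because both reader
threads hold the shared lock at the same time; change Sync::SH to Sync::EX
in #search and the two reads will run one after the other:

readers = (1..2).map { |i|
  Thread.new { puts "reader #{i}: #{d.search('foo').inspect}" }
}
readers.each { |t| t.join }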

BTW there's a good introductory chapter in Programming Ruby:
http://www.rubycentral.com/book/tut_threads.html

And you might want to look at 'madeleine', which is an in-RAM object
database written in Ruby.

Regards,

Brian.