For my research, I've written Ruby bindings for the link-grammar[1]
library, and I am using them to parse every sentence in a corpus of
text and insert the parses into a database. The code starts out
roughly as follows:

    $dbh=DBI.connect(...)
    
    #for simplicity, it doesn't matter what the Something is.
    sentences=Something.new($dbh)
    
    $dict=Dictionary.new #Dictionary is an object wrapping the C library
    
    def putindatabase link
       #for our purposes, it doesn't really matter what this does
       #except to say it uses $dbh which we opened before
    end
    
    def parse sentencetext
       #$dict is a global: a top-level local wouldn't be visible inside def
       sentence=$dict.parse(sentencetext) #sentence also wraps the C library
       sentence.linkage[0].links.each do |link|
          putindatabase link
       end
    end
    
    #sentences.each fetches every sentence from the database and yields each 
    #one to the block. it keeps an open connection between yields
    sentences.each do |sentence|
       parse sentence
    end

Now, the link-grammar library that I'm using has a bug. It
segfaults[2] while freeing resources, under conditions that I haven't
pinned down well enough to fix in the C library itself. Arguably it
would be nice to catch the segfault as though it were an exception, so
that we could move on to the next sentence and get on with our lives.
But Ruby/DL doesn't let us do that, and even if it did, the segfault
could leave the link-grammar library in an inconsistent state. So
we'll do the next best thing: fork a subprocess to handle each
sentence. This solves a few problems:

* If a sentence fails, we'll be able to move on to the next one.
* A sentence won't fail before being put in the database, since the
  problem occurs when freeing the resources used to parse the sentence.
* We probably won't segfault at all, because the crash only occurs
  under complicated circumstances which seem to involve parsing more
  than one sentence with the same dictionary, and each child parses
  only one.
* We don't have to clean up properly at all, since the termination of the
  child process after each sentence automatically takes care of that
  for us. (The link-grammar library doesn't allocate any resources
  that the OS doesn't know how to dispose of.) This may make
  subprocess termination faster.

So we would like to change our code to say:

    sentences.each do |sentence|
       Process.waitpid fork {parse sentence}
    end

(For that last bullet point, we'd also need to edit the link-grammar
bindings, but I won't bother you with the details of that. I haven't
actually implemented it yet.)
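
The one-liner above throws away how each child ended. If you want to
know which sentences crashed, the status from waitpid2 can tell you.
What follows is only a sketch: run_isolated is a name I made up, and
the child kills itself with SIGKILL as a stand-in for the real
segfault (sending SIGSEGV would also work, but Ruby would print its
crash report first).

```ruby
# Run the block in a child process and report how the child ended.
# Returns :ok, :failed (non-zero exit), or :crashed (killed by a signal).
# run_isolated is a hypothetical helper, not part of any library.
def run_isolated
  pid = fork { yield }
  _, status = Process.waitpid2(pid)
  if status.signaled?
    :crashed
  elsif status.success?
    :ok
  else
    :failed
  end
end

# Stand-in for a parse that dies: the "boom" child kills itself.
outcomes = ["fine", "boom", "fine again"].map do |text|
  run_isolated do
    Process.kill("KILL", Process.pid) if text == "boom"
  end
end

p outcomes
```

The parent survives the crashed child and carries on with the next
item, which is the whole point of forking per sentence.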

That's all, right?

Oy, vey! Testing this, we quickly see that DBI can't put anything in
the database except for the first sentence. Why? Because the child
inherits the parent's database connection, and when the child exits
it closes that connection, which kills it for the parent too. (In
other words, DBI isn't fork-safe.)
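
To see how a child's exit can reach a connection the parent still
holds, here is a small sketch with no database at all. It assumes
(I haven't traced this through any particular driver) that the
driver says goodbye to the server when the process shuts down. The
socket pair stands in for the two ends of the connection, and the
"QUIT" message is invented for the illustration.

```ruby
require 'socket'

# Stand-ins for the client and server ends of a database connection.
client, server = UNIXSocket.pair

pid = fork do
  # The child inherited `client`. Imitate a driver that tells the
  # server goodbye when the process shuts down (hypothetical message).
  at_exit { client.write("QUIT\n") }
end
Process.waitpid(pid)

# The parent never wrote anything, yet the server end has received
# the goodbye, because parent and child share the same socket.
message = server.gets
p message
```

From the server's point of view the goodbye came from the one and
only client, so it has every reason to hang up on the parent.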

But it turns out DRb is fork-safe, so I create another "middleware"
process and make it responsible for the database connection:

    require 'dbi'
    require 'drb'
    require 'drb/acl'
    
    serverpid = fork do
       dbh=DBI.connect(...)
       Signal.trap("INT"){exit}
       acl=ACL.new(%w{deny all allow 127.0.0.1})
       DRb.install_acl(acl)
       DRb.start_service('druby://localhost:9001',dbh)
       DRb.thread.join
    end
    sleep 1 #wait for the server to be set up before continuing
    DRb.start_service
    $dbh=DRbObject.new(nil,'druby://localhost:9001')
    
    ## all of the previous code like before
    ## and then at the end, we kill the DRb server process:
    
    Process.kill("INT",serverpid)

Now we have a fairly painless way to make DBI fork-safe. (Note that
DBI still isn't *thread-safe*, and this only works because the child
processes run strictly one at a time, but it's not a bad workaround
for this kind of error.)
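
If you want to try the pattern without a database or link-grammar,
here is a self-contained sketch. Recorder is a made-up stand-in for
the DBI handle. Two things differ from my code above, both of them
my own choices for the sketch: the server listens on port 0 so the OS
picks a free port, and it reports its URI back over a pipe, which
also replaces the sleep.

```ruby
require 'drb'

# Hypothetical stand-in for the DBI handle: it remembers what it is given.
class Recorder
  def initialize
    @rows = []
  end

  def add(row)
    @rows << row
    nil
  end

  def rows
    @rows.dup
  end
end

ready_r, ready_w = IO.pipe

serverpid = fork do
  ready_r.close
  Signal.trap("INT") { exit }
  DRb.start_service("druby://127.0.0.1:0", Recorder.new)
  ready_w.puts DRb.uri   # tell the parent where we are listening
  ready_w.close
  DRb.thread.join
end

ready_w.close
uri = ready_r.gets.chomp # blocks until the server is up
ready_r.close

DRb.start_service
store = DRbObject.new_with_uri(uri)

# One child per item, strictly one at a time, as with the sentences.
%w[one two three].each do |text|
  Process.waitpid(fork { store.add(text) })
end

rows = store.rows
Process.kill("INT", serverpid)
Process.waitpid(serverpid)
p rows
```

Every child talks to the server over its own DRb connection, so a
child exiting doesn't disturb anything the parent is holding on to.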

Footnotes:
[1] http://www.link.cs.cmu.edu/link/
    http://www.abisource.com/downloads/link-grammar
[2] http://bugzilla.abisource.com/show_bug.cgi?id=10391


-- 
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/