On Thu, 15 Apr 2004, Hugh Sasse Staff Elec Eng wrote:

> Searching the web and books for information on this, I can't seem to
> find a definitive yes or no to my question:
> 
> "Is there a portable way to do file locking?"
> 
> Some of the problems that are mentioned are that NFS systems make
> almost everything non-atomic, locking methods often depend on fcntl
> which is not available on all systems, and the dreaded race
> condition when test and set are non-atomic.
> 
> Ruby can be used on the Mac, PC, and Unix, so I'm really after
> something that's portable.  I can't use a Mutex because I need this to
> be exclusive across process boundaries (several invocations of the
> program).
> 
> My searching suggests this is a common problem, but the answer to it
> is rare!
> 
>         Thank you
>         Hugh

i've been doing a lot of experiments with locking myself, mainly on nfs systems
for some designs for a distributed work queue i'm working on, and have come to
largely the same conclusions.  however, you definitely want fcntl based locking
for NFS systems.  as far as i know any posix compliant system has fcntl but i'm
a windows dummy (windows people insert correction)...

you might want to check out a few things i've done - most of them were done
__very__ quickly and further testing is in order but:

  * c ext to replace File.flock with fcntl based impl

      http://www.codeforpeople.com/lib/ruby/posixlock/

  * a simpler, but less portable?, pure ruby solution provided by matz

      http://www.codeforpeople.com/lib/ruby/nfslock/

  * interface to liblockfile (man 1 lockfile)

      http://www.codeforpeople.com/lib/ruby/lockfile/


the tests i've been running (a day at a time) consist of multiple processes on
multiple hosts competing to update a queue in an ordered fashion... if the
queue is ever out of order, or a marshal error is thrown, the test 'fails'.
i also mark the times each node acquires the lock and gather stats on the
min/max/avg time required to obtain the lock.  i've run using all three
methods above, plus system calls to lockfile, for my locking mechanism and
have the following observations:

  * they all work on nfs - i get a core dump every now and again in the
    liblockfile impl which is almost certainly a bug in my own code

  * lockd sucks at giving any sort of 'even' distribution to the processes,
    what i generally see is one node hogging the lock for a while, then
    eventually lockd seems to realize this and gives it to another node for a
    while.  for my uses this is not a big deal since the competition in
    production would not actually be that fierce...  it DOES work though with
    a sufficiently new lockd impl or a rather expensive netapp...

  * the max time between locks for 6 or so processes competing for a fcntl
    based lock on our systems is around 30 seconds

  * lockfile seems to work really well - giving max/min/avg of about 1 sec for
    all nodes.  this really surprised me.

  * the big drawback to lockfiles is potential hangs and inability to grant
    read-locks.  there is a serious locking package on CPAN which claims to do
    this (read/write nfs safe lockfiles) at

      http://search.cpan.org/~bbb/File-NFSLock-1.20/lib/File/NFSLock.pm

    the idea of this seems quite sketchy.  i have not tested it.


if you are interested in my test code drop me a line - it's one script that
you run on all the nodes, and a monitoring script that goes with it.... nice
and terrible like my testing code tends to be...
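
to give a feel for the kind of test described above, here is a minimal
single-host sketch.  it is NOT my actual test script: run_lock_test, the file
names and the use of File#flock as the lock under test are all made up for
illustration, and it needs fork so it is unix only.  each worker grabs the
lock, loads the marshaled queue, appends the next number and writes it back;
the run 'fails' if the queue ends up out of order or a worker dies (for
instance on a marshal error from a torn file).

```ruby
require 'tmpdir'

# hypothetical stress test - a sketch, not the original test script
def run_lock_test(dir, workers: 4, rounds: 25)
  queue = File.join(dir, "queue")
  lock  = File.join(dir, "queue.lock")
  File.open(queue, "wb") { |f| f.write Marshal.dump([]) }

  pids = workers.times.map do |w|
    fork do
      waits = []
      rounds.times do
        t0 = Time.now
        File.open(lock, File::RDWR | File::CREAT, 0644) do |lk|
          lk.flock(File::LOCK_EX)       # assumption: stand-in for the lock under test
          waits << Time.now - t0        # how long we waited for the lock
          q = Marshal.load(File.binread(queue))   # raises if the file is torn
          nxt = (q.last || 0) + 1
          q << nxt
          File.open(queue, "wb") { |f| f.write Marshal.dump(q) }
        end                             # closing the file drops the lock
      end
      # each worker leaves its wait times behind for the parent to collect
      File.open(File.join(dir, "waits.#{w}"), "wb") { |f| f.write Marshal.dump(waits) }
    end
  end

  all_ok = pids.map { |pid| Process.wait2(pid).last.success? }.all?
  q      = Marshal.load(File.binread(queue))
  waits  = Dir[File.join(dir, "waits.*")].flat_map { |p| Marshal.load(File.binread(p)) }

  { ordered: all_ok && q == (1..workers * rounds).to_a,
    min: waits.min, max: waits.max, avg: waits.sum / waits.size }
end
```

something like `Dir.mktmpdir { |d| p run_lock_test(d) }` runs it locally.  to
test nfs you would point dir at an nfs mount, start workers on several hosts
instead of forking, and swap the flock line for whichever mechanism you want
to measure.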

in any case - i would think implementing the algorithm used by liblockfile in
ruby might be a good solution.  the hard work at making things portable has
been done for you by matz and co.  i made a stab at that (it's in the lockfile
package) but it is NOT finished... i should probably take it out of there...
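
as a rough idea of what that could look like, here is a sketch of the usual
nfs-safe lockfile trick: create a uniquely named temp file, hard link it to
the lock name, then trust the link count rather than the return value of
link.  i'm assuming this is roughly the approach liblockfile takes, i have
not checked it against the c source.  LinkLock and its methods are
hypothetical names, this is not the code in the lockfile package, and it does
nothing about stale locks left by dead processes.  File.link is also not
implemented on every platform.

```ruby
require 'socket'
require 'tmpdir'

# hypothetical helper - a sketch of link based lockfiles, not a finished impl
class LinkLock
  def initialize(path)
    @path = path
    @held = false
  end

  # try once; true if we now hold the lock, false if someone else has it
  def try_lock
    return true if @held
    tmp = "#{@path}.#{Socket.gethostname}.#{Process.pid}.#{rand(1 << 32)}"
    File.open(tmp, File::WRONLY | File::CREAT | File::EXCL, 0644) { |f| f.puts tmp }
    begin
      File.link(tmp, @path)
    rescue SystemCallError
      # over nfs the reply can get lost, so the error alone proves nothing
    end
    # two names for our temp file means the link happened and the lock is ours
    @held = (File.stat(tmp).nlink == 2)
  ensure
    File.unlink(tmp) if tmp && File.exist?(tmp)
  end

  # poll for the lock; returns false if we give up
  def lock(retries: 10, sleep_for: 0.1)
    retries.times do
      return true if try_lock
      sleep sleep_for
    end
    false
  end

  def unlock
    File.unlink(@path) if @held
    @held = false
  end

  def synchronize(**opts)
    raise "could not get #{@path}" unless lock(**opts)
    begin
      yield
    ensure
      unlock
    end
  end
end
```

usage would be along the lines of
`LinkLock.new("/some/nfs/dir/queue.lock").synchronize { update_queue }`, where
the path and update_queue are placeholders.  polling with a sleep is the
simplest thing that works; a real version would want a timeout on stale locks
and some randomness in the sleep.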

i'm very interested in any findings you have along these lines.  please keep
us informed.

-a
-- 
===============================================================================
| EMAIL   :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE   :: 303.497.6469
| ADDRESS :: E/GC2 325 Broadway, Boulder, CO 80305-3328
| URL     :: http://www.ngdc.noaa.gov/stp/
| TRY     :: for l in ruby perl;do $l -e "print \"\x3a\x2d\x29\x0a\"";done 
===============================================================================