On Thu, 15 Apr 2004, Hugh Sasse Staff Elec Eng wrote:

> Searching the web and books for information on this, I can't seem to
> find a definitive yes or no to my question:
>
> "Is there a portable way to do file locking?"
>
> Some of the problems that are mentioned are that NFS systems make
> almost everything non-atomic, locking methods often depend on fcntl
> which is not available on all systems, and the dreaded race
> condition when test and set are non-atomic.
>
> Ruby can be used on the Mac, PC, and Unix, so I'm really after
> something that portable. I can't use a Mutex because I need this to
> be exclusive across process boundaries (several invocations of the
> program).
>
> My searching suggests this is a common problem, but the answer to it
> is rare!
>
> Thank you
> Hugh

i've been doing a lot of experiments with locking myself, mainly on nfs
systems for some designs for a distributed work queue i'm working on, and have
come to largely the same conclusions. however, you definitely want fcntl based
locking for NFS systems. as far as i know any posix compliant system has fcntl
but i'm a windows dummy (windows people insert correction)...

you might want to check out a few things i've done - most of them were done
__very__ quickly and further testing is in order but:

  * c ext to replace File.flock with fcntl based impl

      http://www.codeforpeople.com/lib/ruby/posixlock/

  * a simpler, but less portable?, pure ruby solution provided by matz

      http://www.codeforpeople.com/lib/ruby/nfslock/

  * interface to liblockfile (man 1 lockfile)

      http://www.codeforpeople.com/lib/ruby/lockfile/

the tests i've been running (a day at a time) consist of multiple processes on
multiple hosts competing to update a queue in an ordered fashion... if the
queue is ever out of order, or a Marshal error is thrown, the test 'fails'. i
also mark the time at which each node acquires the lock and gather stats on the
min/max/avg time required to obtain the lock.
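
A minimal sketch of the kind of test described above, scaled down to one host: several forked processes compete for an exclusive lock, each appends the next sequence number to a shared queue file, and the wait for each lock is recorded. Everything here is an assumption made for illustration: the names run_lock_test and wait_stats are hypothetical, the queue is a plain text file rather than marshaled data, and it uses Ruby's built-in File#flock instead of the fcntl-based posixlock extension, so it says nothing about behaviour over NFS. It also relies on fork, which is not available on Windows.

```ruby
require 'tmpdir'

# Hypothetical helper: fork `workers` processes; each one takes an exclusive
# File#flock `rounds` times and appends "seq pid wait_seconds" to the queue.
def run_lock_test(dir, workers: 4, rounds: 25)
  lock_path  = File.join(dir, 'queue.lock')
  queue_path = File.join(dir, 'queue.txt')
  File.write(queue_path, '')

  pids = workers.times.map do
    fork do
      # Each child opens the lock file itself so it holds its own lock.
      File.open(lock_path, File::RDWR | File::CREAT, 0644) do |lock|
        rounds.times do
          started = Time.now
          lock.flock(File::LOCK_EX)            # blocks until the lock is granted
          waited = Time.now - started
          last = File.readlines(queue_path).last
          seq  = last ? last.split.first.to_i + 1 : 0
          File.open(queue_path, 'a') do |q|
            q.puts format('%d %d %.6f', seq, Process.pid, waited)
          end
          lock.flock(File::LOCK_UN)
        end
      end
      exit!(0)                                 # skip any at_exit handlers in the child
    end
  end

  pids.each { |pid| Process.wait(pid) }
  File.readlines(queue_path).map(&:split)
end

# Hypothetical helper: min/max/avg of the recorded lock wait times.
def wait_stats(rows)
  waits = rows.map { |row| row[2].to_f }
  { min: waits.min, max: waits.max, avg: waits.sum / waits.size }
end

if __FILE__ == $0
  Dir.mktmpdir do |dir|
    rows = run_lock_test(dir)
    seqs = rows.map { |row| row[0].to_i }
    puts(seqs == (0...seqs.size).to_a ? 'queue in order' : 'TEST FAILED: queue out of order')
    p wait_stats(rows)
  end
end
```

If the lock ever failed to exclude another process, two writers could read the same last entry and the sequence would contain a duplicate, which is the 'out of order' failure the test looks for.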

i've run using all three methods above, plus system calls to lockfile, for my
locking mechanism and have the following observations:

  * they all work on nfs - i get a core dump every now and again in the
    liblockfile impl which is almost certainly a bug in my own code

  * lockd sucks at giving any sort of 'even' distribution to the processes.
    what i generally see is one node hogging the lock for a while, then
    eventually lockd seems to realize this and gives it to another node for a
    while. for my uses this is not a big deal since the competition in
    production would not actually be that fierce... it DOES work though with a
    sufficiently new lockd impl or a rather expensive NetApp...

  * the max time between locks for 6 or so processes competing for a fcntl
    based lock on our systems is around 30 seconds

  * lockfile seems to work really well - giving a max/min/avg of about 1 sec
    for all nodes. this really surprised me.

  * the big drawback to lockfiles is potential hangs and the inability to
    grant read-locks. there is a serious locking package on CPAN which claims
    to do this (read/write nfs safe lockfiles) at

      http://search.cpan.org/~bbb/File-NFSLock-1.20/lib/File/NFSLock.pm

    the idea of this seems quite sketchy. i have not tested it.

if you are interested in my test code drop me a line - it's one script that
you run on all the nodes, and a monitoring script that goes with it.... nice
and terrible, like my testing code tends to be...

in any case - i would think implementing the algorithm used by liblockfile in
ruby might be a good solution. the hard work of making things portable has
been done for you by matz and co. i made a stab at that (it's in the lockfile
package) but it is NOT finished... i should probably take it out of there...

i'm very interested in any findings you have along these lines. please keep us
informed.
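
For readers wondering what a pure ruby lockfile might look like, here is a minimal sketch of the classic link-and-check technique that NFS-safe lockfile tools are generally described as using: create a uniquely named file, hard-link it to the lock name, and treat a link count of 2 on the unique file as success. Whether this matches liblockfile step for step is an assumption, and it is not the code in the lockfile package mentioned above. The names try_link_lock and with_link_lock are hypothetical. There is no stale-lock detection, so a crashed holder leaves the lock in place, which is the 'potential hangs' drawback noted in the list above.

```ruby
require 'socket'
require 'tmpdir'

# Hypothetical helper: one attempt to take the lock. Returns true on success.
def try_link_lock(lock_path)
  # A name that should be unique across hosts and processes.
  unique = "#{lock_path}.#{Socket.gethostname}.#{Process.pid}.#{rand(1 << 32)}"
  File.open(unique, File::WRONLY | File::CREAT | File::EXCL, 0644) do |f|
    f.puts Process.pid
  end
  begin
    File.link(unique, lock_path)   # fails with EEXIST if someone holds the lock
  rescue SystemCallError
    # Ignored on purpose: the link count checked below is what decides.
  end
  File.stat(unique).nlink == 2     # 2 links means our link to lock_path exists
ensure
  File.unlink(unique) if unique && File.exist?(unique)
end

# Hypothetical helper: retry until the lock is held, run the block, release.
def with_link_lock(lock_path, retries: 50, pause: 0.1)
  attempts = 0
  until try_link_lock(lock_path)
    attempts += 1
    raise "could not acquire #{lock_path}" if attempts > retries
    sleep pause
  end
  begin
    yield
  ensure
    File.unlink(lock_path)         # release the lock
  end
end

if __FILE__ == $0
  Dir.mktmpdir do |dir|
    with_link_lock(File.join(dir, 'queue.lock')) do
      puts 'lock held, safe to update the queue here'
    end
  end
end
```

The result of link itself is deliberately not trusted, because the link count on the unique file is the more reliable signal when the filesystem is remote. Like any lockfile, this only gives an exclusive lock; it cannot grant read-locks.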

-a
--
===============================================================================
| EMAIL   :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE   :: 303.497.6469
| ADDRESS :: E/GC2 325 Broadway, Boulder, CO 80305-3328
| URL     :: http://www.ngdc.noaa.gov/stp/
| TRY     :: for l in ruby perl;do $l -e "print \"\x3a\x2d\x29\x0a\"";done
===============================================================================