On Wed, 25 Aug 2004, Martin DeMello wrote:

> Ara.T.Howard / noaa.gov wrote:
>> i have an application which does a lot of byte range locking on an nfs
>> mounted file. twice now, i've seen a 'dead' lock appear on our nfs
>> server - a lock held by a non-existent pid. none of the processes in
>> question have been dying
>
> When you figure it out, could you post a followup here? Sounds like
> something it'd be useful to know about.
>
> martin

it's figured out. i have it coded in a nice and ugly fashion and have been
testing it for the last 24 hours with a bit of code that 'breaks' the lock
every few seconds and forces recovery in the clients.

note that the entire thing is an emergency-only procedure that takes place ONLY
when the system is already broken (hung locks), so the solution doesn't have
to be 'perfect'. i'm not talking about race conditions, just that it's
difficult for one system to recover the locking atomically in the presence of
broken locks (for obvious reasons) and, therefore, to guarantee that recovery
will not kill some remote client. in my case this is ideal: my remote clients
are immortal daemons that will restart if killed and, in fact, i want this to
happen. what i don't want to happen is for my entire system to hang. i think
it's acceptable to say that - iff your nfs locking implementation breaks, my
code may too; however, it recovers afterwards automatically.

so, in that context - here is the basic algorithm

  - an empty monitor file will be used in conjunction with the file in
    question. the reason we need an extra file is so we can make guarantees
    about the way in which it is updated (mtime). this file is kept in a
    directory alongside the file in question, which makes recovery much
    easier (atomic).

  - apply your lock type (write/read) to the monitor file in non-blocking
    fashion.

    if lock succeeded

      proceed to apply the same lock type, also in non-blocking fashion, to
      the file in question.

      if the lock on the file in question succeeds

        start a thread which will loop touching the monitor file to keep it
        'fresh' - say every ten seconds - and proceed to use the file. note
        that it's critical to use non-blocking locks since blocking locks (in
        ruby) will stop this thread! if you must use blocking locks then a
        process must be forked to keep the monitor file fresh and you cannot
        use a thread. be sure to kill this thread/process when you are done.

      if the lock on the file in question fails

        raise an error - the protocol has failed for some reason.

    if lock failed

      if the monitor file is fresh

        simply sleep a bit and retry. this will be the normal execution path
        if lockd is working.

      else if the monitor file is stale

        one of two things must be true:

          - a process holding the lock has died and the nfs client/server have
            not managed to clean things up (hung lock - lockd bug)

          - a process holding the lock is partitioned on a slow network,
            running on a frozen cpu, or somehow has the lock but cannot keep
            the monitor file fresh. there is a chance that recovery may kill
            this process - but it is already sick and hanging the system.

        in either case we attempt lockd recovery, taking the risk of killing a
        sick remote client. note that MANY remote clients might notice the
        hung situation at once, so they themselves need to serialize recovery
        without using fcntl locks since those locks may be hung!

        lockd recovery involves:

          - mark recovery start time

          - create an nfs safe lockfile (my lockfile lib). this is a
            'blocking' (poll/sleep) operation to ensure only one remote
            process is attempting recovery at a time.

          - see if someone else has already recovered (flag file exists with
            timestamp greater than recovery start time).
            iff so, quit recovery and go back to attempting to get the first
            lock

          - since we have the lockfile and no one else has recovered:

              #
              # prevent new processes from using either file
              #
              mv directory containing files to dir.bak

              #
              # clear lockd locks - (force new inode info)
              #
              for files in monitor, file in question
                cp file tmp
                rm file
                mv tmp file
              end

              #
              # mark recovery time
              #
              touch directory/lockd_recovery

              #
              # restore system
              #
              mv dir.bak directory

          - retry to acquire lock

in my case all remote clients are prepared to get a suite of errors during
transactions such as Errno::ENOENT, Errno::ESTALE, etc. they tolerate quite a
few of these by sleeping and retrying but eventually give up and die. the
retry is o.k. because all access to the file must be in a transaction (so by
definition it is o.k. to re-execute on failure), and even if many retries are
made and the transaction is lost - the daemon will exit and restart (logging
this info). this last situation would be bad - but not compared to having the
entire system freeze - again, this is ONLY an emergency effort.

regards.

-a
--
===============================================================================
| EMAIL   :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE   :: 303.497.6469
| A flower falls, even though we love it;
| and a weed grows, even though we do not love it.
|   --Dogen
===============================================================================
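
a minimal ruby sketch of three pieces of the algorithm above: the monitor file freshness check, the thread that keeps the monitor file fresh, and the copy/remove/rename recovery step. the helper names (monitor_fresh?, start_heartbeat, refresh_inode, recover) and the 30 second staleness threshold are assumptions made for illustration, not code from the actual application. taking the locks themselves, serializing recovery with an nfs safe lockfile, and checking the recovery flag file are left out.

```ruby
require 'fileutils'

# hypothetical helper: the monitor file counts as fresh if it was touched
# within the last max_age seconds (the threshold is an assumed value)
def monitor_fresh?(monitor, max_age = 30)
  Time.now - File.mtime(monitor) < max_age
end

# hypothetical helper: background thread that touches the monitor file every
# interval seconds while the lock is held; the caller must kill it afterwards
def start_heartbeat(monitor, interval = 10)
  Thread.new do
    loop do
      FileUtils.touch(monitor)
      sleep interval
    end
  end
end

# hypothetical helper: copy the file, remove the original, and rename the copy
# into place so that the path refers to a new inode
def refresh_inode(path)
  tmp = "#{path}.tmp.#{Process.pid}"
  FileUtils.cp(path, tmp)
  FileUtils.rm(path)
  FileUtils.mv(tmp, path)
end

# hypothetical helper: the recovery steps, to be run only by the one client
# that currently holds the recovery lockfile
def recover(dir, names)
  bak = "#{dir}.bak"
  FileUtils.mv(dir, bak)                            # keep new processes away
  names.each { |n| refresh_inode(File.join(bak, n)) }
  FileUtils.touch(File.join(bak, 'lockd_recovery')) # mark recovery time
  FileUtils.mv(bak, dir)                            # restore system
end
```

in this sketch the flag file is touched inside dir.bak so that it reappears under the original directory name once the directory is moved back; that placement is an assumption. a caller would typically wrap use of the file in begin/ensure so the heartbeat thread is always killed.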