On Wed, 22 Nov 2006, Martin DeMello wrote:

> I had a happily-running inotify daemon monitoring file creation and deletion
> that broke when we moved to a multiple machine + NFS setup.  Apparently
> inotify doesn't work over NFS, but it *does* if you have the watcher on the
> same machine as the process creating or deleting the files, which is
> actually good enough for me.

how is this working for you?  i think when i looked into inotify i found that
messages would get sent twice, or not at all, or out of order, or something like
that... i forget the exact issues.  have you seen any?  btw, could i/we have
a look at your code?

> So here's my current idea - I just wanted to run it by the list since
> I know people here have done related stuff (Ara, I'm looking at you
> :)):
>
> * Have an inotify process per machine that watches for file creation and
> deletion, and sends the messages to a Drb server on a single box.
> * Have the Drb server field messages and put them into an in-memory queue
> * Have a separate thread in the server program that wakes up every 15s
> or so and drains the queue if needed
>
> Any potential pitfalls? Any better way of doing this?

does running the inotify process on the nfs server itself catch all events?
seems like it must?

so your idea is basically to run a watcher on every node that could create
files and coalesce events?  interesting.  seems like you could have some
issues if a node wasn't accounted for, e.g. it's a bit fragile.
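
fwiw, the coalescing part could be tiny - something like this rough, untested
sketch of the drb side (the EventSink name and the push/drain methods are just
made up for illustration, not from any existing lib):

  require 'drb'
  require 'thread'

  # shared in-memory queue; each node's inotify watcher pushes events here
  class EventSink
    def initialize
      @queue = Queue.new
    end

    # called remotely by the per-node watchers
    def push(event)
      @queue.push(event)
    end

    # pull off everything that has accumulated since the last drain
    def drain
      events = []
      events << @queue.pop until @queue.empty?
      events
    end
  end

  sink = EventSink.new
  DRb.start_service('druby://0.0.0.0:9999', sink)

  # wake up every 15s or so and handle whatever came in
  Thread.new do
    loop do
      sleep 15
      sink.drain.each { |event| puts event.inspect }  # your real work goes here
    end
  end

  DRb.thread.join

the per-node watcher would then just do something like
DRbObject.new_with_uri('druby://queuebox:9999').push([hostname, :create, path])
whenever inotify fires.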

another issue might crop up with silly names - when one node has a file open
and another deletes it you get those .nfs12345 files - and i'm not sure
whether the act of monitoring via inotify, or stat'ing from all those remote
machines, might interfere with that...  silly names provide a consistent view
of the file system, so that a sequence like

   node b: opens file

   node a: rms file

   node b: fstat on open file handle (this guy needs a silly name to exist)

still works.  so you might make sure that the act of monitoring is not going to
create events to monitor ;-)  i don't think it will, but nfs is weird...

so, the other idea is using dirwatch, which is what i use in our near-real-time
satellite ingest processing system in exactly this way: it watches an nfs
directory and triggers events.  the events themselves are simply jobs
submitted to a queue (ruby queue, aka rq) which itself works over nfs, and all
nodes pull jobs from it.  i use lockfile and/or posixlock to provide nfs-safe
mutual exclusion, and the whole system requires zero networking except for
nfs.  this makes it really easy to get by sysads in today's security
environment, plus the whole thing is userland so i really don't need sysad
help at all.  a __big__ perk is that nfs, if mounted hard, simply hangs
processes if it goes away, so we can reboot a cluster and all the nfs-related
stuff - dirwatch, rq, and jobs - just hangs, even for an extended reboot
followed by a 12 hr fsck.  have you looked at dirwatch?  what kind of events
are you triggering?  do they need to be distributed events or local to the
node doing the monitoring?
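
fwiw, the core trick lockfile leans on is that link(2) is atomic over nfs
(flock/lockf famously are not reliable there).  here's a stripped-down sketch
of the idea - the method name and retry loop below are mine, and the real
lockfile gem is far more careful (stale lock sweeping, the nfs quirk where
link can report failure even though it succeeded, etc):

  require 'socket'

  # create a uniquely named temp file, then try to hard-link it onto the
  # shared lock path.  link(2) either succeeds or raises EEXIST, atomically,
  # even over nfs.
  def with_nfs_lock(lockpath)
    tmp = "#{ lockpath }.#{ Socket.gethostname }.#{ Process.pid }"
    File.open(tmp, 'w') { |f| f.puts "#{ Socket.gethostname } #{ Process.pid }" }
    begin
      File.link(tmp, lockpath)   # atomic even over nfs
    rescue Errno::EEXIST
      sleep 1                    # someone else holds the lock - wait and retry
      retry
    end
    begin
      yield
    ensure
      File.unlink(lockpath) rescue nil
      File.unlink(tmp) rescue nil
    end
  end

  # usage:
  #   with_nfs_lock('/nfs/share/work.lock') { drain_the_queue }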

sorry if this message is a bit all over the place - i'm trying to read to my
kid at the same time!

kind regards.


-a
-- 
my religion is very simple.  my religion is kindness. -- the dalai lama