On Fri, 17 Sep 2004, Markus wrote:

> Ara --
>
>    Random thoughts:
>

>      * It could be a race condition of some sort

yes - perhaps even in some library code i'm exercising - this is my current
best guess.

>      * It could be that closing the file in the child closes it for the
>        parent even though closing it for the parent does not close it
>        for the child

hmmm - not that one:

harp:~ > ruby -e'f = open "f","w";fork{ f.close };Process.wait;f.puts 42'
harp:~ > cat f
42


>      * It could be that you omitted a file from your keep list that the
>        child actually needs.  It tries to access it, goes boom,...

i do an exec of bash immediately after so i think that's out since bash cannot
possibly require anything ruby or sqlite has open other than stdin, stdout,
and stderr.

>      * can you make it happen in a simplified situation (e.g. one
>        child, etc.)

yes.  but not predictably either.  it can run for days, or minutes.
unfortunately (for debugging) it is usually about 3 days before a core dump -
difficult to work with...

>      * is it possible to make nfs put the ugly files somewhere you
>        can't see them?  I know much of the software I run has lots of
>        ugly files (e.g. the web browser cache), but they don't bother
>        me because I don't look at them.

i handle that this way now:

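     # remove only the nfs silly files ('.nfs*') that appear while the block runs
     def sillyclean dir = @dirname
#{{{
       glob = File.join dir,'.nfs*'
       orgsilly = Dir[glob]          # silly files present before the block
       yield
       newsilly = Dir[glob]          # silly files present after the block
       silly = newsilly - orgsilly   # those created during the block
       silly.each{|path| FileUtils::rm_rf path}
#}}}
     end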

this code wraps ONLY the transaction/fork code.  it is safe because i know any
silly file left over from a transaction was created due to the sqlite not
setting close-on-exec on its tmp files.  plus removing a silly file cannot
hurt because they spring back into existence (by definition) if someone
actually still needs them.  so, if the remove succeeds then no-one was actually
using them.  this is indeed what happens - they are removed never to return.
i just hate this sort of thing.
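
for reference, a minimal sketch of what setting close-on-exec on a bare
descriptor number could look like from ruby.  set_cloexec is a hypothetical
helper, not something sqlite or its ruby binding provides, and it assumes a
ruby new enough to accept the autoclose option to IO.for_fd.  the pipe only
stands in for a descriptor that some library opened without the flag.

```ruby
require 'fcntl'

# hypothetical helper (not part of ruby, sqlite, or its binding): flag one raw
# descriptor number close-on-exec.  returns true if the flag was set, false
# if the fd is not open or ruby refuses to wrap it.
def set_cloexec(fd)
  io = IO.for_fd(fd, autoclose: false)     # wrap without taking ownership
  flags = io.fcntl(Fcntl::F_GETFD)         # read the current descriptor flags
  io.fcntl(Fcntl::F_SETFD, flags | Fcntl::FD_CLOEXEC)
  true
rescue Errno::EBADF, Errno::EINVAL, ArgumentError
  false
end

# stand-in for a descriptor opened without the flag
r, w = IO.pipe
w.close_on_exec = false
set_cloexec w.fileno
puts w.close_on_exec?
```

in a child this could run over every number above stderr just before the
exec, e.g. (3...256).each{|fd| set_cloexec fd}, so nothing has to be closed
by hand.  the 256 upper bound is an assumption, not a real limit.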


>      * Instead of specifying the files you want to keep (STDIN, etc)
>        could you list the ones you want to close, and narrow the
>        problem down that way?

yes - i'm working on that.  the problem is that i actually KNOW the filename
that gets unlinked and causes the sillyname - it's the 'db-journal' file (i
can see a .nfsXXXX file come into existence with its exact contents).  the
problem is that the sqlite api opens this file and i have no file handle on
it.  problem two is that ruby does not provide a way to get at this info that
i know of.  you could

   256.times do |fd|
     begin
       file = IO::new fd
       File::unlink file.path if file.path =~ %r/db-journal/o
     rescue Errno::EBADF, Errno::EINVAL
     end
   end

__except__ that File objects created this way do not have a path!  (nor
respond_to?('path') for that matter) - at least on my ruby.  i'm not sure if
this is a bug or not...
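
one possible way around the missing path, sketched under the assumption that
the box is linux with /proc mounted (the thread does not say): the kernel
exposes each open descriptor as a symlink under /proc/self/fd, so the path
can be read without any File object at all.  open_fd_paths is a hypothetical
helper and the tempfile below only stands in for the real journal file.

```ruby
require 'tempfile'

# hypothetical helper: map open descriptor numbers to the paths they point
# at by reading the symlinks under /proc/self/fd.  returns an empty hash
# when that directory does not exist (i.e. not linux).
def open_fd_paths(proc_dir = '/proc/self/fd')
  paths = {}
  return paths unless File.directory?(proc_dir)
  Dir.foreach(proc_dir) do |name|
    next unless name =~ /\A\d+\z/          # skip '.' and '..'
    begin
      paths[name.to_i] = File.readlink(File.join(proc_dir, name))
    rescue Errno::ENOENT
      # the descriptor used to list the directory is already gone - skip it
    end
  end
  paths
end

# stand-in for the journal that sqlite opens behind ruby's back
journal = Tempfile.new 'db-journal'
hits = open_fd_paths.select{|fd, path| path =~ /db-journal/}
hits.each{|fd, path| puts "fd #{fd} -> #{path}"}
```

sockets and pipes show up with names like 'socket:[...]' rather than real
paths, so matching on the file name first, as above, seems the safer order.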

>    I don't know if any of these will help, but I can't see that they
> could hurt (I used to say that "ideas can't hurt you" but I'm older
> now).

funny.  yeah - anything helps - i'm grasping at straws!

cheers.

-a
--
===============================================================================
| EMAIL   :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE   :: 303.497.6469
| A flower falls, even though we love it;
| and a weed grows, even though we do not love it. 
|   --Dogen
===============================================================================