On Sun, 26 Jan 2003, Joel VanderWerf wrote:

> Any solaris gurus out there?

any chance your home directory, or where ever you are running, is nfs mounted?

-a

>
> I'm having trouble porting some multi-thread, multi-process code from
> linux to solaris. I've already dealt with (or tried to deal with) some
> differences in flock (solaris flock is based on fcntl locks), like the
> fact that closing a file releases locks on the file held by other threads.
>
> I've managed to isolate the problem in a fairly simple test program. It's at
>
> http://path.berkeley.edu/~vjoel/ruby/solaris-bug.rb
>
> The program creates /tmp/test-file-lock.dat, which holds a marshalled
> fixnum starting at 0. Then it creates Np processes each with Nt threads
> which do a random sequence of reads and writes using some locking
> methods. The writes just increment the counter.
>
> When a process is done, it writes the number of times it incremented the
> counter to the file /tmp/test-file-lock.dat#{pid}. Then the main process
> adds these up and compares with the contents of the counter file. The
> point of this is to test for colliding writers.
>
> But the program fails before that final test--it seems to be having a
> collision between a reader and a writer that causes the reader to see a
> corrupt file.
>
> A typical run fails like this. The counter 0..3 is a seconds clock:
>
> $ ruby solaris-bug.rb
> 0
> 1
> 2
> 3
> solaris-bug.rb:128:in `load': marshal data too short (ArgumentError)
>
> It looks like there are a reader and a writer accessing the file at the
> same time, and the writer has just truncated the file (line 137) when
> the reader tries to read it.
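
For comparison, here is a minimal sketch of the ordering I would expect to be safe for a marshalled counter like the one described above. It is not taken from solaris-bug.rb, and all names in it (COUNTER_PATH, PROCESS_MUTEX, locked, init_counter, read_counter, increment_counter) are made up for illustration. It assumes the filesystem actually honors flock, and it never opens the file with a truncating mode: the file is only truncated once the exclusive lock is held. Because fcntl-style locks belong to the process rather than the thread, the sketch also serializes threads with a Mutex.

```ruby
require "tmpdir"

# Hypothetical names throughout; this is a sketch, not the code in solaris-bug.rb.
COUNTER_PATH  = File.join(Dir.tmpdir, "test-file-lock-sketch.dat")
PROCESS_MUTEX = Mutex.new   # fcntl-based locks are per process, so guard threads separately

# Open the file WITHOUT truncating, take the lock, then yield the handle.
# The lock is released when the block finishes and the file is closed.
def locked(path, flags, lock)
  PROCESS_MUTEX.synchronize do
    File.open(path, flags | File::BINARY, 0644) do |f|
      f.flock(lock)
      yield f
    end
  end
end

# Create the counter file (if needed) and set it to 0 under an exclusive lock.
def init_counter(path)
  locked(path, File::RDWR | File::CREAT, File::LOCK_EX) do |f|
    f.truncate(0)
    f.write(Marshal.dump(0))
    f.flush
  end
end

# Read the counter under a shared lock.
def read_counter(path)
  locked(path, File::RDONLY, File::LOCK_SH) { |f| Marshal.load(f.read) }
end

# Read, truncate and rewrite under one exclusive lock, so a reader
# holding a shared lock can never observe the empty file.
def increment_counter(path)
  locked(path, File::RDWR, File::LOCK_EX) do |f|
    n = Marshal.load(f.read)
    f.rewind
    f.truncate(0)
    f.write(Marshal.dump(n + 1))
    f.flush
    n + 1
  end
end
```

If the writer in the test program instead opens the file in a truncating mode (for example "w") before it takes the lock, a reader could see a zero-length file and fail with "marshal data too short". That is only a guess about what line 137 does. None of this helps if the locks are not honored in the first place, which is why I asked about nfs.
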
>
> This happens:
>
> - on solaris, quad cpu
>   - ruby 1.7.3 (2002-10-30) [sparc-solaris2.7]
>
> - *not* on single processor linux
>   - ruby 1.7.3 (2002-12-12) [i686-linux]
>
> - *not* on dual SMP linux
>   - ruby 1.6.7 (2002-03-01) [i686-linux]
>
> Also, the bug requires *both* of:
>
> - thread_count >= 2
>
> - process_count >= 2
>
> Also, the bug requires that there be both reader and writer operations
> (i.e., that the random number lead to each branch often enough, say 50/50).
>
>

--
====================================
| Ara Howard
| NOAA Forecast Systems Laboratory
| Information and Technology Services
| Data Systems Group
| R/FST 325 Broadway
| Boulder, CO 80305-3328
| Email: ahoward / fsl.noaa.gov
| Phone: 303-497-7238
| Fax: 303-497-7259
====================================