Justin Johnson wrote:
>> 
>>>       @pool << Thread.current
>> 
>> The above line should be within the synchronize (above), or it might 
>> spawn more than @max_size threads (I think).
>> Replace with
>>       @pool_mutex.synchronize do
>>         while @pool.size >= @max_size
>>           print "Pool is full; waiting to run #{args.join(',')}...\n" if $DEBUG
>>           # Sleep until some other thread calls @pool_cv.signal.
>>           @pool_cv.wait(@pool_mutex)
>>         end
>>         @pool << Thread.current
>>       end
> 
> I tried your suggestion but still get the error, and only when using 
> backticks or any other method that reads from stdout or stderr.
> 
>>>       begin
>>>         yield(*args)
>>>       rescue => e
>>>         exception(self, e, *args)
>>>       ensure
>>>         @pool_mutex.synchronize do
>> 
>> In reality you don't need to synchronize here, as the @pool size/add to 
>> is synchronized elsewhere.  Taking this synch out may introduce new 
>> problems, but...I'm not sure what they are :)


Because you're removing (and only once, and from a hash), I believe it's 
thread-safe to remove at any time. In retrospect it might cause a tiny 
bit of redundancy, but no real problems; i.e. you can take the 
synchronize out here.
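
To make that concrete, here's roughly what I mean (I'm assuming the part 
of the ensure block that got cut off above is just the usual 
delete-plus-signal):

      begin
        yield(*args)
      rescue => e
        exception(self, e, *args)
      ensure
        # No @pool_mutex.synchronize here: each thread deletes itself exactly once.
        @pool.delete(Thread.current)
        # Wake one waiter so it can re-check whether there's room in the pool.
        @pool_cv.signal
      end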

> 
> I'm not sure what you mean.  This makes sense to me, as we do not want 
> more than one thread removing itself from the list at a time.
> 
>> My question is if the wait within the shutdown function will 'suck' 
>> signals away from processes, decreasing the number that can go at a 
>> time, if all are at the 'wait' phase.  Perhaps rethinking this code 
>> would be nice :)

My hypothesis is that the shutdown function somehow or other messes up 
the way the pool works.  Maybe/maybe not.

Another problem with shutdown is that it seems like it could be called 
early enough to 'disallow' certain threads from entering the pool (those 
that have not yet started waiting on the condition variable).


> 
> I'm not sure what you mean by "suck signals away from processes".  The 
> shutdown method is just waiting until all threads have ended so we don't 
> end our program before the threads are done.
> 
> It still seems to me that stdout and stderr have something to do with my 
> problem, since it always occurs when using backticks or even 
> win32/popen3 methods that read from stdout and stderr, but never with 
> system.  Anyone else have any ideas on this?

As a note, I was able to recreate the bug using system OR backticks.  I 
think the difference is only in timing, not functionality (one takes 
longer, so it aggravates the problem more, perhaps?)

In reality, though, I think this might (might) be a real bug in Ruby, or 
maybe I misunderstand the functionality of signal.

My guess is still that shutdown is messing it up somehow.

http://groups.google.com/group/comp.lang.ruby/browse_thread/thread/818d88a5eae23820
 discusses some type of IO binding problem (but not a concurrency one).

I do also notice some 'weird lines', like
Job 184 stopped.??кт??File Not Found
which might be indicative of the problem you described.

What I dislike about this bug is that sometimes it shows itself and 
sometimes it doesn't, so it's hard to know if you've actually overcome 
it or not!
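
One way to get a little more confidence either way is to hammer it in a 
loop and see whether it ever hangs.  Something like this (I'm assuming 
the class is called ThreadPool and is constructed with the max size; 
adjust to taste):

  # Crude repro harness: keep running batches of jobs through the pool so
  # an intermittent hang has more chances to show itself.
  20.times do |run|
    pool = ThreadPool.new(10)
    200.times { |i| pool.dispatch(i) { |n| `echo job #{n}` } }
    pool.shutdown
    puts "run #{run} completed"
  end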

Another thought is that it might just be Ruby's threads 'misassigning' 
variables or what not.

Another idea would be to have every thread that adds itself to the pool 
'signal' immediately after, to allow some other waiting thread to 
'check' whether the pool has decreased.  That would be a kind of 
band-aid hack, but heck, maybe it would work :)
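
In code, the band-aid is just one extra signal after the add (same 
@pool/@pool_mutex/@pool_cv as above):

      @pool_mutex.synchronize do
        while @pool.size >= @max_size
          # Sleep until some other thread calls @pool_cv.signal.
          @pool_cv.wait(@pool_mutex)
        end
        @pool << Thread.current
        # Band-aid: immediately signal so another waiter re-checks the pool size.
        @pool_cv.signal
      end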


In my opinion it needs a different condition variable, something like 
"pool empty", which is signaled by each thread on exit, for the shutdown 
method to wait on.
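
Something like this, where @pool_done_cv is a name I just made up 
(another ConditionVariable.new alongside @pool_cv):

  def shutdown
    @pool_mutex.synchronize do
      # Wait until every worker has removed itself from the pool.
      @pool_done_cv.wait(@pool_mutex) until @pool.empty?
    end
  end

  # ...and in each worker's cleanup:
  @pool_mutex.synchronize do
    @pool.delete(Thread.current)
    @pool_cv.signal       # wake a thread waiting for a free slot
    @pool_done_cv.signal  # each exiting thread also pokes shutdown to re-check
  end

That way shutdown never competes with the dispatching threads for 
@pool_cv signals.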


Another concern is that shutdown might end too early in extreme cases 
where two threads remove themselves from the pool simultaneously.  Not 
our problem here, but hey, it might be.

From the interleaved output it appears that Ruby does indeed have some 
output problems.  Not sure.  Another possibility is that it's Ruby's 
deadlock detection running amok: you'll notice in the original post two 
threads on line 28, and those two are at a synchronize point.  You'd 
think this could NOT deadlock, since those two should always be able to 
continue, unless there's a loop of some sort within a synchronize.
You might try rewriting dispatch to be something like

  def dispatch(*args)
    Thread.new do
      # Wait for space in the pool.
      @pool_mutex.synchronize do
        if @pool.size > @max_size
          print "WHAT GREATER?"
        end
        if @pool.size == @max_size
          print "Pool is full; waiting to run #{args.join(',')}...\n"
          # Sleep until some other thread calls @pool_cv.signal.
          # Receiving a signal should ALWAYS mean there's space available now.
          @pool_cv.wait(@pool_mutex)
        end
        if @pool.size > @max_size
          print "huh? it has GOTTA be less than or equal here!"
        end
        @pool << Thread.current
      end
      # ... then run the job and clean up exactly as in the original dispatch ...
    end
  end

Then there are no loops, so if there is a deadlock while some threads 
are waiting on a synchronize, you know it's the deadlock protection's 
fault [it thinks the mutex is deadlocked, but we don't think it is].  
This is very possibly the problem.

As per the post
"Ditto. AFAIK all external IO is blocking in Ruby on Windows..."

If that is the case, you might try putting a synchronize around the 
system/backtick command itself, to avoid concurrent IO.  Maybe all the 
running processes are 'blocked' on IO [though I've seen interleaved 
results on the screen, so they don't seem to block on write] and the 
deadlock detector thinks they're frozen?  (Except that you have threads 
stuck on synchronize, and the IO command runs outside any synchronize 
block, so you wouldn't think blocking IO would be a synchronize 
problem.)  Maybe it is. Sigh :)  I have written programs with hundreds 
of threads that write to the screen (sure, maybe they block when the 
write happens) and never had a deadlock issue from it (then again, I 
didn't try to synchronize them, either).
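
If you want to try that, a separate mutex just for the external command 
would look roughly like this (@io_mutex and cmd are made-up names, not 
from the original code):

  # Set up once, e.g. in initialize:
  @io_mutex = Mutex.new

  # Then wherever the job actually shells out:
  output = @io_mutex.synchronize { `#{cmd}` }

That would at least rule the IO concurrency in or out.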

If it is a Ruby IO problem, then maybe two of the processes are 'getting 
stuck' in IO and never ending; you could put in some debug output to 
test that hypothesis.  It might not be the real problem, though, given 
the two threads on line 28.
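
For the debug output, even crude timestamped prints around the external 
call would show whether a job ever starts but never finishes (again, 
cmd is just a stand-in for whatever the job runs):

  $stderr.puts "#{Time.now} [#{Thread.current.object_id}] starting #{cmd}"
  output = `#{cmd}`
  $stderr.puts "#{Time.now} [#{Thread.current.object_id}] finished #{cmd}"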

It seems (in my opinion; I haven't checked it too closely yet) that the 
problem only shows up when the shutdown method is 'in the mix'.  You 
might try rewriting it to keep an array of all the threads trying to 
enter the pool [i.e. an array of size 200] and then joining on each 
element of the array:
until all_thread_array.empty?
  next_thread = all_thread_array.shift
  next_thread.join
end

that type of thing.
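
Putting that together, a rough sketch (with @all_thread_array as my own 
instance-variable version of the array above; the pool logic inside the 
thread stays as it was):

  def dispatch(*args)
    @all_thread_array ||= []
    @all_thread_array << Thread.new do
      # ... pool entry, job execution, and cleanup exactly as before ...
    end
  end

  def shutdown
    # Join every thread we ever created; no condition variable wait involved.
    (@all_thread_array || []).each { |t| t.join }
  end

If the hang goes away with this version, that points pretty squarely at 
the old shutdown/condition-variable interaction.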

For my runs I used Windows.  I haven't tried it on Linux at all.

In short I don't know.  Lots of hypotheses :)
Good luck!
-Roger

--I like it :) http://www.google.com/search?q=free+bible
-- 
Posted via http://www.ruby-forum.com/.