Thanks Robert,


Your explanation fits more or less exactly what I was fearing.

I think it would be really nice from an understanding/debugging point of 
view if I could get this working in a single thread (parallelism can be 
disabled by setting in_processes: 0). However, there are several issues.

First, with a simple pipe: p.add(:cat, input: "foo.tab").add(:dump) 
where "foo.tab" only contains a few lines, the script hangs after 
:dump, and I suspect it is because of the lack of a "termination signal" 
like EOF.
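
To illustrate the EOF suspicion, here is a minimal sketch using a plain 
IO.pipe. It is not the actual pipeline code, and the record contents 
are made up. As far as I understand it, a reader only sees EOF once 
every write end of the pipe has been closed:

```ruby
# Sketch only: a bare IO.pipe standing in for the link between two steps.
reader, writer = IO.pipe

writer.puts "foo\t1"
writer.puts "bar\t2"

# Without this close, the each_line call below would block forever,
# because EOF is only delivered once all write ends are closed.
writer.close

lines = reader.each_line.map(&:chomp)
reader.close

puts lines.inspect
```

If the writer is left open, the same script hangs in each_line, which 
looks a lot like the hang I see after :dump.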

Second, if "foo.tab" contains more than a couple of thousand lines then 
it blocks in the :cat step - and I suspect that the reader buffer is 
full and requires unloading to resolve the block.
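
The full-buffer suspicion can be checked with a small sketch, assuming 
a Unix-like system where IO.pipe has a fixed-size kernel buffer. The 
non-blocking calls are used only so the example reports the limit 
instead of hanging; the chunk size is arbitrary:

```ruby
# Sketch only: fill a pipe until the kernel buffer is full.
reader, writer = IO.pipe
chunk   = "x" * 1024
written = 0

begin
  loop { written += writer.write_nonblock(chunk) }
rescue IO::WaitWritable
  # The buffer is full. A normal blocking write would hang right here
  # until somebody reads from the other end.
end

puts "pipe buffer filled after #{written} bytes"

# Draining the reader frees space, after which writing works again.
drained = reader.read_nonblock(written).bytesize
writer.write_nonblock(chunk)

writer.close
reader.close
```

With nobody reading in the same thread, a blocking write of more data 
than the buffer holds can never complete, which would explain why a 
few lines pass through :cat and a few thousand do not.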

Third, I was hoping that with multiple processes the next process would 
unload the buffer from the preceding step in a timely fashion. However, 
it is clear that this isn't the case - some synchronization is required 
between the processes. If only Ruby (MRI) threads supported multiple 
processors it could be handled with a mutex (darn GVL!). One could dig 
into EventMachine to see if that would work, but I am scared by the 
complexity of it.
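
For comparison, here is a minimal fork-based sketch in which the 
blocking pipe itself is the only synchronization. It assumes a 
Unix-like system and that steps are connected with IO.pipe, which may 
not match my actual implementation; the second pipe exists only to 
report the count back. The part I may have gotten wrong is closing the 
unused pipe ends in each process:

```ruby
# Sketch only: parent plays the producing step, child the consuming step.
reader, writer     = IO.pipe
result_r, result_w = IO.pipe

pid = fork do
  writer.close    # child only reads; an open copy here would prevent EOF
  result_r.close
  count = reader.each_line.count
  result_w.puts count
  result_w.close
  exit!(0)
end

reader.close      # parent only writes
result_w.close

# More data than the pipe buffer holds: the parent blocks in puts
# whenever the buffer is full and resumes as the child drains it.
10_000.times { |i| writer.puts "record\t#{i}" }
writer.close      # signals EOF to the child

Process.wait(pid)
count = result_r.read.to_i
result_r.close

puts count
```

If either process keeps a stray copy of the write end open, the child 
never sees EOF and the whole thing hangs, with no mutex involved.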

Thus, I fear my design is flawed. One thing I can think of is to change 
the design slightly so a single record at a time is passed from command 
to command.
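
A rough sketch of that record-at-a-time idea, using Ruby's lazy 
enumerators (Ruby 2.0 or later) in a single thread. The two stages are 
hypothetical stand-ins for :cat and :dump, not the real commands, and 
the input is a StringIO instead of "foo.tab":

```ruby
require "stringio"

# Sketch only: each record flows through all stages before the next
# one is read, so no buffer between the stages can fill up.
input = StringIO.new("a\t1\nb\t2\nc\t3\n")
seen  = []

# Stand-in for :cat: parse one tab-separated line into one record.
records = input.each_line.lazy.map { |line| line.chomp.split("\t") }

# Stand-in for :dump: note the record, then pass it on unchanged.
dumped = records.map { |record| seen << record; record }

# Nothing has been read yet; pulling two records reads only two lines.
first_two = dumped.first(2)

puts first_two.inspect
puts seen.length
```

Termination comes for free here: when the input runs out, the 
enumerator simply stops, so no separate EOF signal is needed.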


Cheers,


Martin

-- 
Posted via http://www.ruby-forum.com/.