Hi~

On Feb 16, 2007, at 3:08 AM, Eleanor McHugh wrote:

> On 15 Feb 2007, at 15:25, Eivind wrote:
>>      def fetch(fname)
>>        File.open(fname, 'rb') do |fp|
>>          while buf = fp.read(4096)
>>            yield(buf)
>>          end
>>        end
>>        return nil
>>      end
>>
>>
>>      def store_from(fname, there)
>>        puts
>>        size = there.size(fname)
>>        wrote = 0
>>
>>        File.rename(fname, fname + '.bak') if File.exist?(fname)
>>        File.open(fname, 'wb') do |fp|
>>          yield([wrote, size]) if block_given?
>>          there.fetch(fname) do |buf|
>>            wrote += fp.write(buf)
>>            yield([wrote, size]) if block_given?
>>            nil
>>          end
>>        end
>>
>>        return wrote
>>      end
>
> Your slowdown is an artefact of breaking the file read and transmit  
> operations down into chunks of 4096 bytes. This will cause your  
> 600 KB Word document to be sent as 150 discrete messages across the  
> network, each time incurring the cost of a disk seek and probably  
> the cost of network congestion. The fact that you're running both  
> pieces of code on the same machine will also add 150 additional  
> disk seeks into the equation for the write process. These all incur  
> non-deterministic costs based on the actual layout of the file  
> system, task switching by the OS between disk operations,  
> the particular OS's disk caching mechanisms, etc.
>
> If you read the entire file into memory in one chunk that will  
> reduce the cost at one end, then by buffering the whole thing in  
> memory at the other end until the transfer is complete you'll  
> reduce the other cost. As you are probably transmitting over TCP I  
> also wouldn't bother to break the file up into discrete chunks as  
> the underlying transport will take care of that for you (and 4096  
> is very rarely an optimal block size: for Ethernet traffic try  
> somewhere around 1536 bytes, and for disk access it'll depend on the  
> settings for the file-system and the physical geometry of the disk).
>
> As a general rule of thumb, always seek to minimise the number of  
> I/O operations that your code is performing if you want to avoid  
> these kinds of problems. I/O is orders of magnitude slower than  
> anything else.
>
> Ellie
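
To put a number on the chunking point above, here is a minimal sketch that
counts how many read calls a 600 KB file needs at a given chunk size. The
helper name count_reads and the temporary sample file are made up for
illustration; they are not part of the original code.

```ruby
require 'tempfile'

# Hypothetical helper: read a file in fixed-size chunks and return how
# many read calls returned data.
def count_reads(fname, chunk_size)
  reads = 0
  File.open(fname, 'rb') do |fp|
    # IO#read(length) returns nil at end of file, which ends the loop.
    reads += 1 while fp.read(chunk_size)
  end
  reads
end

# Build a 600 KB sample file (an assumption standing in for the Word document).
sample = Tempfile.new('sample')
sample.binmode
sample.write('x' * (600 * 1024))
sample.flush

chunked_reads = count_reads(sample.path, 4096)        # one call per 4096-byte chunk
single_reads  = count_reads(sample.path, 600 * 1024)  # whole file in one call
whole         = File.binread(sample.path)             # simplest "read it all" form
```

With 4096-byte chunks the loop makes 150 reads (600 * 1024 / 4096), and over
DRb each of those becomes a separate remote call. Reading the file in one go
trades that for holding the whole file in memory.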


	Sending a file across DRb like that also incurs the cost of  
marshalling and unmarshalling the file. I would think you would be  
better off having one of the DRb processes use net/sftp to transfer  
the file to the other node and then send a DRb message with the file  
path.
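
As a rough illustration of the marshalling cost, here is a small sketch using
Marshal directly, which is what DRb uses to serialise arguments. The chunk
contents and the path /tmp/report.doc are placeholders, and this only shows
the serialisation step, not a real DRb or net/sftp transfer.

```ruby
# A 4096-byte binary chunk, standing in for one buffer yielded by fetch.
chunk = ('x' * 4096).b

# What has to happen to every chunk sent through DRb: dump on one side,
# load on the other.
wire     = Marshal.dump(chunk)
restored = Marshal.load(wire)

# Sending only a file path (hypothetical path) is a tiny message by comparison.
path_message = Marshal.dump('/tmp/report.doc')

puts "chunk: #{chunk.bytesize} bytes, marshalled: #{wire.bytesize} bytes"
puts "path message: #{path_message.bytesize} bytes"
```

The per-chunk overhead in bytes is small, but the dump and load are repeated
for every chunk, on top of a network round trip each time. Passing just the
path over DRb keeps that message tiny and leaves the bulk transfer to a tool
built for it.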

Cheers-
-- Ezra Zygmuntowicz 
-- Lead Rails Evangelist
-- ez / engineyard.com
-- Engine Yard, Serious Rails Hosting
-- (866) 518-YARD (9273)