People,

In response to people's suggestions about speeding up my script by 
replacing output to many small files with output to one large file, I 
have implemented a hash table that I can write out with YAML.  However, 
I find that as the hash table gets larger, the script slows down.  When 
I try to work out what is happening by writing a small test script that 
does more or less the same thing, I can't reproduce the problem.
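
For context, the "one large file" approach amounts to something like 
the rough sketch below.  It is not the production code: the helper 
names build_table and dump_table and the file name 
hash_table_sketch.yaml are made up for illustration, and the table is 
kept tiny.  Only the key format and the nested-array values follow the 
test script.

```ruby
require 'yaml'
require 'tmpdir'

# Build a small table keyed like "01.01.01.01"; each value is a nested
# array, as in the test script.  (Hypothetical helper, illustration only.)
def build_table(limit)
  table = {}
  ('01'..limit).each do |a|
    ('01'..limit).each do |b|
      ('01'..limit).each do |c|
        ('01'..limit).each do |d|
          table["#{a}.#{b}.#{c}.#{d}"] =
            Array.new(2) { Array.new(1) { Array.new(20, 0) } }
        end
      end
    end
  end
  table
end

# Write the whole table to a single YAML file instead of many small files.
def dump_table(table, path)
  File.open(path, 'w') { |f| f.write(table.to_yaml) }
end

table = build_table('02')
path  = File.join(Dir.tmpdir, 'hash_table_sketch.yaml')
dump_table(table, path)

# Read it back to check that the round trip preserves the data.
reloaded = YAML.load_file(path)
puts reloaded.size
```

The point of the sketch is only that the whole table sits in memory 
until the single dump at the end, so the hash keeps growing for the 
entire run.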

The test script is:


#!/usr/bin/ruby

h1 = Hash.new( 0 )
srand( 0 )

# seeds = [ '01', '01', '01', '22' ]
# seeds = [ '01', '01', '20', '22' ]
# seeds = [ '01', '32', '20', '22' ]
seeds = [ '50', '32', '20', '22' ]

for a in '01' .. seeds[0]
   start = Time.now
# puts a
   for b in '01' .. seeds[1]
#   puts b
     for c in '01' .. seeds[2]
#     puts c
       for d in '01' .. seeds[3]
#        print "#{d} "
         h1[ "#{a}.#{b}.#{c}.#{d}" ] =
           Array.new(2){ Array.new(1){ Array.new( 20, rand(1) ) } }
       end
#     puts
     end
#   puts
   end
# puts
   stop = Time.now
   puts stop - start
end


Both scripts are, of course, faster with the hash insertion commented 
out, and in that case the time per iteration of the outer loop is 
constant in both.  With the insertion enabled, the test script takes 
longer per iteration but the time is STILL constant.  In my actual 
script, with the insertion enabled, the time per iteration of the outer 
loop gets longer and longer, e.g. from 36 sec to a few minutes, before I 
kill it about halfway through.
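
To make the comparison more concrete, the timing could be extended to 
record the hash size and the number of garbage collector runs for each 
outer iteration, roughly as sketched below.  That the GC has anything 
to do with the slowdown is only a guess on my part.  The helper name 
timed_inserts and the loop bounds are made up for illustration, and the 
sketch assumes a Ruby that provides GC.count and Process.clock_gettime.

```ruby
# Time each outer iteration and record the hash size and the number of
# GC runs that happened during it.  (Hypothetical helper, sketch only.)
def timed_inserts(outer, inner)
  table   = {}
  samples = []
  ('01'..outer).each do |a|
    gc_before = GC.count
    start     = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    ('001'..inner).each do |b|
      table["#{a}.#{b}"] = Array.new(2) { Array.new(1) { Array.new(20, 0) } }
    end
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    samples << { :key => a, :seconds => elapsed,
                 :size => table.size, :gc_runs => GC.count - gc_before }
  end
  samples
end

samples = timed_inserts('05', '200')
samples.each do |s|
  printf("%s  %.4fs  size=%d  gc_runs=%d\n",
         s[:key], s[:seconds], s[:size], s[:gc_runs])
end
```

If the seconds column climbed together with gc_runs in the production 
script but not in the test script, that would at least point at memory 
pressure rather than at the hash insertion itself.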

Can anyone suggest a way of working out why the production hash 
insertion behaves differently and somewhat unexpectedly?

Thanks,

Phil.
-- 
Philip Rhoades

GPO Box 3411
Sydney NSW	2001
Australia
E-mail:  phil / pricom.com.au