On 17 Aug 06, at 10:45, Robert Klemme wrote:

> On 17.08.2006 15:54, Guillaume Marcais wrote:
>> I have a script that aggregates data from multiple files, stores it all
>> in a hash, and then emits a summary on standard output. The input files
>> (text files) are fairly big: 4 of about 50 MB and 4 of about
>> 350 MB. The hash will grow to about 500 000 keys. The memory footprint
>> of the ruby process as reported by top is above 2 Gigs.
>>
>> When the script starts, it processes the files at a speed of 10K/s or
>> so. Not lightning fast, but it will get the job done. As time goes on,
>> the speed drops down to 100 bytes/s or less, while still taking 100%
>> CPU time. Unbearable. The machine it is running on is pretty good:
>> 4xAMD Opteron 64bit, 32G memory, local scsi raided drive.
>>
>> Does the performance of Ruby collapse past a certain memory usage?
>> Like the GC kicking in all the time?
>>
>> Any clue on how to speed this up? Any help appreciated.
>>
>> Guillaume.
>>
>> The code is as follows:
>>
>> delta and snps are IOs. reads is a hash. max is an integer (4 in my
>> case).
>> It expects a line starting with a '>' on delta. Then it reads some
>> information on delta (and discards the rest) and some more information
>> on snps (if present). All this is then recorded in the reads hash.
>> Each entry in the hash is an array with the 4 best matches found
>> so far.
>>
>> def delta_reorder(delta, snps, reads, max = nil)
>>   l = delta.gets or return
>>   snps_a = nil
>>   loop do
>>     l =~ /^>(\S+)\s+(\S+)/ or break
>>     contig_name, read_name = $1, $2
>
> Small optimization, which will help only if delta_reorder is called
> often:
>
>   read = (reads[read_name.freeze] ||= [])
>
> Background: a Hash will dup a non-frozen string key to avoid nasty effects
> if the original changes.
>
> <snip/>
>
> To make life easier for people who want to play with this, you could
> provide a complete test set (original script + data files).

Will do, when I get to my office.

Guillaume.
>
> I don't fully understand your processing, but maybe there's an option
> to improve this algorithm-wise.
>
> Kind regards
>
> robert
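
For anyone who wants to check Robert's background note about Hash and string keys, here is a minimal sketch. The key names ("read_001", "read_002") and the local variable names are made up for illustration and are not from the original script. It only shows that an unfrozen String key gets copied on insertion while a frozen one is stored as-is; it does not measure how much time this saves in delta_reorder.

```ruby
# Sketch: how Hash#[]= treats unfrozen vs. frozen String keys.
# All names below are hypothetical, not taken from the original script.
reads = {}

# Case 1: unfrozen String key. The Hash stores a frozen copy,
# so the stored key is a different object from the one passed in.
name = String.new("read_001")
reads[name] = []
stored_key  = reads.keys.first
dup_made    = !stored_key.equal?(name)
key_frozen  = stored_key.frozen?

# Case 2: frozen String key. The Hash stores the very same object,
# so no extra String is allocated for the key.
frozen_name = String.new("read_002").freeze
read = (reads[frozen_name] ||= [])   # same idiom as in Robert's suggestion
read << :match
same_object = reads.keys.last.equal?(frozen_name)

puts "copy made for unfrozen key: #{dup_made}"
puts "stored key is frozen:       #{key_frozen}"
puts "frozen key stored as-is:    #{same_object}"
```

Note that freeze modifies the string in place, so read_name cannot be changed afterwards. That looks harmless here because read_name comes fresh from $2 on every iteration.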