On 18 Aug 06, at 00:07, John Carter wrote:

> On Thu, 17 Aug 2006, Robert Klemme wrote:
>
>> Small optimization, which will help only if delta_reorder is called
>> often:
>>
>>    read = (reads[read_name.freeze] ||= [])
>>
>> Background: a Hash will dup (and freeze) a non-frozen String key to
>> avoid nasty effects if the original changes.
>
> Stole that for
>   http://rubygarden.org:3000/Ruby/page/show/RubyOptimization
>
>
> Suggestions for the Original Poster...
>
> * Browse that Wiki page, it may have something for you. (Alternatively,
>   once you solve your problem add the solution to that page!)
>
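A small illustration of Robert's point. This is a sketch assuming MRI semantics, where `Hash#[]=` stores its own frozen copy of an unfrozen String key but keeps an already-frozen String key as-is. The `reads` table and the read names are hypothetical stand-ins for the original poster's data.

```ruby
# Sketch assuming MRI semantics: Hash#[]= copies and freezes an unfrozen
# String key, but stores an already-frozen String key without copying.
h = {}

unfrozen = String.new("read_1")
h[unfrozen] = []
copied_key = h.keys.last
puts copied_key.equal?(unfrozen) # false: the Hash made its own copy
puts copied_key.frozen?          # true

frozen = String.new("read_2").freeze
h[frozen] = []
puts h.keys.last.equal?(frozen)  # true: no copy was needed

# The memoizing pattern from the thread, on a hypothetical reads table.
# Freezing read_name first lets the Hash reuse it instead of copying it.
reads = {}
%w[r1 r2 r1].each do |read_name|
  read = (reads[read_name.freeze] ||= [])
  read << read_name
end
p reads
```

Note that `freeze` changes the caller's string: if `read_name` is modified later, Ruby raises an error, so the trick only fits keys that are never mutated afterwards.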

Thanks for the pointer.

I also used Mmap#scan. It is pretty elegant compared to the usual:

io.each { |l|
   l.chomp!
   next unless l =~ /blah/
   # ...
}

I would think it is faster too (no formal testing done).
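A minimal sketch of the two shapes being compared. Mmap is a third-party extension, so this uses core `File.read` plus `String#scan` as a stand-in for `Mmap#scan` (an assumption: the point shown is the single regexp pass over the whole buffer, not memory mapping; unlike mmap, `File.read` loads the whole file at once). The `/blah/` pattern and the sample lines are placeholders.

```ruby
require "tempfile"

# Line-by-line loop, as in the message above.
def matches_by_line(path)
  found = []
  File.open(path) do |io|
    io.each_line do |l|
      l.chomp!
      next unless l =~ /blah/
      found << l
    end
  end
  found
end

# Whole-buffer scan: one regexp pass over the entire contents.
# In Ruby, ^ and $ match at line boundaries and . does not match "\n".
def matches_by_scan(path)
  File.read(path).scan(/^.*blah.*$/)
end

by_line = by_scan = nil
Tempfile.create("scan-demo") do |f|
  f.write("blah 1\nother\nmore blah\n")
  f.flush
  by_line = matches_by_line(f.path)
  by_scan = matches_by_scan(f.path)
end
p by_line # => ["blah 1", "more blah"]
p by_scan # => ["blah 1", "more blah"]
```

Timing both versions with `Benchmark.realtime` on the real input would settle the speed question.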

> * If you are on Linux, use "vmstat 5", e.g.:
> vmstat 5
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  0  0 127628  14308   4548 198096    1    1    28     7   27    20  8  1 86  5
>  0  0 127628  14308   4548 198096    0    0     0     0  421   835  0  0 100  0
>
>
> Watch the "si" and "so" columns (swap in, swap out). If you are seeing
> 2 or more swaps every 5 seconds, then you don't have Ruby GC problems, you
> have memory problems, i.e. tweaking GC won't help. You have to store less
> in RAM, full stop. Remember to set any dangling references that you won't
> use again to nil, especially from class variables and globals.
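To watch si/so from a script rather than by eye, a small parser along these lines could work. It is a sketch: `swap_samples`, `swapping?` and the threshold of 2 are illustrative choices following the rule of thumb above, and it assumes the procps `vmstat` layout shown, where the first data row reports averages since boot rather than a 5-second sample.

```ruby
# Hypothetical helper: extract [si, so] pairs from `vmstat` output.
# The first data row is an average since boot, so it is skipped.
def swap_samples(text)
  lines = text.lines.map(&:strip).reject(&:empty?)
  header_idx = lines.index { |l| l.split.include?("si") }
  raise ArgumentError, "no vmstat header found" unless header_idx

  cols = lines[header_idx].split
  si_col = cols.index("si")
  so_col = cols.index("so")
  rows = lines[(header_idx + 1)..-1].map(&:split)
  rows.drop(1).map { |r| [r[si_col].to_i, r[so_col].to_i] }
end

# True if any sampled interval shows si or so at or above the threshold.
def swapping?(text, threshold = 2)
  swap_samples(text).any? { |si, so| si >= threshold || so >= threshold }
end

SAMPLE = <<~VMSTAT
  procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
   r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
   0  0 127628  14308   4548 198096    1    1    28     7   27    20  8  1 86  5
   0  0 127628  14308   4548 198096    0    0     0     0  421   835  0  0 100  0
VMSTAT

p swap_samples(SAMPLE) # => [[0, 0]]
p swapping?(SAMPLE)    # => false
```

On a live machine, `IO.popen(%w[vmstat 5 3], &:read)` would supply three samples to the same functions.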

Checked that. But no, the machine has plenty of memory and si/so were
always 0.

I finally went back to stream parsing. Instead of aggregating the
information from many files into one big hash, I read and write one record
at a time in a format suitable for sort (the UNIX command), then pipe
it to sort. Finally, I merge all the relevant files together with
'sort -m'. This produced the result in only a few hours.
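A minimal sketch of that pipeline, assuming tab-separated `key<TAB>value` records and a POSIX `sort` on the PATH. `write_sorted_run`, `merge_runs`, `group_by_key` and the record layout are hypothetical; `LC_ALL=C` is set so every `sort` call uses the same byte order, which `sort -m` relies on. Because the merged stream is sorted, all records for one key arrive together and can be aggregated key by key with `Enumerable#chunk` instead of a big hash.

```ruby
require "tmpdir"

SORT_ENV = { "LC_ALL" => "C" } # same collation for every sort call

# Stream records into `sort`, which writes one sorted run to path.
def write_sorted_run(records, path)
  IO.popen(SORT_ENV, ["sort", "-t", "\t", "-k1,1", "-o", path], "w") do |sort|
    records.each { |key, value| sort.puts("#{key}\t#{value}") }
  end
  raise "sort failed" unless $?.success?
end

# Merge already-sorted runs with `sort -m` and return the merged text.
def merge_runs(paths)
  merged = IO.popen(SORT_ENV, ["sort", "-m", "-t", "\t", "-k1,1", *paths], &:read)
  raise "sort -m failed" unless $?.success?
  merged
end

# Group consecutive lines sharing a key; correct only because input is sorted.
# For large inputs, iterate the IO from IO.popen directly instead of a String.
def group_by_key(text)
  text.each_line
      .map { |l| l.chomp.split("\t", 2) }
      .chunk { |key, _| key }
      .map { |key, pairs| [key, pairs.map(&:last)] }
end

groups = Dir.mktmpdir do |dir|
  a = File.join(dir, "a.sorted")
  b = File.join(dir, "b.sorted")
  write_sorted_run([["read2", "x"], ["read1", "y"]], a)
  write_sorted_run([["read1", "z"], ["read3", "w"]], b)
  group_by_key(merge_runs([a, b]))
end
p groups # => [["read1", ["y", "z"]], ["read2", ["x"]], ["read3", ["w"]]]
```

Each run stays small in memory, and the final merge only needs to look at the head of each sorted file.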

Thank you all for your suggestions,
Guillaume.

>
>
>
> John Carter                             Phone : (64)(3) 358 6639
> Tait Electronics                        Fax   : (64)(3) 359 4632
> PO Box 1645 Christchurch                Email : john.carter / tait.co.nz
> New Zealand
>
> Carter's Clarification of Murphy's Law.
>
> "Things only ever go right so that they may go more spectacularly wrong later."
>
> From this principle, all of life and physics may be deduced.
>
>