Geert Fannes wrote:

>I implemented a sparse matrix class for loading a calling network (vertices are mobile phone users and edges are calls). The rows are held in a Hash, and each row is an AVLTree recording whom a customer called and for how long (total duration). These AVLTrees are balanced trees that allow fast lookup and insertion; I used the C package "C AVL Tree generic package" from Walt Karas for this and converted it to a Ruby class.
>
>Anyway, as you can imagine, these calling networks are quite large (5 GB in memory, 10 million customers, 50 million calls). Unfortunately, Ruby's GC hinders fast loading of these graphs: each time the GC kicks in, it has to check the ever-growing set of AVLTree objects for deletion, and none can be deleted. Therefore, my plan was to disable GC and run it manually every 2 million lines. When I do this, Ruby appears to consume far more memory and crashes before the calling graph is loaded from file (Errno::ENOMEM). When I leave the default GC untouched, I am able to read the graph, but much more slowly, of course, due to frequent GC. I cannot say GC.start is not working, since after each 2 million lines the program keeps the same memory footprint for some time before it starts increasing again. My guess is that it does not really free the memory, but keeps it in some pool and reuses it.
>
You can try my GC patch, which allows one to specify the initial heap 
size and several other parameters. You can download it from 
http://rubyforge.org/projects/railsbench.
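
For reference, the batching scheme described in the quoted message (GC off while loading, one manual collection every N lines) can be sketched roughly as below. This is only an illustration under assumptions, not the original loader: the method name load_calls, the gc_every parameter, and the "caller callee duration" line format are made up for the example, and a plain Hash per row stands in for the AVLTree class.

```ruby
require "stringio"

# Hypothetical loader: rows are kept in a Hash, and each row is a plain Hash
# mapping callee => total duration (a stand-in for the AVLTree rows).
# Assumed input format: one call per line, "caller callee duration".
def load_calls(io, gc_every: 2_000_000)
  rows = Hash.new { |hash, caller_id| hash[caller_id] = Hash.new(0) }
  gc_runs = 0

  GC.disable                        # no automatic collections while loading
  begin
    io.each_line.with_index(1) do |line, lineno|
      caller_id, callee_id, duration = line.split
      rows[caller_id][callee_id] += duration.to_i

      next unless (lineno % gc_every).zero?

      # Re-enable around the manual run: some Ruby versions document
      # GC.start as doing nothing while the GC is disabled.
      GC.enable
      GC.start
      GC.disable
      gc_runs += 1
    end
  ensure
    GC.enable                       # always restore the default behaviour
  end

  [rows, gc_runs]
end

# Tiny in-memory sample instead of the real call file.
sample = StringIO.new("a b 10\na c 5\na b 7\nb a 3\n")
rows, gc_runs = load_calls(sample, gc_every: 2)
```

With the GC disabled, every temporary object created between two manual runs (the line strings and the arrays from split, for instance) stays allocated until the next GC.start, so a long interval may well raise the peak footprint. A smaller gc_every trades some speed for a lower peak.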

Regards,

Stefan Kaes