On Fri, 2008-08-08 at 20:40 +0900, tobyclemson / gmail.com wrote:
> Hi all,
>
> I'm having a really odd memory problem with a small Ruby program I've
> written. It basically takes in lines from input files (which represent
> router flows), deduplicates them (based on elements of the line) and
> outputs the unique flows to file. The input file often contains over
> 300,000 lines of which about 25-30% are duplicates. The trouble I'm
> having is that the program (which is intended to be long running) does
> not seem to release any memory back to the system and in fact just
> increases in memory footprint from iteration to iteration. It should
> use about 150 MB by my estimates but sails through this and yesterday
> slowed to a halt at about 1.6 GB (due to the GC by my guess). This
> makes no sense as at times I am deleting data structures that are 50 MB
> each which should show some decrease in memory usage.
>
> The codebase is slightly too big to pastie but it is available
> here http://svn.tobyclemson.co.uk/public/trunk/flow_deduplicator .
> There are actually only 2 classes of importance and 1 script but I
> don't know if pastie can handle that.
>
> Any help would be greatly appreciated as the alternative (pressures
> from above) is to rewrite in Python (which involves me learning
> Python)
>
> Thanks in advance,
> Toby Clemson

Are you on a platform that has GNU "sort" available? GNU "sort" can do
the duplicate removal for you a *lot* more efficiently than a program in
*any* scripting language. Then you can use Ruby to do the "interesting"
part of the problem. :)

-- 
M. Edward (Ed) Borasky
ruby-perspectives.blogspot.com

"A mathematician is a machine for turning coffee into theorems." --
Alfréd Rényi via Paul Erdős
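
A minimal sketch of the split suggested above, assuming each flow is one
comma-separated line whose first three fields identify it. The real flow
format is not shown in this thread, so the delimiter, the key range and
the name dedupe_flows are all hypothetical. Ruby hands the duplicate
removal to "sort" and only afterwards reads the result, so no large
structure has to live on the Ruby heap.

```ruby
# Hypothetical helper: writes the unique flows from in_path to out_path.
# Two lines count as duplicates when the fields selected by `key` are equal.
# The delimiter and key range are assumptions, not the poster's real format.
def dedupe_flows(in_path, out_path, delimiter: ",", key: "1,3")
  ok = system(
    { "LC_ALL" => "C" },   # byte-wise comparison, independent of the locale
    "sort",
    "-u",                  # keep one line per distinct key
    "-t", delimiter,       # field separator
    "-k", key,             # compare only these fields
    "-o", out_path,        # write the result to a file
    in_path
  )
  raise "sort failed for #{in_path}" unless ok
  out_path
end
```

The "interesting" part can then stream the result, for example with
File.foreach(out_path) { |line| ... }, which reads one line at a time
instead of loading the whole file. Note that the output comes back
sorted by key rather than in input order.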