Wincent Colaiuta wrote:
> I'm trying to work out ways to reduce the memory use of one of my
> projects, but I don't know what methods are available to the Ruby
> programmer for profiling memory use and tracking down garbage
> collection problems.
>
> The short version:
>
> I have a project where processing a file can consume dozens of
> megabytes of memory; if I process many files in a single run then
> total memory usage can reach hundreds of megabytes or more than a gig.
> I would expect garbage collection to kick in along the way but it
> doesn't seem to be happening, memory usage grows and grows, and I
> don't know where to start zeroing in on the problem.
>
> The long version:
>
> I've written an object-oriented templating system[1] that incorporates
> a memoizing packrat parser. As each file is parsed the parser
> "memoizes" the partial results for speed. In a lengthy file the size
> of the memoizing cache can grow quite large (dozens of megabytes). But
> I would expect the entire contents of the memoization cache to get
> garbage collected when I move on to the next file; the cache itself
> has definitely fallen out of scope by that time. But garbage
> collection doesn't seem to be happen, as memory use grows linearly as
> I batch process input files.
>
> As this is a largish, complicated project I don't even know where to
> begin to start investigating this. So really, I am looking for general
> information on techniques for measuring and exploring memory use and
> garbage collection in Ruby.
>
> Thanks in advance for the advice!
> Wincent
>
> [1] http://walrus.wincent.com/

Well ... where to begin?? :)

1. First of all, get the notion that "premature optimization is the root of all evil" out of your head. The only sense in which that maxim is valid is when the word "premature" is strongly emphasized.
Part of the practice of software engineering, and what separates software engineering from "mere coding", is knowing what the algorithms of choice are for the problem you are trying to solve -- and their resource requirements -- and using them. Knuth may have written the premature optimization line, but it's obviously been taken out of the context of his *massive* output of practical computer science and software engineering teachings. Read *everything* he wrote!

2. In general, to reduce memory usage, you must do one or both of two things: recompute things rather than storing them in memory, or write things explicitly out to "backing store" and read them back in.

3. Languages without explicit object destructors need to be fixed, including Ruby. :) However, in the absence of destructors, part of software engineering is making sure there are no remaining references to objects you no longer want, and then explicitly calling the garbage collector. I do a lot of coding in R, which is a dynamic, garbage-collected language for scientific and statistical computing. I've got a 1 GB workstation, and I still have "normal sized problems" that can overflow memory. A simple delete of unused objects (R has "rm", which deletes an object from the workspace) followed by a call to the garbage collector usually gets me going again.

4. Relational databases are your friend. They are designed and optimized for dealing with large and complicated datasets, and object-relational mappings like ActiveRecord and Og (Object Graph) exist in Ruby to make working with them as simple as possible.

How do you define "large"? For a single-user system like a laptop or workstation, figure you have something like half of the installed RAM to run your applications. At least on Linux workstations, things like I/O buffers will take up the other half. If you're only running this one application, any working set bigger than half of your installed RAM is too big to keep in memory and ought to be redesigned to use a database.

-- 
M. Edward (Ed) Borasky, FBG, AB, PTA, PGS, MS, MNLP, NST, ACMC(P)
http://borasky-research.net/

If God had meant for carrots to be eaten cooked, He would have given rabbits fire.
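Here is a minimal Ruby sketch of point 3 above, applied to the memoizing-parser situation in the original question. It assumes MRI Ruby 1.9 or later. `MemoParser`, `process_all`, `process_all_leaky`, `live_parsers` and `LEAK` are hypothetical names for illustration, not part of Walrus or any library. The sketch contrasts two batch loops. The first keeps no references to finished parsers, so `GC.start` can reclaim each parser and its memo table. The second parks every parser in a long-lived array, which is one common way an "out of scope" cache stays reachable. `ObjectSpace.each_object` serves only as a rough probe here: MRI scans the stack conservatively, so an occasional stray instance may survive a collection.

```ruby
# Stand-in for a packrat-style parser (hypothetical): caches partial results
# keyed by [rule, position] for the duration of a single parse.
class MemoParser
  attr_reader :memo

  def initialize(text)
    @text = text
    @memo = {}
  end

  # Toy "rule": memoize the 8-character token starting at every position.
  # Returns the number of cached entries.
  def parse
    (0...@text.size).each do |pos|
      @memo[[:token, pos]] ||= @text[pos, 8]
    end
    @memo.size
  end
end

# Force a full collection, then count MemoParser instances still alive.
# MRI scans the C stack conservatively, so treat this as a rough probe only.
def live_parsers
  GC.start
  ObjectSpace.each_object(MemoParser).count
end

# Good: nothing outlives the block, so each parser and its memo table
# become unreachable once the file is done.
def process_all(texts)
  texts.map do |text|
    parser = MemoParser.new(text)
    entries = parser.parse
    parser.memo.clear # release cached results even if a stray reference remains
    entries
  end
end

# Bad: a long-lived container (a constant, global or class variable)
# keeps every parser, and therefore every memo table, reachable.
LEAK = []

def process_all_leaky(texts)
  texts.each do |text|
    parser = MemoParser.new(text)
    parser.parse
    LEAK << parser
  end
end

texts = Array.new(100) { |i| "file #{i} " * 50 }
sizes = process_all(texts)
puts "processed #{sizes.size} files; live parsers after GC: #{live_parsers}"
process_all_leaky(texts)
puts "after leaky run; live parsers after GC: #{live_parsers}"
```

If the count keeps climbing in a real program, the next step is to find which long-lived object (a constant, class variable, global, or a block that captured a local) still points at the cache. In modern MRI, `ObjectSpace.count_objects` and `GC.stat` give coarser, process-wide numbers that can help confirm the trend.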