Wincent Colaiuta wrote:
> I'm trying to work out ways to reduce the memory use of one of my
> projects, but I don't know what methods are available to the Ruby
> programmer for profiling memory use and tracking down garbage
> collection problems.
>
> The short version:
>
> I have a project where processing a file can consume dozens of
> megabytes of memory; if I process many files in a single run then
> total memory usage can reach hundreds of megabytes or more than a gig.
> I would expect garbage collection to kick in along the way but it
> doesn't seem to be happening, memory usage grows and grows, and I
> don't know where to start zeroing in on the problem.
>
> The long version:
>
> I've written an object-oriented templating system[1] that incorporates
> a memoizing packrat parser. As each file is parsed the parser
> "memoizes" the partial results for speed. In a lengthy file the size
> of the memoizing cache can grow quite large (dozens of megabytes). But
> I would expect the entire contents of the memoization cache to get
> garbage collected when I move on to the next file; the cache itself
> has definitely fallen out of scope by that time. But garbage
> collection doesn't seem to be happening, as memory use grows linearly as
> I batch process input files.
>
> As this is a largish, complicated project I don't even know where to
> begin to start investigating this. So really, I am looking for general
> information on techniques for measuring and exploring memory use and
> garbage collection in Ruby.
>
> Thanks in advance for the advice!
> Wincent
>
> [1] http://walrus.wincent.com/
>   
Well ... where to begin?? :)

1. First of all, get the notion that "premature optimization is the root
of all evil" out of your head. The only sense in which that maxim is
valid is when the word "premature" is strongly emphasized. Part of the
practice of software engineering, and what separates software
engineering from "mere coding", is knowing what the algorithms of choice
are for the problem you are trying to solve -- and their resource
requirements -- and using them. The line about premature optimization
is usually credited to Donald Knuth rather than Dijkstra, but either way
it has obviously been taken out of the context of a *massive* output of
practical computer science and software engineering teachings. Read
*everything* both of them wrote!

2. In general, to reduce memory usage, you must do one or both of two
things: recompute things rather than storing them in memory, or write
things explicitly out to "backing store" and read them back in.
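As a rough illustration of the "backing store" option, here is a minimal
sketch that spills a large intermediate result to a temporary file with
Marshal and reads it back when needed. The helper names (spill_to_disk,
load_from_disk) and the stand-in cache are hypothetical and not taken
from the project under discussion; it also assumes the cached objects
can be marshalled at all.

```ruby
require 'tmpdir'

# Hypothetical helper: serialize an object to a temp file and return the path.
def spill_to_disk(object, name)
  path = File.join(Dir.tmpdir, "#{name}-#{Process.pid}.bin")
  File.binwrite(path, Marshal.dump(object))
  path
end

# Hypothetical helper: read a previously spilled object back into memory.
def load_from_disk(path)
  Marshal.load(File.binread(path))
end

# Stand-in for a large memoization cache (assumption for illustration).
cache = { 'rule_a' => (1..1000).to_a, 'rule_b' => %w[x y z] }

path  = spill_to_disk(cache, 'memo')
cache = nil                      # drop the in-memory copy

restored = load_from_disk(path)  # pay the I/O cost only when it is needed
File.delete(path)                # clean up the backing file
```

The trade is memory for I/O time, so this only pays off for data that is
large and rarely revisited.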

3. Languages without explicit object destructors need to be fixed,
including Ruby. :) However, part of software engineering in the absence
of them is to make sure there are no references to objects you no longer
want, and then explicitly call the garbage collector. I do a lot of
coding in R, which is a dynamic, garbage-collected language for
scientific and statistical computing. My workstation has 1 GB of RAM,
and I still have "normal-sized problems" that can overflow memory. A
simple delete of unused objects (R has "rm", which will delete an object
from the workspace) followed by a call to the garbage collector usually
gets me going again.
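The same "drop the reference, then collect" routine can be sketched in
Ruby, with ObjectSpace used to check whether the objects really went
away. MemoEntry and build_cache below are hypothetical stand-ins for a
parser's memo table, not code from the project in question. Ruby's
collector scans the stack conservatively, so a few instances may
survive; the thing to look for is a large drop, not an exact zero.

```ruby
# Hypothetical stand-in for one memoized parse result.
class MemoEntry
  def initialize(value)
    @value = value
  end
end

# Hypothetical stand-in for building a per-file memoization cache.
def build_cache(size)
  Array.new(size) { |i| MemoEntry.new(i) }
end

# Count the MemoEntry instances the interpreter still considers live.
def live_entries
  ObjectSpace.each_object(MemoEntry).count
end

cache  = build_cache(10_000)
before = live_entries

cache = nil   # make sure nothing references the cache any more
GC.start      # explicitly request a collection

after = live_entries
puts "live MemoEntry objects: #{before} before, #{after} after"
```

If the "after" count stays close to the "before" count in a real
program, something is still holding a reference to the cache, and that
is where to start looking.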

4. Relational databases are your friend. They are designed and optimized
for dealing with large and complicated datasets, and object-relational
mappings like ActiveRecord and Og (Object Graph) exist in Ruby to make
working with them as simple as possible. How do you define "large"? For
a single-user system like a laptop or workstation, figure you have
something like half of the installed RAM to run your applications. At
least on Linux workstations, things like I/O buffers will take up the
other half. If you're only running this one application, anything bigger
than half of your installed RAM is too big and ought to be redesigned to
use a database.

-- 
M. Edward (Ed) Borasky, FBG, AB, PTA, PGS, MS, MNLP, NST, ACMC(P)
http://borasky-research.net/

If God had meant for carrots to be eaten cooked, He would have given rabbits fire.