On 02/08/2010 10:18 PM, Tim Smith wrote:
> I've got a table that consists of about 2.6 millions rows, each row 
> containing 5 columns (a short string, 3 integers, and a decimal 
> fraction).
> 
> A program needs to read the whole table. At first, programming without 
> brain fully engaged, I just grabbed it in Perl using the DBI function 
> selectall_arrayref, and then looped through the results.
> 
> That uses about 1.3 gig of memory. Perl data structures are not 
> efficient when it comes to memory. This approach is reading the whole 
> table (which is a mere 130 MB according to MySQL) into a Perl data 
> structure (array of rows, each row an array of columns), which I then 
> scan to build my data structures.
> 
> Recoding to not be so brain dead, instead using prepare and execute, 
> cuts the memory usage to about 270 MB. That makes sense, as the table is 
> now being loaded into some data structure maintained by the C library 
> that implements the Perl DBI driver for mysql.
> 
> Out of curiosity, I decided to try this in other languages. Python is 
> the same as Perl. 270 MB when I execute the store_result() function to 
> grab the data from the database.
> 
> Same in C. 270 MB used when I grab the results with mysql_store_result().
> 
> This all makes sense, assuming the Perl and Python database drivers are 
> using the C mysql library.
> 
> So then I tried Ruby:
> 
>    require "mysql"
>    my = Mysql.new(...)
>    st = my.prepare("SELECT * from my_table")
>    st.execute
>    while row = st.fetch do
>       ...
>    end
> 
> That only uses about 130 MB of memory!?
> 
> The above was on a 64-bit machine. On a 32-bit machine, the numbers are 
> 185 MB for C, Perl, and Python, and 103 MB for Ruby.
> 
> How is Ruby able to do so much better on memory usage?
> 
> I did not test speed, but I think the Ruby version also ran a little 
> faster than the others.

It appears that you employ two fundamentally different approaches: the 
description of your Perl and Python versions sounds as if you load the 
*whole table* in one data structure and only then iterate or process it. 
  The Ruby code fetches a single row at a time (or at most a few rows if 
there is caching going on behind the scenes).  Well, if my assessment is 
correct, you have your reason. :-)

Kind regards

	robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/