On Jul 13, 2009, at 21:23, Pito Salas wrote:
> The example I gave was purposely oversimplified to make it easy to
> explain and understand. In reality the records will be far more
> complex and the numbers perhaps in the hundreds of thousands.

Ok, well if you have an average number of bytes per record, you can
probably guesstimate how much space the records will take by
multiplying that by the number of records.

> But still I do agree with you. I was just trying to see if one of the
> three choices was clearly brain dead or clearly the best one. Would
> using a hash repeat over and over the text of the keys (I assume 'no')

When you use a string (or anything else) as a hash key, Ruby calls
.hash on it to compute a hash value.  The access time for
people["matz"].age should only be slightly slower than matz.age,
because people["matz"] has to compute the hash value of "matz" and
look up the entry in the hash before it can look up the data, whereas
matz.age only has to look up the data.
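
Here is a small sketch of the lookup described above.  The Person
struct and the age value are made up for illustration; only
people["matz"] and matz.age come from the discussion.

```ruby
# Hypothetical record type, used only for this illustration.
Person = Struct.new(:name, :age)

matz   = Person.new("matz", 44)   # the age is an arbitrary sample value
people = { "matz" => matz }

# Build an equal key as a separate String object.
key = "ma" + "tz"

# Equal strings produce equal hash values, even as distinct objects.
# That is what lets the Hash find the entry.
puts key.hash == "matz".hash

# Both access paths end at the same object; the Hash path just does
# the extra hashing and lookup step first.
puts people[key].equal?(matz)
puts people["matz"].age == matz.age
```

All three lines should print true.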

I don't know the internals of Ruby's struct vs. class implementations,  
but they should be pretty similar.

> or have far slower access? Would using a class that never would have a
> method incur a major performance overhead because accessing each value
> required a method call anyway?

I'm not sure quite what you mean -- a class that never would have a  
method?  Do you mean a class that has no associated instance methods,  
other than the attribute accessors?  If you mean the attribute  
accessor methods, it does add a tiny bit of overhead, but I'm pretty  
sure most of that is implemented in C.  Whether the classes have
additional methods associated (but never called) should not slow them
down or add additional space per instance.  If you never call the
methods, they just take up some memory once, in the class definition.
In memory, each instance of the class should hold only its own data.

If you happen to tack an extra method onto one particular instance of  
a class, you'll end up with unique method data for that one instance.

In other words, somewhere in memory you should have something like:

bob = Employee.new("bob smith", 42)
ang = Employee.new("angela carter", 35)
jj  = Employee.new("jason james", 73)
def jj.retired?
   true
end

Employee:
   initialize: (name, age); @name = name, @age = age
   to_s: "Name: #{name}, Age: #{age}"
   name: @name
   name=: @name = var
   age: @age
   age=: @age = var
   ...

Data:
   {employee, bob smith, 42}
   {employee, angela carter, 35}
   {employee, jason james, 73, retired?: true}
   ...

(I'm not completely sure how Ruby does the internals, especially
methods defined on a single object, but it should be something like
that.)
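
You can check the picture above from inside Ruby.  This is only a
sketch: the Employee class below is my guess at a definition matching
the diagram, and it shows what reflection reports, not how the
interpreter lays things out internally.

```ruby
# Assumed definition of Employee, reconstructed from the diagram above.
class Employee
  attr_accessor :name, :age

  def initialize(name, age)
    @name = name
    @age  = age
  end

  def to_s
    "Name: #{name}, Age: #{age}"
  end
end

bob = Employee.new("bob smith", 42)
jj  = Employee.new("jason james", 73)

# A method attached to this one object only.
def jj.retired?
  true
end

# The shared methods are listed once, on the class.
p Employee.instance_methods(false).sort

# Each instance reports only its own instance variables.
p bob.instance_variables

# The extra method shows up on jj and nowhere else.
p jj.singleton_methods
p bob.singleton_methods
p bob.respond_to?(:retired?)
```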

Overall, all three implementations you talked about should be  
relatively similar in speed and relatively similar in size.  I think  
it's easiest just to throw some sample data at them if you're  
interested in how they perform.  You can even use ObjectSpace to get  
an idea of the count of objects in memory at various points in your  
program.
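
Something along these lines would do as a starting point.  It is a
rough sketch: the record shape, the names EmployeeStruct and
EmployeeClass, and the sample size are all placeholders, and the
timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

RECORDS = 100_000   # placeholder sample size

# Placeholder record types standing in for the struct and class options.
EmployeeStruct = Struct.new(:name, :age)

class EmployeeClass
  attr_accessor :name, :age

  def initialize(name, age)
    @name = name
    @age  = age
  end
end

# Build the same sample data three ways.
hashes  = Array.new(RECORDS) { |i| { :name => "emp#{i}", :age => i % 80 } }
structs = Array.new(RECORDS) { |i| EmployeeStruct.new("emp#{i}", i % 80) }
objects = Array.new(RECORDS) { |i| EmployeeClass.new("emp#{i}", i % 80) }

# Time reading one field from every record.
Benchmark.bm(8) do |x|
  x.report("hash:")   { hashes.each  { |h| h[:age] } }
  x.report("struct:") { structs.each { |s| s.age } }
  x.report("class:")  { objects.each { |o| o.age } }
end

# Count the live instances of each type with ObjectSpace.
puts ObjectSpace.each_object(EmployeeStruct).count
puts ObjectSpace.each_object(EmployeeClass).count
```

Since the arrays keep every record alive, both counts should equal
RECORDS here.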

Ben