Brian Candler wrote:
> Charles Hixson wrote in post #1084133:
>> Hashes are slow compared to direct indexing.
> Says who?
This is a commonly true statement for a wide variety of languages.  I am 
quite likely to be translating this work into another language.  In the 
other language I am certain that it is a true statement.  Unless Array 
access is significantly slower than Hash access (which I cannot believe) 
this is the way to proceed.
>
>> Push might be a good
>> answer, but not if it allocates storage item by item.
> It doesn't. Look at the MRI C implementation if you really care: the
> underlying array storage size is increased in chunks. Ditto for Strings.
That's ok.  push was the wrong answer anyway.  The right answer was that 
Arrays automatically extend to encompass the largest index that you use 
to store into them.
>
> void
> rb_ary_store(ary, idx, val)
> ...
>      if (idx>= RARRAY(ary)->aux.capa) {
>          long new_capa = RARRAY(ary)->aux.capa / 2;
>
>          if (new_capa<  ARY_DEFAULT_SIZE) {
>              new_capa = ARY_DEFAULT_SIZE;
>          }
>          if (new_capa>= ARY_MAX_SIZE - idx) {
>              new_capa = (ARY_MAX_SIZE - idx) / 2;
>          }
>          new_capa += idx;
>          REALLOC_N(RARRAY(ary)->ptr, VALUE, new_capa);
>          RARRAY(ary)->aux.capa = new_capa;
>      }
>
> [from ruby-1.8.7-p352/array.c]
Sorry, my C isn't that good, and there are a lot of undefined terms in 
that snippet.  I can't really say I understand it.  I can usually guess 
what's going on, but undefined macros make me quite unsure...and I try 
to avoid pointers when writing code, because I want to understand it 
later.  For that matter, I usually avoid macros, too.  I realize that 
this is a common C coding style, but it's one reason I dislike C.  I'd 
prefer D, FORTRAN, or even Ada or Eiffel.  (Yaa...the code is what it 
is.  But to me what it is is confusing.)
>
>> Allocation of memory is another expensive operation
> Says who? Expensive compared to all the other things going on in your
> program? Your program will probably be creating objects
> left-right-and-centre, and having them garbage collected too.
Among system calls, allocation of memory is one of the more expensive.  
OTOH, the comment earlier about Array automatically extending itself 
answers this as there are
steps automatically taken to minimize the number of allocations of 
memory.  I don't think that using push repeatedly is a good idea (and in 
any case, it doesn't fit with my logic, unless there is something about 
the system that demands that it be done that way, which there isn't).
>
> I think you are approaching this the wrong way. I suggest you first
> write some code that works, using the most simple and logical code.
> *Then* optimise it if required: and optimise by measuring. Even for
> experienced programmers, usually the hot-spot turns out not to be where
> they guess it might be. And as you have already implied, you might need
> to optimise for memory usage over speed.
If I design the basic framework of the application around a poor design, 
I'll have to rewrite the entire thing.  This is something to get right 
at the start.  It's not a premature optimization.  There are lots of 
places where I'm taking to "do enough and patch it later" approach.  The 
reason this isn't one, is because this needs to be handled now.
>
> > From a speed point of view, I'm pretty confident in saying that the
> implementation of Hash (in C) versus the implementation of Array (in C)
> is unlikely to be the bottleneck in your program. An Array may use less
> memory than a Hash - but since the Array or Hash holds only object
> references (essentially pointers), both may be small compared to the
> memory allocated to the objects you are holding within them.
>
OK.  If it's not the bottleneck, that will be very good.  Is there ANY 
reason that Hash is a better choice?  I don't believe that there is, but 
I could be wrong.  Certainly if they are the same speed, and use about 
the same amount of memory, I would prefer to use an Array.  My 
expectation is that Array will be faster as well as using significantly 
less RAM, but were they the same, I would still prefer to use an Array, 
as it will match more nearly the preferred order of I/O (among other 
reasons).  It is true that my reasons for preferring an Array are all 
minor, and if Array has some real drawback, then I would readily switch 
to Hash.  But I haven't heard of one.  I had some questions about how to 
use it, but those have, I believe, been answered.
P.S.:  I understand that if I use a Hash, item will be retrieved in the 
order entered when I iterate through them.  This isn't my desired 
result.  I want to retrieve them in index # order.  I'd rather not have 
to sort them.

P.P.S:  A part of my design is driven by my desire to have a language 
independent form of serialization...and YAML won't work for my 
purposes.  I may decide to have an isomorphic realization of this in a 
binary serialization that isn't language independent, but that WOULD be 
premature optimization.