I added resize() to hash.c.  It offers a speedup if you know in advance
how large your dataset will be.  I don't know if this is the right way
to do it.  What do you think?

diff hash.c hash.c~
294,308d293
< static VALUE
< rb_hash_resize(hash, size)
<     VALUE hash;
<     VALUE size;
< {
<     st_table *tbl;
<
<     tbl = st_init_table_with_size(&objhash, NUM2INT(size));
<     st_foreach(RHASH(hash)->tbl, rb_hash_rehash_i, tbl);
<     st_free_table(RHASH(hash)->tbl);
<     RHASH(hash)->tbl = tbl;
<
<     return hash;
< }
<
1522,1523d1506
<
<     rb_define_method(rb_cHash,"resize", rb_hash_resize, 1);

Here is the timing with a call to a.resize(1000000):
/usr/bin/time ~/junk.rb

3.78user 0.39system 0:04.17elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1019major+14497minor)pagefaults 0swaps


Here is the timing without it:

/usr/bin/time ~/junk.rb

5.57user 0.25system 0:05.82elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1019major+12932minor)pagefaults 0swaps
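
For anyone who wants to reproduce the comparison, a script along these
lines would exercise both cases.  This is only a guess at the shape of
such a test, not the actual junk.rb: fill_hash, the integer keys and the
use of the benchmark library are made up for illustration, and resize
only exists once the patch above is applied (the call is skipped
otherwise).

```ruby
# Hypothetical stand-in for a test script; not the original junk.rb.
require 'benchmark'

# Fill a hash with n integer keys.  When presize is true and the patched
# Hash#resize is available, size the table up front; on an unpatched
# Ruby the resize call is simply skipped.
def fill_hash(n, presize)
  a = {}
  a.resize(n) if presize && a.respond_to?(:resize)
  n.times { |i| a[i] = i }
  a
end

if __FILE__ == $0
  n = 1_000_000
  [true, false].each do |presize|
    t = Benchmark.realtime { fill_hash(n, presize) }
    printf("presize=%-5s %.2fs\n", presize, t)
  end
end
```

Timing both cases in one process is less clean than two separate runs
under /usr/bin/time, since the second case inherits the first one's
garbage, so treat the numbers as a rough comparison only.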


Or would it be better as an extra argument, as in Hash.new("default", size)?
Or is this just not a good idea at all?

thanks,
-joe