I added a resize() method to Hash in hash.c. It offers a speedup if you
know in advance how large your dataset will be. I don't know if this is
the right way to do it. What do you think?
diff hash.c hash.c~
294,308d293
< static VALUE
< rb_hash_resize(hash, size)
<     VALUE hash;
<     VALUE size;  /* method arguments arrive as VALUEs, not C ints */
< {
<     st_table *tbl;
<
<     tbl = st_init_table_with_size(&objhash, NUM2INT(size));
<     st_foreach(RHASH(hash)->tbl, rb_hash_rehash_i, tbl);
<     st_free_table(RHASH(hash)->tbl);
<     RHASH(hash)->tbl = tbl;
<
<     return hash;
< }
<
1522,1523d1506
<
< rb_define_method(rb_cHash,"resize", rb_hash_resize, 1);
Here is the timing with a call to a.resize(1000000):
/usr/bin/time ~/junk.rb
3.78user 0.39system 0:04.17elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1019major+14497minor)pagefaults 0swaps
Here is the timing without it:
/usr/bin/time ~/junk.rb
5.57user 0.25system 0:05.82elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1019major+12932minor)pagefaults 0swaps
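My guess at where the difference comes from: a table that starts small has to grow, and relink every entry, several times on its way to a large size, and a size hint skips all of that. Below is a minimal standalone sketch of that effect in C. It is not st.c. The toy table, the names (table_new, table_insert, count_rehashes), the doubling policy and the density limit of 5 are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy chained hash table with integer keys. NOT Ruby's st.c:
   all names and the growth policy below are hypothetical. */
#define MAX_DENSITY 5  /* assumed limit on average entries per bin */

struct entry { unsigned long key; struct entry *next; };

struct table {
    struct entry **bins;
    size_t num_bins;
    size_t num_entries;
    size_t rehashes;    /* how many times the table had to grow */
};

static struct table *table_new(size_t num_bins)
{
    struct table *t = malloc(sizeof *t);
    assert(t != NULL);
    if (num_bins == 0) num_bins = 1;
    t->bins = calloc(num_bins, sizeof *t->bins);
    assert(t->bins != NULL);
    t->num_bins = num_bins;
    t->num_entries = 0;
    t->rehashes = 0;
    return t;
}

/* Double the bin array and relink every existing entry into it. */
static void table_grow(struct table *t)
{
    size_t new_num = t->num_bins * 2, i;
    struct entry **bins = calloc(new_num, sizeof *bins);
    assert(bins != NULL);
    for (i = 0; i < t->num_bins; i++) {
        struct entry *e = t->bins[i], *next;
        for (; e != NULL; e = next) {
            size_t pos = e->key % new_num;
            next = e->next;
            e->next = bins[pos];
            bins[pos] = e;
        }
    }
    free(t->bins);
    t->bins = bins;
    t->num_bins = new_num;
    t->rehashes++;
}

/* Insert a key (no duplicate check), growing first if the table is full. */
static void table_insert(struct table *t, unsigned long key)
{
    struct entry *e;
    size_t pos;
    if (t->num_entries >= t->num_bins * MAX_DENSITY)
        table_grow(t);
    e = malloc(sizeof *e);
    assert(e != NULL);
    pos = key % t->num_bins;
    e->key = key;
    e->next = t->bins[pos];
    t->bins[pos] = e;
    t->num_entries++;
}

static void table_free(struct table *t)
{
    size_t i;
    for (i = 0; i < t->num_bins; i++) {
        struct entry *e = t->bins[i], *next;
        for (; e != NULL; e = next) {
            next = e->next;
            free(e);
        }
    }
    free(t->bins);
    free(t);
}

/* Insert keys 0..n-1 into a table that starts with initial_bins bins
   and report how many times it had to grow along the way. */
size_t count_rehashes(size_t n, size_t initial_bins)
{
    struct table *t = table_new(initial_bins);
    size_t i, result;
    for (i = 0; i < n; i++)
        table_insert(t, (unsigned long)i);
    result = t->rehashes;
    table_free(t);
    return result;
}
```

For example, count_rehashes(100000, 16) reports how many grows it takes when starting from 16 bins, while count_rehashes(100000, 100000) reports none, because the table is already big enough.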
Or would it be better as an extra argument, as in Hash.new("default", size)?
Or is this just not a good idea at all?
thanks,
-joe