On Feb 15, 2004, at 4:49 AM, Kirk Haines wrote:

> Well, dang.  Did the threads about your code possibly hitting garbage
> collection help out?  This is an interesting one.  Please let us know
> what you figure out.

I had planned on going ahead and simply writing something in C (I've 
already written about 2000 lines of C code for our program to optimize 
routine operations and save memory).  I'd probably just read the file 
into an mmap and write it out into another file, since this would be 
the fastest way to do the IO.

At any rate, just before sitting down to write this, I had our sysadmin
upgrade our ruby install to 1.8.1, since I wanted to move the latest C
code from our dev server, and it used the new rb_str_buf (this was for a
portion of our program entirely unrelated to our IO issue).  I then
went to make one last run to get a "before" benchmark, and I discovered
that ruby 1.8.1 fixed the problem.

I may still write an ad hoc C function to do a direct copy, just to
see how fast I can make it, but that would be a spare time project.  I
haven't had time to dig any deeper.
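
For a Ruby-side baseline to compare any C copy against, a plain chunked
copy can be timed as sketched below.  This is not the mmap approach
described above (Ruby's standard library has no mmap binding), and the
name `timed_copy` and the 1 MiB chunk size are assumptions chosen purely
for illustration.  It also relies on Process.clock_gettime, which is not
available in ruby 1.8.

```ruby
# Baseline sketch: copy a file in fixed-size chunks and report how long it took.
# CHUNK_SIZE is an arbitrary illustrative value, not a tuned one.
CHUNK_SIZE = 1 << 20 # 1 MiB per read

# Copies src to dst and returns [bytes_copied, elapsed_seconds].
def timed_copy(src, dst, chunk_size = CHUNK_SIZE)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  bytes = 0
  File.open(src, "rb") do |input|
    File.open(dst, "wb") do |output|
      # IO#read(n) returns nil at end of file, which ends the loop.
      while (buf = input.read(chunk_size))
        bytes += output.write(buf)
      end
    end
  end
  [bytes, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

# Usage (paths are placeholders):
#   bytes, secs = timed_copy("input.dat", "output.dat")
#   printf("%d bytes in %.3fs\n", bytes, secs)
```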

And now for something completely different...

Incidentally, I grabbed the Judy array code off of the RAA and started
playing with it.  I haven't taken a close look at the C code that
implements the JudyHash, since the JudySL is functionally identical to
what people use hashes for 99.999% of the time (viz., most people
create hashes that use string keys of a reasonable length) and is 11%
to 15% faster than a ruby hash.

I spent a few hours cleaning up the C code in the RAA ruby-to-Judy
interface, and I was able to get about an 8% improvement in both
the JudySL (which I've renamed "KeyedList" for use within ruby) and
JudyL (which I've renamed "NumberedList"--it's essentially a sparse
array with non-contiguous indices).

At any rate, the JudyArrays are really, really fast, and they scale
like the dickens (we're dealing with hashes that have hundreds of
thousands of members; I'd like to be able to move to the millions, and
JudyArrays look like they'll get us there).  Moreover, the HP Judy
libraries (as well as the updates from the Debian folks) are
exceptionally well designed.

Has anyone else had any luck playing with the Judy libraries?  Is
there any reason not to make them part of the standard library?

Lastly, the benchmarks I've been running on the JudyArrays indicate 
that ruby 1.8.x hashes are much more scalable (i.e., lookup and insert 
times increase linearly with size) than ruby 1.6.x hashes.  Is this 
correct?
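
A scaling check of this kind can be sketched in plain Ruby as follows.
This is an illustrative sketch, not the benchmark referred to above: the
helper names (`time_hash`, `run_hash_scaling`), the key format, and the
sizes are all assumptions.  It uses Process.clock_gettime, which does
not exist in ruby 1.8, so it would need adapting to run there.  If the
per-key figures stay roughly flat as the size grows, total time is
growing about linearly with the number of members.

```ruby
# Rough scaling check for string-keyed Hash insert and lookup.
# Sizes and key format are arbitrary choices for illustration.

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Returns [insert_seconds, lookup_seconds] for a hash with n string keys.
def time_hash(n)
  keys = Array.new(n) { |i| "key#{i}" } # build keys outside the timed region
  hash = {}

  started = monotonic_now
  keys.each { |k| hash[k] = true }
  insert = monotonic_now - started

  started = monotonic_now
  keys.each { |k| hash[k] }
  lookup = monotonic_now - started

  [insert, lookup]
end

# Times each size, prints totals and per-key costs, and returns
# an array of [size, insert_seconds, lookup_seconds] rows.
def run_hash_scaling(sizes = [10_000, 100_000, 1_000_000])
  sizes.map do |n|
    insert, lookup = time_hash(n)
    printf("%9d keys: insert %.4fs (%.1f ns/key), lookup %.4fs (%.1f ns/key)\n",
           n, insert, insert / n * 1e9, lookup, lookup / n * 1e9)
    [n, insert, lookup]
  end
end

run_hash_scaling
```

Single runs like this are noisy (garbage collection in particular can
distort one size or another), so repeating each size a few times and
taking the minimum gives a steadier picture.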

-------------------------------------------------------
David King Landrith
   (w) 617.227.4469x213
   (h) 617.696.7133

Life is tough.  It's even tougher if you're
an idiot.  --John Wayne
-------------------------------------------------------
public key available upon request