On Tue, Jan 17, 2012 at 12:08 PM, Peter Vandenabeele <peter / vandenabeele.com> wrote: > I assumed (incorrectly turns out for Ruby 1.9.3) that for large data sets > there > would be a significant performance difference. Because, upon determining the > "bins" of uniq values, that is essentially a form of "sorting", which can > be O(n^2) > if not careful. There is no sorting going on. Some tests explicitly use a Hash others use Enumerable#group_by which also uses a Hash (you can see it from the return). > Questions: > ======== > > Why would ruby 1.9.3. be so much better at this than ruby 1.8.7 ? > Could it be because the Hash is now "ordered" so it can do an efficient > algorithm when adding an entry to a bin? No. Hashes in 1.9.* only maintain insertion order - nothing else. > Or is it object creation? The difference is more likely to do with the dramatically changed implementation of MRI between 1.8.* and 1.9.*. Maybe there were also some Hash implementation tweaks but the ordering is certainly not responsible. > Why are certain methods leading to non-linear behavior? I assume that this might have to do with memory allocation and GC. Choosing n=1 for repetition is too low to yield proper results IMHO since GC for the previous test might kick in etc. Btw. with Benchmark.bmbm you can have a warmup phase before the real measurement. I tried to tweak a bit but not much gain (~1% at most): https://gist.github.com/1633008 Cheers robert -- remember.guy do |as, often| as.you_can - without end http://blog.rubybestpractices.com/