On Thu, Apr 14, 2011 at 12:55 AM, Clifford Heath <no / spam.please.net> wrote:
> I presented results from MRI 1.8.7, JRuby 1.6.0, and Rubinius,
> and showed that they all had different shortcuts, and none
> reliably kept the Hash contract (of using eql? and hash).
> I.e. you can't rely on sensible code working the same in MRI
> and JRuby.

I posted results on JRuby master (1.6.1). You responded to that email with:

"That result is reasonable, using MRI. Interesting, and nice, that it
figures out it doesn't need to call Fixnum#hash to be able to choose
a bucket. (BTW, that *uncommon* case means there's a test and branch
which is superfluous and excessively costly for the more common case,
according to arguments you've made against detecting monkey patching)

If you do the same thing with JRuby however, the first test contains this:

Looking up using Integer:
nil"

I'm not sure you actually looked at my results.

> I think that's a problem. If you don't, then I'm done...

It may be a problem, or it may not. When running your example,
however, it seems to call your monkeypatched code in many places where
you claim it doesn't. So I'm still confused. I know we don't call
hash/eql? in all cases, but I'm trying to quantify what the correct
behavior should be and what that behavior would cost.
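One way to quantify the correct behavior is to count how often a Hash actually calls a key's `hash` and `eql?`. Here's a minimal sketch that uses a user-defined key class (`CountingKey` is a hypothetical name, not from either of our test scripts). For user-defined keys, every implementation has to dispatch. Whether built-in keys like Integers get the same treatment is exactly what differs between implementations:

```ruby
# Hypothetical probe: count how often Hash calls #hash and #eql? on a key.
class CountingKey
  attr_reader :value

  # Class-level counters shared by all instances.
  @hash_calls = 0
  @eql_calls = 0

  class << self
    attr_accessor :hash_calls, :eql_calls
  end

  def initialize(value)
    @value = value
  end

  def hash
    CountingKey.hash_calls += 1
    value.hash
  end

  def eql?(other)
    CountingKey.eql_calls += 1
    other.is_a?(CountingKey) && value.eql?(other.value)
  end
end

h = {}
h[CountingKey.new(1000)] = :found
# Look up with a distinct but eql? key, so identity alone cannot match it.
result = h[CountingKey.new(1000)]

puts "lookup result: #{result.inspect}"
puts "hash calls: #{CountingKey.hash_calls}, eql? calls: #{CountingKey.eql_calls}"
```

The same counting trick, applied by patching a built-in class instead, is what exposes the shortcuts being discussed here.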

I'd be happy to continue discussing this as a JRuby issue. Would you
file something at http://bugs.jruby.org with expected and actual JRuby
1.6.1 results?

> Unless you care to point me to the place in the JRuby code
> where this shortcut occurs (where I could make a change to
> make it invisible), and a performance benchmark that would
> show the effect of doing so. Then I'll happily make the
> experiment to see whether I'm right (and the shortcut can
> be made invisible without affecting performance measurably,
> i.e. above the noise level of the benchmark). If I'm not,
> I'll openly admit I was wrong... but I've done some pretty
> hardcore optimizing in machine code before, and I think I
> can win this one.

I think you are underestimating the cost of performing a dynamic call.
Even in an optimizing VM (like JRuby/JVM) there's a much higher cost
for a dynamic call to "hash" than to just check that it's a Fixnum and
branch to custom logic. *Way* higher cost.
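To make the semantic side of that trade-off concrete, here's a Ruby-level model of the two strategies. It's only a sketch, not JRuby's actual Java code. It assumes Ruby 2.7+ for `UnboundMethod#bind_call`, and it uses `Integer` because `Fixnum` was merged into `Integer` in Ruby 2.4. `BUILTIN_INT_HASH`, `HashSpy` and the two helper methods are hypothetical names. The fast path calls a saved copy of the original method, so a monkey patch on `Integer#hash` is invisible to it. The dynamic path always sees the patch:

```ruby
# Capture Integer's original #hash before anything patches it. Calling this
# saved method directly bypasses normal method lookup, so later patches are
# not seen; it stands in for a VM's built-in integer hash.
BUILTIN_INT_HASH = Integer.instance_method(:hash)

# Dynamic strategy: always dispatch, so monkey patches are honoured.
def hash_code_dynamic(key)
  key.hash
end

# Fast-path strategy: a type check and a branch, skipping dispatch for Integers.
def hash_code_fast_path(key)
  if key.instance_of?(Integer)
    BUILTIN_INT_HASH.bind_call(key)
  else
    key.hash
  end
end

# A monkey patch that counts calls to Integer#hash and otherwise behaves normally.
module HashSpy
  @calls = 0
  class << self
    attr_accessor :calls
  end

  def hash
    HashSpy.calls += 1
    super
  end
end
Integer.prepend(HashSpy)

fast = hash_code_fast_path(1000)
calls_after_fast = HashSpy.calls
dynamic = hash_code_dynamic(1000)
calls_after_dynamic = HashSpy.calls

puts "same value: #{fast == dynamic}"
puts "patched #hash calls: fast path #{calls_after_fast}, " \
     "dynamic #{calls_after_dynamic - calls_after_fast}"
```

Both paths return the same value. However, only the dynamic path goes through the patched method, which is the visibility difference under discussion. This models only the semantics. At the Ruby level, both paths are method calls, so it says nothing about the speed difference measured below.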

Here's stock JRuby 1.6.1, which isn't dispatching to "hash" for Fixnums:

~/projects/jruby jruby --server -rbenchmark -e "5.times { h = {}; h[1000] = 1000; puts Benchmark.measure { 10_000_000.times { h[1000] } } }"
  0.908000   0.000000   0.908000 (  0.859000)
  0.623000   0.000000   0.623000 (  0.622000)
  0.699000   0.000000   0.699000 (  0.699000)
  0.747000   0.000000   0.747000 (  0.747000)
  0.753000   0.000000   0.753000 (  0.753000)

Here's the same benchmark on a modified JRuby that dispatches to "hash"
through a per-class cache (faster than typical call-site caching,
roughly on par with inlined calls):

~/projects/jruby jruby --server -rbenchmark -e "5.times { h = {}; h[1000] = 1000; puts Benchmark.measure { 10_000_000.times { h[1000] } } }"
  1.634000   0.000000   1.634000 (  1.580000)
  1.297000   0.000000   1.297000 (  1.297000)
  1.356000   0.000000   1.356000 (  1.355000)
  1.334000   0.000000   1.334000 (  1.334000)
  1.343000   0.000000   1.343000 (  1.344000)

Now using an even faster check, which skips the dispatch to "hash" only
if the object is a Fixnum and the Fixnum class has not been reopened.
Notice it's faster than the full dynamic call, but still a good bit
slower than the fast path:

~/projects/jruby jruby --server -rbenchmark -e "5.times { h = {}; h[1000] = 1000; puts Benchmark.measure { 10_000_000.times { h[1000] } } }"
  1.057000   0.000000   1.057000 (  1.014000)
  0.977000   0.000000   0.977000 (  0.976000)
  0.885000   0.000000   0.885000 (  0.885000)
  0.903000   0.000000   0.903000 (  0.903000)
  0.871000   0.000000   0.871000 (  0.871000)
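The guarded check can also be modelled at the Ruby level. Again, this is a sketch and not JRuby's actual Java code. It assumes Ruby 2.7+ (for `UnboundMethod#bind_call`) and uses `Integer` in place of `Fixnum`. `IntegerGuard`, `BUILTIN_INT_HASH` and `hash_code_guarded` are hypothetical names. A `method_added` hook sets a flag once `Integer` is reopened with a new method, and the fast path is taken only while that flag is clear. The extra test on the flag is the kind of branch whose cost shows up in the timings above:

```ruby
# Saved original hash, called directly to model the VM's no-dispatch path.
BUILTIN_INT_HASH = Integer.instance_method(:hash)

# Hypothetical guard: records whether Integer has been reopened with new methods.
module IntegerGuard
  @reopened = false
  class << self
    attr_accessor :reopened
  end
end

class Integer
  # Fires whenever an instance method is defined directly in Integer.
  def self.method_added(name)
    IntegerGuard.reopened = true
    super
  end
end

def hash_code_guarded(key)
  if key.instance_of?(Integer) && !IntegerGuard.reopened
    BUILTIN_INT_HASH.bind_call(key) # fast path: no dynamic lookup
  else
    key.hash # slow path: full dynamic dispatch, sees any monkey patch
  end
end

before = hash_code_guarded(1000)
reopened_before = IntegerGuard.reopened

# Reopen Integer and monkey patch #hash (a constant, purely for demonstration).
class Integer
  def hash
    42
  end
end

after = hash_code_guarded(1000)
puts "before patch: #{before}, after patch: #{after}, reopened: #{IntegerGuard.reopened}"
```

This particular guard has a caveat. `method_added` does not fire when a module is prepended to or included in `Integer`. A real VM would therefore need a broader invalidation mechanism than this hook.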

And MRI 1.9.2 for comparison:

~/projects/jruby ruby1.9 -rbenchmark -e "5.times { h = {}; h[1000] = 1000; puts Benchmark.measure { 10_000_000.times { h[1000] } } }"
  1.270000   0.010000   1.280000 (  1.313950)
  1.270000   0.010000   1.280000 (  1.307163)
  1.270000   0.000000   1.270000 (  1.295588)
  1.260000   0.010000   1.270000 (  1.285083)
  1.260000   0.010000   1.270000 (  1.307108)

Bottom line is that *any* additional branching logic will add
overhead, and full dynamic calling introduces even more overhead on
just about any implementation. Whether that's a fair trade-off is not
for me to decide ;)
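Here's a rough sketch for anyone who wants to see the general effect on their own implementation. It compares lookups using a plain Integer key against lookups using a wrapper key whose `hash` and `eql?` are ordinary Ruby methods. Every wrapper lookup therefore pays for dynamic calls. `WrappedKey` is a hypothetical name, the iteration count is arbitrary, and timings will vary by implementation and machine, so no numbers are claimed here:

```ruby
require "benchmark"

# Hypothetical key whose hash/eql? are plain Ruby methods, forcing dynamic calls.
class WrappedKey
  attr_reader :value

  def initialize(value)
    @value = value
  end

  def hash
    value.hash
  end

  def eql?(other)
    other.is_a?(WrappedKey) && value.eql?(other.value)
  end
end

N = 1_000_000

int_hash = { 1000 => 1000 }
wrapped_hash = { WrappedKey.new(1000) => 1000 }
# A distinct object, so the lookup cannot match by identity and has to call eql?.
probe = WrappedKey.new(1000)

Benchmark.bm(14) do |x|
  x.report("Integer key:") { N.times { int_hash[1000] } }
  x.report("wrapped key:") { N.times { wrapped_hash[probe] } }
end
```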

- Charlie