On Wed, Apr 20, 2011 at 9:53 PM, Clifford Heath <no / spam.please.net> wrote:
> On 04/20/11 08:41, Charles Oliver Nutter wrote:
>> I think you are underestimating the cost of performing a dynamic call.
>
> I'm not. I'm expecting that JRuby would detect that a core Fixnum
> method has been monkey-patched, and set a global variable. If the
> variable is set (an inline check, susceptible to branch-prediction)
> then it would default to conservative behaviour by calling dispatch,
> otherwise continue with the shortcuts as normal.
>
> I think this is what you mean when you say:
>
>> Now using an even faster check, that only dispatches to "hash" if the
>> object is a Fixnum and the Fixnum class has not been reopened. Notice
>> it's faster than full dyncall, but still a good bit slower than the
>> fast path:
>
> If I read the result correctly, that's 0.871/0.753 or 15.6% slower,
> on a test that does nothing but Fixnum hash lookups in a hash with
> a single element.
>
> What I'd ask is, what's the percentage of time in actual programs
> typically spent doing Fixnum hash lookups? I'd wager it's less than
> 1%, meaning your 16% is now 0.16% - and it makes the behaviour
> incorrect per the Ruby spec. Personally, I'd take the hit.
>
> The cost of the inline check of this variable is what I was implying
> would vanish into the dust in performance tests, as the branch
> prediction figures out that "we always go this way". Is your
> implementation of this check implemented to make best advantage of
> branch prediction? Frankly I'm rather amazed that it's as big as 18%
> merely to decide to use the shortcut. Does that %age drop when using
> a hash with more than one entry? It seems like a benchmark that was
> created to magnify the change, not to realistically demonstrate the
> effect.

Well, it has to get the value from somewhere. In JRuby, the
Fixnum/Float checks are each a field on org.jruby.Ruby (the JRuby
"runtime" object in play), which is accessible via a field on
org.jruby.runtime.ThreadContext (passed on the call stack to most
methods) or via the current object's metaclass, an org.jruby.RubyClass
object with a "runtime" field. So at best it's two field
dereferences and one (inlined) virtual method invocation, and at worst
it's three field dereferences and two (probably inlined) virtual method
invocations. Perhaps that defeats branch prediction, since we're at
least two memory hops away from the value we're checking.

Branch prediction works great when the data is in a register. It
doesn't do as well when there are memory dereferences in the way.
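To make the shape of that guard concrete, here's a minimal Java sketch of
the kind of check described above. The class and member names
(JRubyRuntime, fixnumReopened, hashFor) are illustrative stand-ins, not
JRuby's actual API; the point is the chain of dereferences that must
complete before the branch itself can be tested:

```java
// Sketch of a "has Fixnum been reopened?" guard in front of a fast path.
// All names here are hypothetical; the dereference chain mirrors the
// context -> runtime -> flag hops described above.

final class JRubyRuntime {
    // Set once if anyone reopens Fixnum and redefines a core method.
    volatile boolean fixnumReopened = false;
}

final class ThreadContext {
    private final JRubyRuntime runtime;
    ThreadContext(JRubyRuntime runtime) { this.runtime = runtime; }
    JRubyRuntime getRuntime() { return runtime; } // hop 1: context -> runtime
}

public class GuardSketch {
    // Hash a fixnum value, guarded by the "reopened" flag.
    static long hashFor(ThreadContext context, long fixnumValue) {
        // hop 2: runtime -> flag field; only after both loads resolve can
        // the (almost always not-taken) branch below be evaluated.
        if (context.getRuntime().fixnumReopened) {
            return slowDynamicDispatch(fixnumValue); // conservative: full call
        }
        return fastHash(fixnumValue);                // shortcut: no dyncall
    }

    static long fastHash(long v) { return v ^ (v >>> 32); }

    static long slowDynamicDispatch(long v) {
        // Stand-in for dispatching to a user-redefined #hash.
        return fastHash(v) + 1;
    }

    public static void main(String[] args) {
        JRubyRuntime runtime = new JRubyRuntime();
        ThreadContext context = new ThreadContext(runtime);

        System.out.println(hashFor(context, 42L)); // fast path
        runtime.fixnumReopened = true;             // monkey patch detected
        System.out.println(hashFor(context, 42L)); // falls back to dispatch
    }
}
```

Even with perfect prediction of the branch, the two loads ahead of it are
real work on every call, which is roughly where the measured overhead
comes from.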

>> Bottom line is that *any* additional branching logic will add
>> overhead, and full dynamic calling introduces even more overhead on
>> just about any implementation. Whether that's a fair trade-off is not
>> for me to decide ;)
>
> Well, yes and no. If the effect is 0.16% in typical apps, you'd be
> well within your right to maintain it should work like MRI and/or
> the documentation.
>
> Do you still think this needs a JRuby bug report?

In general we have followed the path of matching MRI in such cases,
except when there's a very strong argument not to do so. If you can
convince Ruby core, or if you think you have a strong enough case for
deviating from MRI's behavior, go ahead and file the bug and we can
continue exploring it there.

- Charlie