2013/3/21 "Martin J. D=FCrst" <duerst / it.aoyama.ac.jp>

> Hello Charlie,
>
>
> On 2013/03/21 9:03, Charles Oliver Nutter wrote:
>
>> It had to happen eventually...
>>
>> We received a pull request recently for a change that makes JRuby's
>> hashing of Strings, Booleans, nil, and Symbols be consistent.
>> Basically, it provides hardcoded hashes for Booleans and nil, and
>> makes it possible to disable seeded hashes for String and Symbol.
>>
>> PR: https://github.com/jruby/**jruby/pull/590<https://github.com/jruby/j=
ruby/pull/590>
>>
>> My question for ruby-core: at what point did you decide to make hash
>> for e.g. nil not be a single value (it was "4" in 1.8.7 and
>> different/random in 1.9.3+), and why did you do it?
>>
>
> Yui already gave a pointer. Actually, neither NilClass nor TrueClass nor
> FalseClass implement #hash. All three fall back to Object. So do Symbol a=
nd
> Fixnum. So it seems to be mainly the result of not doing anything more th=
an
> necessary in terms of implementation against the potential DOS attacks.
>
>
>  I think it's valid to want to be able to consistently hash these
>> values across runtimes, but I want to understand the implications
>> before I merge this patch into JRuby proper.
>>
>> Thoughts?
>>
>
> I can't currently see a security problem with making the hash values of
> nil, true, and false consistent across runtimes. But then that's not a
> guarantee that there are none (I'm not a security expert). And I don't se=
e
> a reason for making only these three stable.
>
> When it comes to Symbols, we get back to the question to what extent
> Symbols may/can/shouldn't be created based on data coming in from the
> outside of an application, which we discussed related to garbage collecti=
on
> of symbols.
>
> Overall, having a switch to eliminate introducing randomness into hash
> values may be something to consider. But it will produce problems when an
> application is put together from various libraries: Some of these librari=
es
> may depend on hashes being stable, while some others may be open to DOS
> attacks when hashes are stable.
>
>

I agree that NilClass, TrueClass and FalseClass are probably not an issue
since they are singletons. The core problem is a user being able to
predictably create different objects that hash to the same value. Symbol,
Fixnum and Bignum fall back to Object/Kernel#hash, and Kernel#hash uses
MurmurHash [1] & [2].

While randomized, Jean-Philippe Aumasson and Daniel Bernstein showed last
year that a randomized MurmurHash isn't enough in general to prevent this
kind of denial of service attacks. The proof of concept can be found at [3]
and in response, SipHash was introduced, replacing MurmurHash in
rb_mem_hash [4].

[1]
https://github.com/ruby/ruby/blob/7e052c7b81d22fe4556e313c1f24ccda15502ab1/=
object.c#L129
[2]
https://github.com/ruby/ruby/blob/7e052c7b81d22fe4556e313c1f24ccda15502ab1/=
st.c#L1477
[3] https://github.com/emboss/schadcode/tree/master/ruby/cruby
[4]
https://github.com/ruby/ruby/blob/f7108c5d1a10cacfbb79a0b370c8136bed74af66/=
random.c#L1422

Off the top of my head I would argue that the same technique cannot be
applied to Kernel#hash. The argument to be hashed is bounded and just a few
bytes long - for the attack to work we would need potentially unbounded
byte arrays.

Of course, the safest thing would probably be switching Kernel#hash to
SipHash, too. I'm not sure to what extent this would affect performance.
Maybe we could follow the JRuby approach - MurmurHash was replaced with
Perl's hash, with the option to switch to SipHash on demand (@headius -
this is correct?). Jean-Philippe confirmed that Perl's hash has not been
broken (yet), but it's not unlikely that this could happen in the future.
Since we already have SipHash in CRuby, it would be relatively easy to do
the same thing, I guess. Another upshot would be that Perl's hash is faster
than MurmurHash, too. What do you think?

So maybe those who need stable hashes should implement #stable_hash
> methods, and if it turns out that this is used often, we can add it to Ru=
by
> itself.
>
>
I like the idea. It is stated in the docs [5] that nobody should assume the
values to be consistent across VM boundaries, doing otherwise would clearly
violate this contract.

-Martin

[5] http://www.ruby-doc.org/core-2.0/Object.html#method-i-hash