I have a proof-of-concept patch to MRI that caches #to_s values for
immutable values. It is implemented using a few fixed-size hash tables.
It reduces the number of #to_s Strings created during the MRI test
suite for NilClass, TrueClass, FalseClass, Symbol and Float objects by
1890 Strings.
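To make the idea concrete, here is a minimal Ruby sketch of a
fixed-size cache of frozen #to_s results. It is only an illustration,
not the patch itself (the patch works inside MRI); the ToSCache class,
its method names and the slot count of 64 are all assumptions made up
for this example.

```ruby
# Hypothetical sketch: a fixed-size, direct-mapped cache of frozen #to_s
# results. ToSCache is an illustrative name, not part of MRI or the patch.
class ToSCache
  attr_reader :hits, :misses

  def initialize(size = 64)
    @size   = size
    @keys   = Array.new(size)          # receiver stored in each slot
    @values = Array.new(size)          # frozen String stored in each slot
    @filled = Array.new(size, false)   # needed because nil is a valid key
    @hits = @misses = 0
  end

  # Return a frozen String for obj, reusing the cached one on a slot match.
  def to_s_for(obj)
    slot = obj.hash % @size            # always in 0...@size for a positive size
    if @filled[slot] && @keys[slot].eql?(obj)
      @hits += 1
      return @values[slot]
    end
    @misses += 1
    @keys[slot]   = obj
    @filled[slot] = true
    # dup before freezing so a String owned by someone else is never frozen;
    # a colliding key simply overwrites the slot, keeping the table fixed-size.
    @values[slot] = obj.to_s.dup.freeze
  end
end

cache = ToSCache.new
a = cache.to_s_for(1.5)
b = cache.to_s_for(1.5)   # second call reuses the String created by the first
```

Here a and b are the same frozen String object, so the second call
creates no new String. Overwriting on collision is what keeps the
table a fixed size.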
It requires one minor semantic change to Ruby core:
#to_s may return frozen Strings.
In practice, most Ruby String literals quickly become garbage, so this
minor change could cascade into a huge performance improvement for all
Ruby implementations, as illustrated later in this message.
This does not appear to be a problem, since callers of #to_s are
likely to anticipate that the receiver may already be a String and are
not going to mutate the result -- it is a coercion. If this proves
problematic, an Object#dup_if_frozen method might be helpful. (Aside: a
fast #dup_unless_frozen method might be helpful for general memoization
of computations! :)
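A rough sketch of what such a helper could look like, written as a
plain method rather than as Object#dup_if_frozen. The method is
hypothetical (nothing like it exists in Ruby core) and is shown only
to illustrate the intended behavior.

```ruby
# Hypothetical helper: return a mutable copy only when obj is frozen,
# otherwise return obj itself. Not part of Ruby core.
def dup_if_frozen(obj)
  obj.frozen? ? obj.dup : obj   # dup does not carry over the frozen state
end

cached = "3.14".freeze          # stands in for a cached, frozen #to_s result
s = dup_if_frozen(cached)
s << "159"                      # mutates the copy, not the cached String

mutable = String.new("abc")     # explicitly unfrozen String
same = dup_if_frozen(mutable)   # no copy is made for an unfrozen receiver
```

A caller that really needs to mutate the result of #to_s pays for the
copy; everyone else shares the cached String.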
This caching technique could be extended to other immutables (the
Numerics), to objects whose #to_s representations never change (Class,
Module?), and to #inspect under similar constraints.
In the patch, Fixnum#to_s is not cached because Fixnums are often
incremented during long loops, so any cache would be quickly churned.
However, this could be enabled if it proves useful in practice.
If this new semantic for #to_s is reasonable, I recommend explicitly
storing frozen strings for true.to_s, false.to_s, nil.to_s and storing
Symbol#to_s with each Symbol, likewise for #inspect.
If Symbol#to_s were guaranteed to always be cached, this would
enable the use of:
puts :"some string".to_s
instead of
puts "some string"
as an in-line memoized frozen String that creates no garbage for a
consumer that will never mutate it. A parser or compiler could
recognize Symbol#to_s as an operation with no side effects and elide it,
providing a true String constant. This idiom would eradicate the
pointless String garbage created by the evaluation of every lexical
String literal.
This is far more expressive and concise than:
SOME_STRING = "some string".freeze
...
puts SOME_STRING
The alternative to :"some string".to_s might be to memoize all String
literals as frozen. This is a superior syntax, but old code would need
to change on a massive scale to support this semantic, although the
breakage would be easy to diagnose:
str = '' # make mutable empty string.
str << "foo" # "foo" is garbage
str << "bar" # "bar" is garbage
would become:
str = ''.dup # make mutable empty string.
str << "foo" # "foo" is not garbage
str << "bar" # "bar" is not garbage
The latter code is backwards compatible with the current String literal
semantics.
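The following sketch shows why the .dup form works under both
semantics. It simulates memoized frozen literals with explicit .freeze
calls, so it does not depend on how the interpreter actually treats
literals; the variable names are made up for illustration.

```ruby
# Simulate memoized frozen literals with explicit .freeze.
empty = "".freeze
foo   = "foo".freeze
bar   = "bar".freeze

# Old style: appending directly to the (now frozen) literal fails.
begin
  empty << foo
  old_style_ok = true
rescue RuntimeError   # FrozenError is a RuntimeError subclass in newer Rubies
  old_style_ok = false
end

# New style: dup first, then mutate the copy. This also works when the
# literal is not frozen, which is why it is backwards compatible.
str = empty.dup
str << foo
str << bar
```

The failure in the old style is an immediate exception at the point of
mutation, which is what makes such code easy to diagnose.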
Let me know if anyone is interested in this idea and patch:
Kurt Stephens