>  % ruby bang.rb 2
>  calculating x = (a.dup.add!(b)).mul!(c.dup.add!(d)) ...
>  Time: 3.47 sec
>
> % ruby bang.rb 3
>  calculating x = (a+b).mul!(c+d) ...
>  Time: 2.85 sec

OK, I see your point:
a.dup.add!(b) may be less efficient than a+b,
since the first needs two passes over the data (one to copy and
one to add), whereas the second needs only one (the addition).
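To make the pass counting concrete, here is a small sketch. The Vec class, its data reader and the class-level passes counter are hypothetical and made up for illustration; they are not the API of the benchmarked package. It only counts full traversals of the element array.

```ruby
# Hypothetical vector class for illustration; not the benchmarked package's API.
class Vec
  attr_reader :data

  # Class-level counter of full passes over the elements (illustration only).
  @passes = 0
  class << self
    attr_accessor :passes
  end

  def initialize(data)
    @data = data
  end

  # Called by dup: copying the elements is one full pass.
  def initialize_copy(src)
    super
    Vec.passes += 1
    @data = src.data.dup
  end

  # In-place addition: one pass, no new storage.
  def add!(other)
    Vec.passes += 1
    od = other.data
    @data.each_index { |i| @data[i] += od[i] }
    self
  end

  # Out-of-place addition: one pass that writes straight into fresh storage.
  def +(other)
    Vec.passes += 1
    od = other.data
    Vec.new(Array.new(@data.size) { |i| @data[i] + od[i] })
  end
end

a = Vec.new([1.0, 2.0, 3.0])
b = Vec.new([10.0, 20.0, 30.0])

Vec.passes = 0
r1 = a.dup.add!(b)
p Vec.passes          # prints 2 (copy, then add)

Vec.passes = 0
r2 = a + b
p Vec.passes          # prints 1 (add only)

p r1.data == r2.data  # prints true
```

Both forms allocate exactly one new array here; the difference is only the extra copy pass.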

This is a valid objection to my idea!

However, I see one point which this benchmark does not cover:
its effect on garbage collection is unclear, because you allocate
only a small number of large objects. For a large number of small
objects, and for longer-running code (which makes GC necessary),
the numbers could look quite different. (I could not test this,
because I did not manage to install your package under AIX. I will
try again later.)
Perhaps REPEAT=10_000_000 and n=10 would produce different results.
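A rough sketch of the experiment I have in mind, assuming plain Ruby arrays instead of your package (since bang.rb is not shown, the helpers add, mul, add! and mul! are hypothetical stand-ins). REPEAT and N mirror the numbers above; the defaults are kept small so a trial run is quick, and GC.count is read before and after each loop.

```ruby
require "benchmark"

# Suggested experiment: REPEAT=10000000 N=10 ruby gc_bench.rb
REPEAT = Integer(ENV.fetch("REPEAT", "100000"))
N      = Integer(ENV.fetch("N", "10"))

# Out-of-place operations (hypothetical helpers): each call allocates a fresh array.
def add(x, y)
  Array.new(x.size) { |i| x[i] + y[i] }
end

def mul(x, y)
  Array.new(x.size) { |i| x[i] * y[i] }
end

# In-place operations (hypothetical helpers): overwrite x, allocate nothing.
def add!(x, y)
  x.each_index { |i| x[i] += y[i] }
  x
end

def mul!(x, y)
  x.each_index { |i| x[i] *= y[i] }
  x
end

A = Array.new(N) { |i| i.to_f }
B = Array.new(N, 1.0)
C = Array.new(N, 2.0)
D = Array.new(N, 3.0)

# Three ways of computing x = (a+b)*(c+d); comments give arrays allocated per evaluation.
VARIANTS = {
  "mul(add(a,b), add(c,d))"            => -> { mul(add(A, B), add(C, D)) },          # 3
  "mul!(add(a,b), add(c,d))"           => -> { mul!(add(A, B), add(C, D)) },         # 2
  "mul!(add!(a.dup,b), add!(c.dup,d))" => -> { mul!(add!(A.dup, B), add!(C.dup, D)) }, # 2
}

# Times REPEAT evaluations and reports how many GC runs happened meanwhile.
def measure(label, f)
  gc_before = GC.count
  secs = Benchmark.realtime { REPEAT.times { f.call } }
  runs = GC.count - gc_before
  printf("%-36s %.3f s  %d GC runs\n", label, secs, runs)
  [secs, runs]
end

VARIANTS.each { |label, f| measure(label, f) }
```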

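But anyway, your point is still valid:
even if we allowed a +=! operator, the interpreter could not
derive an optimal + operator from it (the best it could generate
is the equivalent of a.dup.add!(b): a copy followed by an
in-place addition).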

The most effective solution would be to let the programmer define
both + and +=!, and let the compiler assume that they perform the
same task. Then it could optimize expressions with the following
two rules:

Whenever it needs an allocation (a new temporary), it should use
the binary + form.

But if it only needs to modify an existing value, it should
use the +=! form.
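A toy sketch of how a compiler might apply these two rules, assuming expressions are given as nested arrays ([op, lhs, rhs], with Symbols for variables) and that the in-place forms are spelled add!/mul! as in the benchmark. The OPS table and the lower and wrap helpers are hypothetical. A variable is never modified; a left operand that is already a fresh temporary is.

```ruby
# Hypothetical table: allocating form and in-place ("bang") form of each operator.
OPS = {
  :+ => { binary: "+", bang: "add!" },
  :* => { binary: "*", bang: "mul!" },
}.freeze

# Parenthesize a binary-operator expression; leave atoms and calls alone.
def wrap(src, atomic)
  atomic ? src : "(#{src})"
end

# Lowers an expression tree to Ruby source using the two rules.
# Returns [source, fresh_temporary?, atomic?].
def lower(expr)
  # A variable belongs to the caller, so it must never be modified in place.
  return [expr.to_s, false, true] if expr.is_a?(Symbol)

  op, lhs, rhs = expr
  forms = OPS.fetch(op)
  lsrc, lfresh, latomic = lower(lhs)
  rsrc, _rfresh, ratomic = lower(rhs)

  if lfresh
    # Second rule: the left operand is a temporary we own, so modify it in place.
    ["#{wrap(lsrc, latomic)}.#{forms[:bang]}(#{rsrc})", true, true]
  else
    # First rule: a new temporary is needed, so use the allocating binary form.
    ["#{wrap(lsrc, latomic)} #{forms[:binary]} #{wrap(rsrc, ratomic)}", true, false]
  end
end

puts lower([:*, [:+, :a, :b], [:+, :c, :d]]).first  # prints (a + b).mul!(c + d)
puts lower([:+, [:*, :a, :b], :c]).first            # prints (a * b).add!(c)
```

For x = (a+b)*(c+d) this yields exactly the hand-written form from the third benchmark run. It does not exploit commutativity (a + b*c stays a + (b * c)), which a real optimizer might.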

Of course it would be a bit uglier, because the programmer must
then guarantee that the two forms are semantically equivalent...
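One way a programmer might at least test that guarantee is a randomized check that a + b equals a.dup.add!(b) and that + leaves a untouched. This is a sketch: the Vec class and the plus_matches_bang? helper are hypothetical, and a passing random test is evidence, not proof.

```ruby
# Hypothetical vector type on which the programmer defines both forms.
class Vec
  attr_reader :data

  def initialize(data)
    @data = data
  end

  # Without this, dup would share @data and add! would also modify the original.
  def initialize_copy(src)
    super
    @data = src.data.dup
  end

  def +(other)
    Vec.new(Array.new(@data.size) { |i| @data[i] + other.data[i] })
  end

  def add!(other)
    @data.each_index { |i| @data[i] += other.data[i] }
    self
  end

  def ==(other)
    other.is_a?(Vec) && @data == other.data
  end
end

# Randomized check: a + b must equal a.dup.add!(b), and + must not modify a.
def plus_matches_bang?(trials: 100, size: 8, seed: 42)
  rng = Random.new(seed)
  trials.times.all? do
    a = Vec.new(Array.new(size) { rng.rand(-100..100) })
    b = Vec.new(Array.new(size) { rng.rand(-100..100) })
    before = a.data.dup
    sum = a + b
    sum == a.dup.add!(b) && a.data == before
  end
end

puts plus_matches_bang?  # prints true
```

The initialize_copy override shows how easy it is to break the equivalence: Ruby's default dup is shallow, so without it a.dup.add!(b) would silently change a as well.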

I don't know whether it is the right way...

Thank you for your correction, and your quick answer!