Issue #16188 has been updated by jeremyevans0 (Jeremy Evans).


Eregon (Benoit Daloze) wrote:
> jeremyevans0 (Jeremy Evans) wrote:
> > Internally, there are not really performance implications for the choice to treat keyword arguments as a last positional hash.  If we not do so, the `foo(key: value)` call would be an ArgumentError.
> 
> I think there are. If args and kwargs are completely separated, we can choose to represent kwargs as we wish, and only need to build a Hash if the method takes a `**kwrest`.
> But in the current approach, for a call site calling both a method taking kwargs and another not taking kwargs, we'll need logic to handle both the kwargs case and the conversion to a last positional Hash. If they were completely separated, the positional Hash would just be an ArgumentError and we wouldn't need that logic at all.
> I agree that would be too incompatible though, so let's focus on optimizing the semantics in Ruby 2.7 and for Ruby 3.

I should preface this by saying all of my comments related to performance implications were and are implicitly refering to CRuby's implementation.  I have no knowledge of the internals of other Ruby implementations and no idea what the performance implications of the differences are on other implementations.

Note that when discussing performance implications of keyword arguments, we should be clear what we are discussing.  In Ruby 2.7 and 3.0, the caller still doesn't need to know about the callee's argument handling, it just passes all arguments through.  It is up to the callee to raise an ArgumentError if the arguments are not correct.  This is the same as older Ruby versions.  Just as in older Ruby versions, the callee handles treating keywords as a final positional hash if the method does not accept keyword arguments.  The only thing that is really changed is how the callee is handling the arguments, the caller code is generally unaffected.

> > The general structure remains the same.  There are warnings added for methods that accept keyword arguments that will break in Ruby 3:
> 
> Right, what about Ruby 3?
> We should aim something that is both semantically simpler and easier to optimize then.

If you want to see what Ruby 3 keyword argument handling will look like, I have a branch prepared for that already: https://github.com/jeremyevans/ruby/tree/r3

I have been using this for testing my apps and libraries to be sure that any case that will break in Ruby 3 will be warned in Ruby 2.7.  Most of the diff for the core changes is code removal, there are very few additions: https://github.com/jeremyevans/ruby/commit/6aa7d021f694537ad15d3f72941e4816639ddfd5

The code does end up being semantically simpler, and it should be faster, though I don't expect the performance to be significantly better in real-world cases.

> How are *args and **kwargs actually represented in MRI?
> Is it like a flat array of "arguments", and the last one can be a keyword Hash, and there is some flag passed to the callee to differentiate a last positional Hash from keyword arguments?
> Then methods taking kwargs need to check the flag to know how many positional args are passed, and whether kwargs are passed.
> Methods not taking kwargs don't even need to check the flag, and can consider the entire flat array as positional arguments.

In most of the C-API, `int argc, VALUE *argv`, with the addition of `int kw_splat` in Ruby 2.7 for whether the final argument is a hash that should be treated as keywords.

> I guess cases like `foo(a: 1, b: 2)` are not represented with a Hash to avoid allocations in MRI, how does it look like then for such a case?

There is an optimization for literal keyword use in both the caller and callee (if both are written in Ruby) so that a hash is not allocated in that case if no keyword splats are present in the caller and callee.  

> That makes the case for methods not taking kwargs quite more complicated, as they need to read the "passed optimized kwargs" flag and if set convert those optimized kwargs to a Hash, and add + 1 to the count of positional arguments.

In CRuby, it's not significantly more complicated.  The callee may need to add the literal keywords to a hash anyway (if callee uses a keyword splat). Moving that hash to the last positional argument is very little work.

> > For methods that are flagged with `ruby2_keywords` (which can only happen if the method accepts a regular splat but no keywords or keyword splat), if the method is called with keywords, we set a flag so that the keyword hash is treated as a last positional hash with the keyword flag set.  Additionally, in this case the empty keyword hash is not removed, it is treated the same as a non-empty keyword hash.
> 
> That should be fine, because it means only `ruby2_keywords` flagged methods would have a little overhead for that extra logic (i.e., writing a flag or @ivar on the Hash).

Correct, the check to add the flag is only on `ruby2_keywords` flagged methods.

> > For all method calls that use an argument splat and pass no keywords or keyword splat, if the last element of the argument splat array is a hash with the keyword flag set, that argument is treated as keywords instead of a positional hash.
> 
> That's the one I'm concerned about, it means every `foo(some arguments)` with `def foo(*args); bar(*args); end` will have to check if the last positional argument is a Hash with the flag set:
> `args.size >= 1 && Hash === args[-1] && flagged?(args[-1])`. That's a quite a few branches and it's not just reading one bit from memory, it's at least 4 memory loads (size, [-1], .class, .flag).
> I think this is another motivation for having `ruby2_keywords` only in Ruby 2.7, as it's clearly not free for performance.

It's not free in CRuby.  However, if you look at all of the surrounding code, I don't think the performance difference in CRuby will be significant.

> I would think there are quite a few methods taking a rest argument and no explicit kwargs, i.e. `def foo(*args)`, `def foo(req, *args)`, `def foo(opt=default, *args)`, etc all need the extra check
> We could skip that if we know currently no method uses `ruby2_keywords`, but as soon as it's used we have to perform the check for every method call with the method taking a rest argument (and no explicit kwargs).

A sufficiently advanced compiler in a more optimized Ruby implementation should be able to determine in most cases whether the method is ever called with such a hash, and optimize the check out, correct? :)

Consider:

```ruby
ruby2_keywords def foo(*a) bar(*a) end
```

Other than the check at this specific `bar` call-site, you don't need to check other call sites.  Because when `bar` is called here, the hash with the flag is duped and the flag removed (as if the hash was keyword splatted), so the flagged hash never escapes this method. This is not how the master branch works in CRuby, but I consider that a bug, and I'll be fixing it shortly.

Handling more involved cases where the arguments are stored and can escape may require checking all call sites with splats and no explicit keywords, though.

> but `def foo(*args, **kwargs)`/`def foo(*args, kwarg:)`/`def foo(*args, kwarg: default)` do not need the check.

correct.

> Maybe we should be more explicit about when to convert a flagged Hash to keyword arguments.
> For instance we could have, using ActionDispatch::MiddlewareStack::Middleware's example:
> ```ruby
> class Middleware
>   attr_reader :args, :block, :klass
> 
>   def initialize(klass, args, block)
>     @klass = klass
>     @args  = args
>     @block = block
>   end
>   ruby2_keywords :initialize if respond_to?(:ruby2_keywords, true)
> 
>   # Current code
>   def build(app)
>     klass.new(app, *args, &block)
>   end
> 
>   # idea to be explicit when converting to keyword args
>   # and avoid extra checks in other call to methods with a *rest argument.
>   def build(app)
>     klass.send_keyword_hash(:new, app, *args, &block)
>   end
> end
> ```
> 
> That would limit that complicated check to only `send_pass_kwargs` calls, and not any other calls.
> It's also more explicit and less magic, which might be easier to understand and explain.
> I'm unsure about a good name, maybe `send_pass_kwargs`, `send_pass_flagged_kwargs`, `send_pass_flagged_hash`, `send_flagged_hash_as_kwargs`, `send_kwhash`, `send_keyword_hash`?
> What do you think?

`send_keyword_hash` is more explicit, but it requires significantly more code or a global monkey patch of `Object#send_keyword_hash` to be backwards compatible.  The main goal for `ruby2_keywords` is to allow existing method definitions to work without changing them, just by flagging the method.

I assume in your example, initialize should accept `*args` and not `args`, as otherwise, you can't use `ruby2_keywords`.

It seems odd to recommend we add another feature (`Kernel#send_keyword_hash`) to compliment a feature (`Module#ruby2_keywords`) that you are strongly recommending we remove in Ruby 3.0.

----------------------------------------
Misc #16188: What are the performance implications of the new keyword arguments in 2.7 and 3.0?
https://bugs.ruby-lang.org/issues/16188#change-82053

* Author: Eregon (Benoit Daloze)
* Status: Open
* Priority: Normal
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------
In #14183, keyword arguments became further separated from positional arguments.

Contrary to the original design though, keyword and positional arguments are not fully separated for methods not accepting keyword arguments.
Example: `foo(key: :value)` will `def foo(hash)` will pass a positional argument.
This is of course better for compatibility, but I wonder what are the performance implications.

The block argument is completely separate in all versions, so no need to concern ourselves about that.

In Ruby <= 2.6:
* The caller never needs to know about the callee's arguments, it can just take all arguments and pass them as an array.
  The last argument might be used to extract keyword, but this is all done at the callee side.
* Splitting kwargs composed of Symbol and non-Symbol keys can be fairly expensive, but it is a rare occurrence.
  If inlining the callee and kwargs are all passed as a literal Hash at the call site, there shouldn't be any overhead compared to positional arguments once JIT'ed.

In Ruby 2.7:
* The caller needs to pass positional and keyword arguments separately, at least when calling a method accepting kwargs.
  But, if it calls a methods not accepting kwargs, then the "kwargs" (e.g. `foo(key: :value)`) should be treated just like a final Hash positional argument.
* (If we had complete separation, then we could always pass positional and keyword arguments separately, so the caller could once again ignore the callee)

How is the logic implemented in MRI for 2.7?

Specializing the caller for a given callee is a well-known technique.
However, it becomes more difficult if different methods are called from the same callsite (polymorphic call), especially if one accepts kwargs and another does not.
In that case, I think we will see a performance cost to this approach, by having to pass arguments differently based on the method to be called.

What about delegation using `ruby2_keywords`?
Which checks does that add (compared to 2.6) in the merged approach with the Hash flag?



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>