Issue #14104 has been updated by ko1 (Koichi Sasada).


Here is a result of benchmark result with

```
ruby ~/ruby/src/trunk/benchmark/driver.rb -e ruby_2_4::~/ruby/install/ruby_2_4/bin/ruby -e trunk_oct::~/ruby/install/trunk_r60079/bin/ruby -e trunk::~/ruby/install/trunk/bin/ruby -e modified::~/ruby/install/gitruby/bin/ruby  -v -r 5
```

https://gist.github.com/anonymous/64b7e73e5972befc33a9acde09612f90


----------------------------------------
Feature #14104: Remove `trace` instructions
https://bugs.ruby-lang.org/issues/14104#change-67798

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version: 2.5
----------------------------------------
# Abstract

Remove `trace` instructions from bytecode and it will improve performance, 10% faster.
Instead of `trace` instruction, we add event information for each instruction.
In other words, we unified `trace` instructions with the following instructions.

# Backgroud

`TracePoint` and old `set_trace_func` method need to know which point they need to kick hooks at.
`trace` VM instruction is inserted to represent such hook points and kicks hooks if needed.
It is easy to introduce hook points into bytecode.
However, this technique introduces some overhead on "no-hooks" situations (and I believe 99% of workloads does not need hooks, especially on production environment).

For example, compiled code of the following method has several `trace` instructions.

```
def foo x 
  _y = x
end
```

Disasm output is here:

```
== disasm: #<ISeq:foo@<compiled>>=======================================
local table (size: 2, argc: 1 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 2] x<Arg>     [ 1] _y         
0000 trace            8                                               (   1)
0002 trace            1
0004 getlocal         x, 0
0007 dup              
0008 setlocal         _y, 0
0011 trace            16                                              (   2)
0013 leave                                                            (   1)
```

```
# disasm code is:
puts RubyVM::InstructionSequence.compile("def foo x
  _y = x
end", nil, nil, 0, trace_instruction: true, operands_unification: false).disasm
```

You can see 3 `trace` instructions on 0000, 0002 and 0011. They means hook points for :call, :line and :return events. It enables to support `TracePoint`, but introduces 3 dispatches for useless instructions.

We can remove `trace` instructions with the `trace_instruction: true` compile option, but we can't turn on `TracePoint` for such compiled ISeqs.

# Proposal

Remove `trace` instruction and introduce event flags as information for each instructions.

With same code above, we can get the following disasm code.

```
== disasm: #<ISeq:foo@<compiled>>=======================================
local table (size: 2, argc: 1 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 2] x<Arg>     [ 1] _y
0000 getlocal         x, 0                                            (   1)[LiCa]
0003 dup
0004 setlocal         _y, 0
0007 leave            [Re]
```

0000 shows `[LiCa]`, it means :line and :call events.
0007 shows `[Re]` which represents a :return event.

It is too heavy if we check these event flags on each instruction dispatch.
So we introduce new instructions: `trace_` prefix instructions such as `trace_getlocal` and so on.
All normal instructions have corresponding `trace_` prefix instructions (generated by the VM generator automatically).

If we need to enable `TracePoint` (or `set_trace_func`), then rewrite all of instruction sequences (ISeqs) in the heap from normal instructions to trace prefix instructions dynamically. `trace_` prefix instructions check flags for an executing instruction and kick hooks if needed.

* Good:
  * a. Low overhead for normal execution because we don't need to dispatch useless `trace` instructions.
  * b. We don't have `trace` instructions, so that it is easy to turn on tail call optimization flags.
  * c. We can enable/disable traces on small units.
* Bad:
  * a. Big overhead to turn on/off `TracePoint` because we need to modify all of existing ISeqs.
  * b. Introducing a few incomatibilities for tracing.

For (Bad-a) I believe nobody turn on and trun off tracepoints many times, so that turning on/off overhead is not so big impact.

For (Good-c), I need to explain more.
Current `TracePoint` enables all of hooks for all methods, ISeqs. With this technique, however, we can select enable/disable the hooks in more small units, like files, classes, methods or a line. For example, we can enable the hooks only method named `foo`. Now we have no interface to specify it, but we can design later.

# Evaluation

```
require 'benchmark'

def foo n
end

N = 100_000_000
Benchmark.bm(10){|x|
  x.report('trace off'){
    N.times{
      foo(10)
      foo(10)
      foo(10)
    }
  }
  x.report('trace on'){
    TracePoint.new{}.enable
    N.times{
      foo(10)
      foo(10)
      foo(10)
    }
  }
}
```


```
Currnt trunk:
                 user     system      total        real
trace off    9.090678   0.000000   9.090678 (  9.083007)
trace on   108.217320   0.005524 108.222844 (108.201941)

modified:
                 user     system      total        real
trace off    6.647247   0.000000   6.647247 (  6.641530)
trace on    93.405389   0.000000  93.405389 ( 93.404864)
```

# Compatibility issues

With this change, we introduced a few incompatibilies:

* Line numbers on :return/:b_return events show the last executed lines, instead of `end` lines (without `return` statement).
* `begin` block doesn't show `:line` events.




-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>