Issue #16505 has been reported by NagayamaRyoga (Nagayama Ryoga).

----------------------------------------
Feature #16505: Improve performance of `RubyVM::InstructionSequence#to_binary`
https://bugs.ruby-lang.org/issues/16505

* Author: NagayamaRyoga (Nagayama Ryoga)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
## Abstract
`#to_binary` deduplicates the objects it writes to the binary, but the current implementation does this with a linear search over an array of objects (`obj_list`). (https://github.com/ruby/ruby/blob/e288632f22b18b29efd20a1469292b0a3ba9b74c/compile.c#L9699-L9701)
In contrast, iseq deduplication is faster because it is implemented using a hash. (https://github.com/ruby/ruby/blob/e288632f22b18b29efd20a1469292b0a3ba9b74c/compile.c#L9744-L9745)

This proposal speeds up object deduplication by using a hash.
This patch does not change the output binary.
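
To illustrate the idea, here is a minimal Ruby sketch of the two lookup strategies. It is not the C code in `compile.c`: the class names `LinearObjTable` and `HashedObjTable` and the method `index_of` are hypothetical, and it assumes objects are deduplicated by identity. Both tables assign the same indices in the same order, which is why switching strategies would not change the output.

```rb
# Hypothetical illustration only; the real implementation is C code in compile.c.

# Linear search: every lookup scans the whole list, so n objects cost O(n^2).
class LinearObjTable
  attr_reader :list

  def initialize
    @list = []
  end

  # Returns the index of obj, appending it first if it is not yet registered.
  def index_of(obj)
    idx = @list.index { |o| o.equal?(obj) }
    return idx if idx

    @list << obj
    @list.size - 1
  end
end

# Hash lookup: a side table maps each object to its index, so a lookup is O(1)
# on average. Assumption: identity comparison, hence compare_by_identity.
class HashedObjTable
  attr_reader :list

  def initialize
    @list = []
    @index = {}.compare_by_identity
  end

  # Returns the index of obj, appending it first if it is not yet registered.
  def index_of(obj)
    @index.fetch(obj) do
      @list << obj
      @index[obj] = @list.size - 1
    end
  end
end
```

The object list itself is kept in both versions, so the order of the dumped objects stays the same; the hash is only an extra index over that list.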

## Implementation
https://github.com/ruby/ruby/pull/2835

## Evaluation
Environment:
- OS: macOS Catalina
- CPU: Intel Core i5
- Memory: 16GB

### address_lists_parser.rb
`address_lists_parser.rb` (https://github.com/mikel/mail/blob/master/lib/mail/parsers/address_lists_parser.rb) in the [`mail` gem](https://github.com/mikel/mail) contains an extremely large array.
Call `#to_binary` on the iseq of this file, then measure its execution time and check the MD5 of the output binary.

The benchmark code:
```rb
require 'benchmark'
require 'digest/md5'

F = 'address_lists_parser.rb'
N = 100

iseq = RubyVM::InstructionSequence.compile_file(F)
bin = iseq.to_binary

puts "md5 hash: #{Digest::MD5.hexdigest(bin)}"

Benchmark.bm(12) do |x|
  x.report("to_binary x#{N}") {
    N.times do
      iseq.to_binary
    end
  }
end
```

- master (`ruby 2.8.0dev (2020-01-12T10:54:59Z master e288632f22) [x86_64-darwin19]`)

    ```
    md5 hash: fd80e7c0c8da7a9044e89139c6078137
                       user     system      total        real
    to_binary x100 27.162084   0.078262  27.240346 ( 27.675089)
    ```

- Proposal (`ruby 2.8.0dev (2020-01-12T12:39:10Z improve-performanc.. e05ad5ef81) [x86_64-darwin19]`)

    ```
    md5 hash: fd80e7c0c8da7a9044e89139c6078137
                       user     system      total        real
    to_binary x100  0.989403   0.036869   1.026272 (  1.063335)
    ```

The same binary was output before and after the change.
Execution is about 26 times faster (27.68 s vs. 1.06 s real time for 100 calls).



