Issue #12589 has been updated by vmakarov (Vladimir Makarov).


subtileos (Daniel Ferreira) wrote:
> Hi Vladimir,
>  
>  
>  On Tue, Mar 28, 2017 at 4:26 AM,  <vmakarov / redhat.com> wrote:
>  
>  >   You can find the code on
>  > https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch.  Please, read
>  > file README.md about the project first.
>  >
>  
>  Thank you very much for this post.

You are welcome.

>  That README is priceless.
>  It is wonderful the kind of work you are doing with such a degree of
>  entry level details.
>  I believe that ruby core gets a lot from public posts like yours.
>  This sort of posts and PR's are the ones that I miss sometimes in
>  order to be able to understand in better detail the why's of doing
>  something in one way or another in terms of ruby core implementation.
>  
>  
>  > The HEAD of the branch
>  > https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch_base (currently
>  > trunk as of Jan) is and will be always the last merge point of branch
>  > rtl_mjit_branch with the trunk.  To see all changes (the patch is big,
>  > more 20K lines), you can use the following link
>  >
>  > https://github.com/vnmakarov/ruby/compare/rtl_mjit_branch_base...rtl_mjit_branch
>  
>  What kind of feedback are you looking forward to get?

  My approach to JIT is not traditional.  I believe that implementing JIT in MRI should be more evolutionary to be successful.  The changes should be minimized, but other ways should still be open.  My second choice would be a specialized JIT with 3-4 times faster compilation speed, like LuaJIT, but it is an order of magnitude bigger project (probably even more) than the current approach and for sure has a bigger chance to fail in the end.  So a discussion of the current and other approaches would help me to better understand how reasonable my current approach is.

Another thing is to avoid duplication of work.  My very first post in this thread was to figure out if somebody is already working on something analogous.  I did not get a clear confirmation that I would be duplicating work.  So I went ahead with the project.

For people who work on a Ruby JIT, openly or in secret, posting info about my project could be helpful.  At least, my own investigation of Oracle Graal and IBM OMR was very helpful.

Also I am pretty new to the MRI sources (I started to work on them just about a year ago).  I found that MRI lacks documentation and comments.  There is no document like GCC internals which could be helpful for a newbie.  So I might be doing stupid things which could be done more easily, and I might not be following some implicit source code policies.

>  Can I help in any way?
>  Is the goal to try to compile your branch and get specific information
>  from the generated ruby?
>  If so what kind of information?
> 

Trying the branch and telling me what you like or don't like would be helpful.  It could be anything, e.g. insn names.  As I wrote, RTL insns should work for serious Ruby programs.  I definitely can not say the same about the JIT.  Still, there is a chance that RTL breaks some code.  Also, RTL code might be slower because not all edge cases are implemented with the same level of optimization as the stack code (e.g. multiple assignment), and some Ruby code may be a better fit for the stack insns.  It would be interesting to see such code.

MJIT is at a very early stage of development.  I think it will have a good chance to be successful if I achieve inlining on the path `Ruby->C->Ruby` within a reasonable compilation time.  But even implementing this will not speed up some Ruby code considerably (e.g. floating point benchmarks can not be sped up without changing the representation of double/VALUE in MRI).
 
 
>  >
>  >
>  > The project is still at very early stages.  I am planning to spend
>  > half of my work time on it at least for an year.  I'll decide what to
>  > do with the project in about year depending on where it is going to.
>  
>  In the README you explain very well all the surroundings around your
>  choices and the possibilities.

I omitted a few other pros and cons of the choices.

>  That makes me believe there may be space for collaboration from
>  someone that is willing to get deeper into the C level code.
>  If there is anyway I can be helpful please say so.
>
  
Thank you.  I guess if I get some considerable performance improvement, help would be useful for more or less independent pieces of work.  But unfortunately, I am not at this stage yet.  I hope to get the performance improvements I expect in half a year.

>  Once again thank you very much and keep up with your excellent
>  contribution making available to the rest of us the same level of
>  detail and conversation as much as possible.
>

Thank you for the kind words, Daniel and Eric.
  
>  I was waiting a little bit to see the amount of reception this post
>  would have and surprisingly only Eric replied to you.
>  Why is that?

I think people need some time to evaluate the current state of the project and its prospects.  It is not a traditional approach to JIT.  This is at least what I would do myself.  There are a lot of details in the new code.  I would spend time reading the sources to understand the approach better.  And usually the people concerned are very busy.  So it might take a few weeks.


----------------------------------------
Feature #12589: VM performance improvement proposal
https://bugs.ruby-lang.org/issues/12589#change-63989

* Author: vmakarov (Vladimir Makarov)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
  Hello.  I'd like to start a big MRI project but I don't want to
disrupt somebody else's plans.  Therefore I'd like to have MRI
developers' opinions on the proposed project or information if somebody
is already working on an analogous project.

  Basically I want to improve overall MRI VM performance:

  * First of all, I'd like to change VM insns and move from
    **stack-based** insns to **register transfer** ones.  The idea behind
    it is to decrease VM dispatch overhead as approximately 2 times
    fewer RTL insns are necessary than stack-based insns for the same
    program (for Ruby it is probably even less as a typical Ruby program
    contains a lot of method calls and the arguments are passed through
    the stack).

    But *decreasing memory traffic* is an even more important advantage
    of RTL insns as an RTL insn can address temporaries (stack) and
    local variables in any combination.  So there is no necessity to
    put an insn result on the stack and then move it to a local
    variable or put variable value on the stack and then use it as an
    insn operand.  Insns doing more also provide a bigger scope for C
    compiler optimizations.

    The biggest changes will be in files compile.c and insns.def (they
    will be basically rewritten).  **So the project is not a new VM.
    MRI VM is much more than these 2 files.**

    The disadvantage of RTL insns is a bigger insn memory footprint
    (which can be up to 30% more) although, as I wrote, there are fewer
    RTL insns.

    Another disadvantage of RTL insns *specifically* for Ruby is that
    insns for call sequences will be basically the same stack based
    ones but only bigger as they address the stack explicitly.

  * Secondly, I'd like to **combine some frequent insn sequences** into
    bigger insns.  Again it decreases insn dispatch overhead and
    memory traffic even more.  Also it permits removing some type
    checking.

    The first thing on my mind is a sequence of a compare insn and a
    branch and using immediate operands besides temporary (stack) and
    local variables.  Also it is not a trivial task for Ruby as the
    compare can be implemented as a method.
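
The difference in dispatch counts can be illustrated with a toy interpreter.  This is only a sketch: the insn names, operand layouts and the `run`, `stack_code` and `rtl_code` helpers are invented for illustration and are not MRI code, and the ratio it produces is not a measurement of MRI.

```ruby
# Toy interpreter, not MRI code.  Insn names and operand layouts are invented.
def run(code)
  locals = { i: 0 }
  stack = []
  dispatches = 0
  pc = 0
  while pc < code.size
    op, a, b, c = code[pc]
    pc += 1
    dispatches += 1 # one dispatch per executed insn
    case op
    # Stack-based insns: operands travel through the stack.
    when :getlocal then stack.push(locals[a])
    when :push     then stack.push(a)
    when :setlocal then locals[a] = stack.pop
    when :plus
      rhs = stack.pop
      stack.push(stack.pop + rhs)
    when :lt
      rhs = stack.pop
      stack.push(stack.pop < rhs)
    when :branchif then pc = a if stack.pop
    when :jump     then pc = a
    # RTL-style insns: operands are named directly in the insn.
    when :plusi then locals[a] = locals[b] + c # dst = src + immediate
    when :btlti then pc = a if locals[b] < c   # compare with immediate, branch if true
    else raise ArgumentError, "unknown insn #{op}"
    end
  end
  [locals[:i], dispatches]
end

# i = 0; while i < limit; i += 1; end  as stack-based insns
def stack_code(limit)
  [
    [:jump, 5],       # 0: go to the loop test
    [:getlocal, :i],  # 1: loop body starts here
    [:push, 1],       # 2
    [:plus],          # 3
    [:setlocal, :i],  # 4
    [:getlocal, :i],  # 5: loop test starts here
    [:push, limit],   # 6
    [:lt],            # 7
    [:branchif, 1]    # 8
  ]
end

# The same loop as RTL-style insns with a combined compare-and-branch
def rtl_code(limit)
  [
    [:jump, 2],             # 0: go to the loop test
    [:plusi, :i, :i, 1],    # 1: i = i + 1
    [:btlti, 1, :i, limit]  # 2: branch to 1 if i < limit
  ]
end

STACK_RESULT = run(stack_code(1000)) # [final i, dispatch count]
RTL_RESULT   = run(rtl_code(1000))
puts "stack: #{STACK_RESULT.inspect}, rtl: #{RTL_RESULT.inspect}"
```

Both programs compute the same value of `i`; the RTL-style version reaches it with far fewer dispatches because each insn reads and writes the local variable directly instead of going through the stack.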

  I already did some experiments.  RTL insns & combining insns permit
speeding up the following micro-benchmark by more than 2 times:

```
i = 0
while i<30_000_000 # benchmark loop 1
  i += 1
end
```

The generated RTL insns for the benchmark are

```
== disasm: #<ISeq:<main>@while.rb>======================================
== catch table
| catch type: break  st: 0007 ed: 0020 sp: 0000 cont: 0020
| catch type: next   st: 0007 ed: 0020 sp: 0000 cont: 0005
| catch type: redo   st: 0007 ed: 0020 sp: 0000 cont: 0007
|------------------------------------------------------------------------
local table (size: 2, temp: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 2] i
0000 set_local_val    2, 0                                            (   1)
0003 jump             13                                              (   2)
0005 jump             13
0007 plusi            <callcache>, 2, 2, 1, -1                        (   3)
0013 btlti            7, <callcache>, -1, 2, 30000000, -1             (   2)
0020 local_ret        2, 0                                            (   3)
```

In this experiment I ignored trace insns (that is another story) and the
complication that an integer compare insn can be re-implemented as a
Ruby method.  Insn btlti is a combination of LT immediate compare and
branch true.
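
For comparison, the stack-based insns that a stock MRI generates for the same loop can be dumped with `RubyVM::InstructionSequence`, which is available only on MRI (CRuby).  A minimal sketch; the listing typically contains insns such as `opt_plus` and `opt_lt`, but the exact insn names and layout vary between Ruby versions:

```ruby
# Compile (without running) the benchmark loop and print its insns.
# RubyVM::InstructionSequence is MRI-specific.
source = <<~RUBY
  i = 0
  while i < 30_000_000
    i += 1
  end
RUBY

iseq = RubyVM::InstructionSequence.compile(source)
listing = iseq.disasm # human-readable insn listing as a String
puts listing
```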

A modification of the fib benchmark is sped up by 1.35 times:

```
def fib_m n
  if n < 1
    1
  else
    fib_m(n-1) * fib_m(n-2)
  end
end

fib_m(40)
```

The RTL code of fib_m looks like

```
== disasm: #<ISeq:fib_m / fm.rb>==========================================
local table (size: 2, temp: 3, argc: 1 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 2] n<Arg>
0000 bflti            10, <callcache>, -1, 2, 1, -1                   (   2)
0007 val_ret          1, 16
0010 minusi           <callcache>, -2, 2, 1, -2                       (   5)
0016 simple_call_self <callinfo!mid:fib_m, argc:1, FCALL|ARGS_SIMPLE>, <callcache>, -1
0020 minusi           <callcache>, -3, 2, 2, -3
0026 simple_call_self <callinfo!mid:fib_m, argc:1, FCALL|ARGS_SIMPLE>, <callcache>, -2
0030 mult             <callcache>, -1, -1, -2, -1
0036 temp_ret         -1, 16
```

In reality, the improvement of most programs probably will be about
10%.  That is because of the very dynamic nature of Ruby (a lot of calls,
checks for redefinition of basic type operations, checking overflows
to switch to GMP numbers).  For example, integer addition can not take
fewer than about 17 x86-64 insns out of the current 50 insns on the
fast path.  So even if you make the remaining 33 insns 2 times faster,
the improvement will be only about 30%.
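
The arithmetic behind this estimate can be written out as follows.  It is a rough sketch that assumes every machine insn costs the same, which real hardware does not guarantee; the variable names are chosen here for illustration.

```ruby
total_insns  = 50   # x86-64 insns on the current fast path (from the text above)
fixed_insns  = 17   # insns that can not be removed (from the text above)
rest_speedup = 2.0  # assumed speedup of the remaining insns

rest_insns = total_insns - fixed_insns             # 33
new_cost   = fixed_insns + rest_insns / rest_speedup # 17 + 16.5 = 33.5
saving     = 1.0 - new_cost / total_insns          # fraction of the original cost saved

puts format("new cost: %.1f insns, saving: %.0f%%", new_cost, saving * 100)
```

Under these assumptions the saving is about a third of the original cost, in line with the rough figure quoted above.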

A very important part of MRI performance improvement is to make calls
fast because there are a lot of them in Ruby, but as I read in some of
Koichi Sasada's presentations, he pays a lot of attention to it.  So I
don't want to touch it.

  * Thirdly, I want to implement the insns as small inline functions
    for a future AOT compiler, of course, if the projects described
    above are successful.  It will permit easy AOT generation of C code
    which will be basically calls of the functions.

    I'd like to implement an AOT compiler which will generate code for a
    Ruby method, call a C compiler to generate a binary shared object
    and load it into MRI for subsequent calls.  The key is to minimize
    the compilation time.  There are many approaches to do it but I
    don't want to discuss it right now.

    C generation is the easiest and most portable implementation of AOT
    but in future it is possible to use the GCC JIT plugin or LLVM IR to
    decrease the overhead of the C scanner/parser.

    The C compiler will see a bigger scope (all method insns) to do
    optimizations.  I think using AOT can give another 10%
    improvement.  It is not that big, again because of the dynamic nature
    of Ruby and because no C compiler is smart enough to figure out
    aliasing for a typical generated C program.

    Life would be easier from the performance point of view if Ruby
    did not permit redefining basic operations for basic types,
    e.g. plus for integer.  In this case we could evaluate types of
    operands and results using some data flow analysis and generate
    faster specialized insns.  Still, gradual typing, if it is
    introduced in future versions of Ruby, would help to generate such
    faster insns.
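
The idea of generating C code that is basically a sequence of calls to per-insn functions can be sketched as below.  Everything here is hypothetical: the `generate_c` helper, the `_f` suffix of the C function names, the operand layout and the set of branch insns are assumptions made for illustration, not the project's actual generator.  The sketch only produces C source text; it does not invoke a C compiler or load anything into MRI.

```ruby
# Hypothetical sketch: turn RTL-like insns into C source text in which
# each insn becomes a call to a small (assumed inline) C function.
BRANCH_INSNS = %i[btlti bflti].freeze # toy convention: first operand is the target

def generate_c(name, insns)
  lines = ["VALUE #{name}(VALUE *locals) {"]
  insns.each_with_index do |(op, *operands), pc|
    if BRANCH_INSNS.include?(op)
      target, *rest = operands
      call = "#{op}_f(#{['locals', *rest].join(', ')})"
      lines << "l#{pc}: if (#{call}) goto l#{target};"
    else
      lines << "l#{pc}: #{op}_f(#{['locals', *operands].join(', ')});"
    end
  end
  lines << "  return Qnil;"
  lines << "}"
  lines.join("\n")
end

# The body of the micro-benchmark loop, in the invented operand layout.
C_SOURCE = generate_c("loop_body", [
  [:plusi, 2, 2, 1],
  [:btlti, 0, 2, 30_000_000]
])
puts C_SOURCE
```

Because the generated function contains all insns of a method, the C compiler can optimize across insn boundaries once the per-insn functions are inlined.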

  Again, I wrote this proposal for discussion as I don't want to be in
a position to compete with somebody else's ongoing big project.  It
might be counterproductive for MRI development.  I especially don't
want it because the project is big and long and will probably have a
lot of technical obstacles and a possibility of failure.



