Issue #17100 has been updated by ko1 (Koichi Sasada).


Thank you for your question.

Eregon (Benoit Daloze) wrote in #note-2:
> * Reactor#recv/Channel#recv => Reactor#receive/Channel#receive. I don't think libc-like abbreviations should be used. Let's use actual words.

I'd like to ask Matz about the naming. I don't mind renaming it (though `send`/`recv` have the same character count, which makes them pleasant to write :p).

> * About synchronization for Modules, specifically, for instance variables, constants and class variables. Should it be consistent? I think it would be better. For instance module @ivars/constants/@@cvars can be accessed from any Reactor if it's a shareable value, and only main Reactor if non-shareable. This seems the most compatible while still safe, but it might be surprising that the access is allowed depending on the value.

I understand the concerns. The current design is the version fixed by Matz.
Making it consistent is an option.
BTW, Ruby is already inconsistent across variable types:

```ruby
p @ivar  #=> nil
p @@cvar #=> t.rb:2:in `<main>': uninitialized class variable @@cvar in Object (NameError)
```

so I guess consistency is not the top priority.

> * I think we need to introduce `#deep_freeze` to make it easy to create deeply-immutable objects conveniently and efficiently.

Yes. We need to consider a new syntax or method.
It is not essential, though. I think deferring it to Ruby 3.1 is okay if it is not decided in time.
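
To make the discussion concrete, here is a minimal pure-Ruby sketch of what such a method could do. The name `deep_freeze` and its behavior (recursing into Hash, Array, and instance variables, and leaving classes/modules untouched) are assumptions for illustration only; Struct members and references held at the C level are not covered.

```ruby
# Hypothetical helper: freezes obj and everything reachable from it.
# Not part of the proposal or of core Ruby; for illustration only.
def deep_freeze(obj, seen = {}.compare_by_identity)
  return obj if seen.key?(obj) || obj.is_a?(Module)
  seen[obj] = true  # guard against cyclic references

  case obj
  when Hash
    obj.each { |key, value| deep_freeze(key, seen); deep_freeze(value, seen) }
  when Array
    obj.each { |element| deep_freeze(element, seen) }
  end
  obj.instance_variables.each do |name|
    deep_freeze(obj.instance_variable_get(name), seen)
  end
  obj.freeze
end

config = deep_freeze({ name: "ractor".dup, tags: ["a".dup, "b".dup] })
p config.frozen?                 #=> true
p config[:tags].all?(&:frozen?)  #=> true
```

A built-in version could avoid the Ruby-level recursion, which is the efficiency point raised in the question.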

> * On a similar note, `#deep_copy` seems useful as a more optimized way than Marshal#dump+load.

Maybe it is a different topic, but I agree it would help.
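
For reference, the Marshal round trip mentioned in the question looks roughly like this today. `deep_copy` is only a hypothetical wrapper name here; `Marshal.dump` and `Marshal.load` are the real calls.

```ruby
# Hypothetical helper name; built on the real Marshal.dump / Marshal.load.
def deep_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

original = { list: [1, 2, "three".dup] }
copy = deep_copy(original)
copy[:list] << 4       # mutate the copied array
copy[:list][2] << "!"  # mutate the copied string

p original[:list]  #=> [1, 2, "three"]
p copy[:list]      #=> [1, 2, "three!", 4]
```

The round trip serializes to an intermediate String and parses it back, which is the overhead a dedicated `#deep_copy` could avoid.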

> * I think it is worth noting that `send/yield(obj, move: true)` still involves some shallow copying.

I agree. ractor.md will cover it.

> * ObjectSpace should be per Reactor, right? It seems important for guarantees that `ObjectSpace.each_object` only iterates objects in the current Reactor (otherwise all low-level data races can happen again).

This is a very, very difficult problem that I have ignored so far.
Currently there is no way to recognize which objects belong to which Ractor.

One simple option, I guess, is to allow the `each_object` method only in single-Ractor mode (no `Ractor.new`).

> * C extensions: should C extensions be marked as default thread-safe, or thread-unsafe? It seems considered thread-safe by default in the proposal (sounds optimistic, but also should help to identify those which are not thread-safe). I'm looking forward to have C extensions executing in parallel. Currently TruffleRuby defaults to using a global monitor for C extensions (it's a CLI flag) to guarantee the same semantics as CRuby (IIRC we found that the `openssl` C ext and a few others manipulate their internal state in a thread-unsafe manner).

Unsafe by default, the same as TruffleRuby. This feature is not yet supported in the PR.

> * Previously it was mentioned C extensions would only be allowed for the main/initial Reactor. I'm happy to hear this is no longer the case. However, isn't there a risk that C extensions might pass Ruby objects without copying or transfer to other Reactor via e.g. C global variables? (and that could cause data races/segfault). I guess we have to trust C extensions to not do that? Probably we need to introduce easy ways to have reactor-local variables (similar to thread-local variables), so C global variables can be replaced with reactor-local variables.

> I guess we have to trust C extensions to not do that?

I trust people. "Default unsafe" means that I don't think it is easy to make extensions safe.

> Probably we need to introduce easy ways to have reactor-local variables (similar to thread-local variables), so C global variables can be replaced with reactor-local variables.

Agreed.

> * `I'll fix most of builtin C-methods (String, Array, ...) thread-safe.` You mean just about synchronizing access to global variables, right? I guess in your model String/Array/Hash operations will not have any synchronization but rely on Reactor isolation, right?

As you said, most String/Array operations are already thread-safe. However, the fstring table, for example, is not thread-safe. The encoding table is also not thread-safe, and so on. The challenge is how to find such accesses to global resources.

> * Does `Ractor::RemoteError` points to original error? How to avoid that exceptions "leak" objects from inside the Reactor? Maybe copy/move the exception? (The reactor can still be alive if it rescues e.g. `Ractor::MovedError`)

It points to a copy.
One exceptional (sorry, confusing wording) case is the exception that terminates the source ractor.
Handling that case specially can be an optimization, but it is not implemented.

> * About global variables, what about `$VERBOSE` and `$DEBUG` which are implicitly accessed? Should it still be allowed to read them in other reactors? Should they become Reactor-local? Should global variables be allowed to be read from any Reactor if they contain a shareable value?

Good question. `$0`, `$:`, `$stdin`, `$stdout`, ... also need to be considered. I have no idea yet. Should they be Ractor-local? Some global variables are already scope-local (`$1`, `$2`, `$~`, ...), so I don't think such an irregular scope is an issue.
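
As a small illustration of the scope-local behavior of `$~` in current Ruby (the method names are made up for this sketch):

```ruby
def inner_match
  "ractor" =~ /act/  # sets $~ (and $1, $2, ...) for this method only
  $~[0]
end

def outer
  matched = inner_match
  [matched, $~]      # $~ is still nil here: the match did not leak out
end

p outer  #=> ["act", nil]
```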

Thanks,
Koichi

----------------------------------------
Bug #17100: Ractor: a proposal for new concurrent abstraction without thread-safety issues
https://bugs.ruby-lang.org/issues/17100#change-86924

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
# Ractor: a proposal for new concurrent abstraction without thread-safety issues

## Abstract

This ticket proposes a new concurrent abstraction named "Ractor", Ruby's 
Actor-like feature (not an exact Actor-model).

Ractor achieves the following goals:

* Parallel execution in a Ruby interpreter process
* Avoid thread-safety issues (especially race issues) by limiting the object sharing
* Communication via copying and moving

I have been working on this proposal for a few years, and the project name was
"Guild". I renamed it from Guild to Ractor because of Matz's preference.

Resources:
* Proposed specification: https://github.com/ko1/ruby/blob/ractor_parallel/doc/ractor.md
* my talk
  * (latest, but written in Japanese) http://atdot.net/~ko1/activities/2020_ruby3summit.pdf
  * (old, API was changed) http://atdot.net/~ko1/activities/2018_rubykaigi2018.pdf
  * (old, API was changed) http://atdot.net/~ko1/activities/2018_rubyconf2018.pdf

The current implementation is not complete (many bugs remain), but it passes the current CI.
I propose merging it soon so that we can try the APIs and continue developing the implementation on the master branch.

## Background

MRI doesn't provide an in-process parallel computation feature because
parallel "Threads" have many issues.

* Ruby programmers need to think about thread-safety more.
* Interpreter developers need to think about thread-safety more.
* The interpreter will slow down in single-threaded execution because of fine-grained synchronization, unless clever optimizations are applied.

The cause of these issues is the "shared-everything" thread model.

## Proposal

To overcome the issues with multiple threads, the Ractor abstraction is proposed.
This proposal consists of two layers: a memory model and a communication model.

Basic:
* Introduce "Ractor" as a new concurrent entity.
* Ractors run in parallel.

Memory-model:
* Separate "shareable" objects and "unshareable" objects between parallel running ractors.
   * Shareable-objects:
     * Immutable objects (frozen objects that refer only to shareable objects)
     * Class/module objects
     * Special shareable objects (Ractor objects, and so on)
   * Unshareable-objects: 
     * Other objects
* Most objects are "unshareable", which means we (Ruby programmers and interpreter developers) don't need to care about thread-safety in many cases.
* We only need to concentrate on synchronizing "shareable" objects.
* Compared with a completely separated memory model (like the MVM proposal), programming will be easier.
* This model is similar to Racket's `Place` abstraction.
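
The classification above can be checked with `Ractor.shareable?` and `Ractor.make_shareable`. Both exist in released Ruby 3.0 and later; whether the PR branch discussed here already provides them is an assumption, so treat this as a sketch against the released API.

```ruby
# Requires Ruby 3.0+. These calls only inspect or freeze objects;
# they do not start a Ractor.
def shareable_report
  mutable = ["a".dup]                         # unfrozen array, unfrozen string
  shallow = ["a".dup].freeze                  # frozen, but refers to an unshareable string
  deep    = Ractor.make_shareable(["a".dup])  # freezes the array and its elements

  {
    integer: Ractor.shareable?(1),       # immutable value
    klass:   Ractor.shareable?(String),  # class/module object
    mutable: Ractor.shareable?(mutable),
    shallow: Ractor.shareable?(shallow),
    deep:    Ractor.shareable?(deep),
  }
end

shareable_report.each { |name, result| puts "#{name}: #{result}" }
# integer: true
# klass: true
# mutable: false
# shallow: false
# deep: true
```

The `shallow` case shows why "frozen" alone is not enough: a frozen container is shareable only if everything it refers to is shareable as well.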

Communication-model:
* Actor-like (not same) message passing with `Ractor#send(obj)` and `Ractor.recv`
* Pull-type communication with `Ractor.yield(obj)` and `Ractor#take`
* Support multiple waiting with `Ractor.select(...)`

The Actor-like model is why we named this proposal "Ractor" (Ruby's actor). However, it is currently not an Actor model, because a receiver can't select messages (as with pattern matching in Erlang, Elixir, ...). This means we can't have multiple communication channels. Instead of an incomplete actor model, this proposal has the `yield`/`take` pair to provide multiple channels. This topic is discussed later.

I strongly believe the memory model is promising.
However, I'm not sure the communication model is the best one.
This is why I introduced the "experimental" warning.

Proposed specification: https://github.com/ko1/ruby/blob/ractor_parallel/doc/ractor.md

## Implementation

https://github.com/ruby/ruby/pull/3365
All GitHub Actions checks pass.

I describe the implementation briefly.

### `rb_ractor_t`

Without Ractor, the VM-Thread-Fiber hierarchy is as follows:

* The VM `rb_vm_t` manages running threads (`rb_thread_t`).
* A thread (`rb_thread_t`) points to a running fiber (`rb_fiber_t`).

With Ractor, we introduce a new layer, `rb_ractor_t`:

* The VM `rb_vm_t` manages running ractors (`rb_ractor_t`).
* A Ractor manages running threads (`rb_thread_t`).
* A thread (`rb_thread_t`) points to a running fiber (`rb_fiber_t`).

Each `rb_ractor_t` has a GVL to manage its threads (only one of a Ractor's threads can run at a time).

The Ractor implementation is located in `ractor.h`, `ractor.c` and `ractor.rb`.

### VM-wide lock

A VM-wide lock is introduced to protect VM-global resources such as the object space.
It should allow recursive locking, so it is implemented as a monitor. We may therefore need to call it a VM-wide monitor.
Currently `RB_VM_LOCK_ENTER()` and `RB_VM_LOCK_LEAVE()` are provided to acquire and release the lock.
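
The difference between a recursive lock (monitor) and a plain lock can be seen at the Ruby level. This is only an analogy using the standard `Monitor` and `Mutex` classes, not the C-level VM lock itself; the method names are made up for the sketch.

```ruby
require "monitor"

# A Monitor can be re-entered by the thread that already holds it.
def outer(lock)
  lock.synchronize { inner(lock) }
end

def inner(lock)
  lock.synchronize { :done }
end

p outer(Monitor.new)  #=> :done

# A plain Mutex refuses recursive locking by the same thread.
def relock(mutex)
  mutex.synchronize { mutex.synchronize { :done } }
rescue ThreadError => e
  e.class
end

p relock(Mutex.new)   #=> ThreadError
```

Code that takes the VM-wide lock may call other code that takes it again, which is why re-entrancy matters here.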

Note that it is different from the (current) GVL.
The GVL is acquired whenever you want to run Ruby threads.
The VM-wide lock is acquired only when accessing VM-wide resources.

On single ractor mode (all Ruby scripts except my tests) 

### Object management and GC

* (1) All ractors share the object space.
* (2) All GC events will stop all ractors, and one ractor does the GC work with barrier synchronization.
  * Barrier at `gc_enter()`
  * marking, (lazy) sweeping, ...
* (3) Because all the object space is shared by ractors, object creation is protected with VM-wide lock.

(2) and (3) have a huge impact on performance.
The plans are:

* For (2), introduce a (semi-)separated object space. This will take a long time, so Ruby 3.0 can't employ this technique.
* For (3), introduce a free-slot cache for every ractor so that most object creation can be done without synchronization. This will be employed soon.

### Experimental warning

The Ractor implementation and specification are not yet stable, so the first use of `Ractor.new` will show a warning:

`warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.`

## Discussion

### Actor-based and channel-based

I think there are two message passing approaches: Actor-based (Erlang, ...) and channel-based (Go, ...).

With the channel-based approach, it is easy to manipulate multiple channels because they are managed explicitly. With the Actor-based approach, multiple channels are emulated with message patterns: a receiver can ignore (leave pending) an unexpectedly structured message and handle it later, after its behavior has changed (the role of the actor has changed).

Ractor has `send/recv` like the Actor model, but there is no pattern-matching feature. This is because we can't introduce new syntax and I can't design a good API.

With the channel-based approach, it is easy to design the API (for example, create `ch = Ractor::Channel.new` and share `ch` among ractors). However, I can't design a good API for handling exceptions between Ractors.

To take error handling into account, we propose a hybrid model using the `send/recv` and `yield/take` pairs. `Ractor#take` can receive the source ractor's exception (like `Thread#join`). With the Actor approach, `Ractor#send(obj)` can detect that the destination Ractor is not working (killed). A receiver ractor (waiting in `Ractor.recv`) cannot detect the sender's trouble, but that is probably not a high priority. `Ractor#take` also detects the sender's (`Ractor.yield(obj)`) error, so errors can be propagated.
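
The `Thread#join` behavior used as the analogy above can be sketched with plain threads. This deliberately does not use the Ractor API; the method names are made up for illustration.

```ruby
def failing_thread
  Thread.new do
    Thread.current.report_on_exception = false  # keep stderr quiet for the demo
    raise ArgumentError, "boom in worker"
  end
end

def join_and_report(thread)
  thread.join  # re-raises the exception that terminated the thread
  :finished
rescue ArgumentError => e
  "caught: #{e.message}"
end

puts join_and_report(failing_thread)  #=> caught: boom in worker
```

In the same way, the ractor that calls `Ractor#take` is the place where a failure of the source ractor becomes visible.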

To handle multiple communication channels with Ractor, use *pipe* ractors instead of multiple channels.

```
# worker-pool (receive by send)

main # pipe.send(obj)
-> pipe # Ractor.yield(Ractor.recv)
  ->
    worker1 # Ractor.yield(some_task(pipe.take))
    worker2 # Ractor.yield(some_task(pipe.take))
    worker3 # Ractor.yield(some_task(pipe.take))
-> main # Ractor.select(worker1, worker2, worker3)

# if worker* causes an error, main can detect the error.
```

*Pipe* ractors look like channels. However, with this technique we don't need to introduce a new class (the implementation can omit the Ractor creation for pipe ractors).

Maybe there are other possibilities. For example, if we can propagate errors through channels, we can also consider the channel model (then we would need to change the name "Ractor" :p).

### Name of Ractor (and Guild)

When I proposed Guild in 2016, I assumed that "move" message passing (see the specification) was its characteristic feature, and I explained this feature as "moving membership". This is why the name "Guild" was chosen. However, Matz pointed out that move semantics is not used frequently, and he asked me to change the name. Also, someone already uses the class name "Guild".

"Ractor" is short and there is no existing class with that name; this is why I chose "Ractor".

I understand that people may confuse it with "Reactor".

## TODO

There are many remaining tasks.

### Protection

Many VM-wide (process-wide) resources are not protected correctly, so using Ractor in a complicated program can cause a critical bug (`[BUG]`). Most global resources are managed through global variables, so we need to check them carefully.

### C-methods

Currently, C-methods (methods written in C and defined with
`rb_define_method()`) run in parallel. This means thread-unsafe code
can run in parallel. To solve this issue, I plan the following:

(1) Introduce thread-unsafe label for methods

It is impossible to make all C-methods thread-safe, especially C-methods in 3rd party C-extensions. To protect them, we label these (possibly) thread-unsafe C-methods as "thread-unsafe".

When a C-method labeled "unsafe" is invoked, the VM-wide lock is acquired. This VM-wide lock should handle recursion (so the lock should be a monitor) and escaping (exceptions). Currently the VM-wide lock doesn't handle escaping, but that should be implemented soon.

(2) Built-in C-methods

I'll make most of the built-in C-methods (String, Array, ...) thread-safe.
Where that is not easy, I'll use the thread-unsafe label.

### Copying and moving

Currently, the Marshal protocol is used to make deep copies for message communication. However, the Marshal protocol doesn't support some objects, such as `Ractor` objects, so we need to modify it.

Only a few types are supported for moving, so we need to write more.
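
The kind of limitation meant here can be observed with plain `Marshal` in current Ruby. The sketch uses `Proc` and `Thread` objects as stand-ins, since they are likewise unsupported by Marshal; `marshal_check` is a made-up helper name.

```ruby
# Returns :ok if Marshal can serialize obj, otherwise the error class raised.
def marshal_check(obj)
  Marshal.dump(obj)
  :ok
rescue TypeError => e
  e.class
end

p marshal_check([1, "two", { three: 3.0 }])  #=> :ok
p marshal_check(proc { })                    #=> TypeError
p marshal_check(Thread.current)              #=> TypeError
```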

### "GVL" naming

The source code still uses the name "GVL", but these are now Ractor-local locks.
Maybe they should be renamed in the source code.

### Performance

Introducing fine-grained locks requires performance tuning.

### Bug fixes

many many ....

## Conclusion

This ticket proposes a new concurrent abstraction "Ractor".
I think Ruby 3 can ship with Ractor with "experimental" status.



