Issue #17664 has been updated by ciconia (Sharon Rosner).


> I missed some functions in io.c which could invoke read. That's why read was showing up.

Sorry, this tripped me up and I was looking for a corresponding `write` line.

> I propose the following changes:
> 
> - `DirectScheduler` makes sense for general IO including blocking IO. `IO#read` and `IO#write` should invoke the fiber scheduler `io_read` and `io_write` respectively. This enables things like non-blocking read/write to `stdin` and `stdout` without making them `O_NONBLOCK`. `io_uring` supports this directly, while `epoll` and `kqueue` will need to use a `fcntl` wrapper.
> - `Socket#read` and `Socket#write` should be implemented via a different scheduler hook, maybe `socket_read` and `socket_write`, to go along with what will eventually include `socket_recvmsg` and `socket_sendmsg` etc. The implementation of `socket_read` and `socket_write` **could** be the same as `io_read` and `io_write`, but for performance reasons should use `read|write -> EAGAIN -> polling` instead.
> - We might need to check the most efficient way to deal with pipes, I suspect they are more similar to sockets internally than files.

This seems fine to me, but to play devil's advocate, you are making a design decision that is based on:

- Anecdotal benchmarks - the performance difference you see might be reversed in different circumstances.
- A fiber scheduler implementation that is *external* to Ruby.

Another point I wanted to bring up: if you do implement this kind of behavior in a fiber scheduler, fiber switching becomes non-deterministic. This has ramifications for the behavior of user programs, in two important ways:

- You will not be able to tell whether a fiber switch happens on calling `IO#read` et al, which might lead to difficulties in debugging.
- Cancelling an I/O operation where there's no fiber switch happening becomes impossible. Cancellation is a whole subject in itself. It is not addressed at all by the current fiber scheduler spec, and IMO it is a *crucial* aspect of managing concurrency.

I'll just give a very simple example (I don't know how `Fiber#raise` interacts with the fiber scheduler mechanism, if at all, but let's suppose the fiber scheduler knows how to deal with that):

```ruby
f1 = Fiber.schedule do
  @io.write('foo') # is a ctx switch happening here?
  puts 'oh hi' # or here?
rescue
  @some_other_io.puts 'oh bye' # or here?
end

f2 = Fiber.schedule do
  f1.raise
end
```

With your proposition, the output of the above program will change according to the fiber scheduler implementation and whether `@io` is a socket or file or something else.
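
To make the cancellation point concrete, here is a rough sketch using plain fibers only, with no fiber scheduler involved. `run_and_cancel` is a made-up helper, and `Fiber.yield` merely stands in for an I/O call that may or may not switch. How a real scheduler would handle `Fiber#raise` is a separate question that this does not answer.

```ruby
# Hypothetical helper: runs a fiber whose "I/O" either switches or not,
# then tries to cancel it from the outside with Fiber#raise.
def run_and_cancel(switches)
  fiber = Fiber.new do
    # Stand-in for an I/O operation: suspends only when `switches` is true.
    Fiber.yield(:suspended) if switches
    :completed
  rescue RuntimeError
    :cancelled
  end

  first = fiber.resume
  begin
    # Delivered at the suspension point, if there is one.
    second = fiber.raise('cancel')
  rescue FiberError
    # The fiber already ran to completion, so there is nothing to cancel.
    second = :too_late
  end
  [first, second]
end

p run_and_cancel(true)  # the fiber was suspended and could be cancelled
p run_and_cancel(false) # the fiber finished before anyone could intervene
```

The exception can only land where the fiber actually gave up control, so an operation that completes without a switch leaves no window for cancelling it.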

I think it would be better to *always* do a fiber switch on any I/O. That's what I do for example in the [Polyphony libev backend](https://github.com/digital-fabric/polyphony/blob/70f616cbeab5037710d2cceaddc0205ab770a861/ext/polyphony/backend_libev.c#L292-L303): if the read was immediately successful, the fiber snoozes (schedules itself and yields control to some other fiber). Deterministic behavior is IMO one of the main advantages of using fibers compared to threads.
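
A minimal sketch of the "always switch" idea, written as a toy run queue in plain Ruby. This is not the Polyphony code linked above and not the fiber scheduler interface; `ToyLoop`, `snooze` and `read` are names invented for illustration, and waiting for readiness is reduced to re-polling.

```ruby
require 'fiber'

# Toy round-robin loop (hypothetical, for illustration only).
class ToyLoop
  attr_reader :log

  def initialize
    @runqueue = []
    @log = []
  end

  def spawn(&block)
    @runqueue << Fiber.new(&block)
  end

  # Reschedule the current fiber and hand control back to the loop.
  def snooze
    @runqueue << Fiber.current
    Fiber.yield
  end

  # A read that switches fibers at least once, even if data is ready.
  def read(io, maxlen)
    switched = false
    while (result = io.read_nonblock(maxlen, exception: false)) == :wait_readable
      snooze # toy stand-in for waiting on readiness
      switched = true
    end
    snooze unless switched # immediate success: still let other fibers run
    result
  end

  def run
    while (fiber = @runqueue.shift)
      fiber.resume if fiber.alive?
    end
  end
end

reader, writer = IO.pipe
writer.write('foo') # data is already available before the read starts

toy = ToyLoop.new
toy.spawn do
  toy.log << 'A: before read'
  toy.log << "A: read #{toy.read(reader, 3)}"
end
toy.spawn { toy.log << 'B: runs' }
toy.run

puts toy.log
```

Fiber B gets to run between A's two log entries even though the data was ready, so the interleaving does not depend on whether the read happened to complete immediately.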

As I wrote on one of the relevant GitHub issues, I think a design document that describes the behavior of fiber schedulers in detail, and also addresses some of the "harder" aspects of concurrency (error handling, cancellation, determinism, composability), might be beneficial, at the very least as a guiding star for fiber scheduler implementations.

----------------------------------------
Bug #17664: Behavior of sockets changed in Ruby 3.0 to non-blocking
https://bugs.ruby-lang.org/issues/17664#change-92873

* Author: ciconia (Sharon Rosner)
* Status: Assigned
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* ruby -v: 3.0.0
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
I'm not sure this is a bug, but a change in Ruby 3.0 makes sockets non-blocking by default. It was apparently introduced as part of the work on the [FiberScheduler interface](https://github.com/ruby/ruby/blame/78f188524f551c97b1a7a44ae13514729f1a21c7/ext/socket/init.c#L411-L434). This change of behaviour is not discussed in the Ruby 3.0.0 release notes.

This change complicates the implementation of an io_uring-based fiber scheduler, since io_uring SQEs on fds with `O_NONBLOCK` set can return `EAGAIN` just like normal syscalls. Using io_uring with non-blocking fds defeats the whole purpose of using io_uring in the first place.

A workaround I have put in place in the Polyphony [io_uring backend](https://github.com/digital-fabric/polyphony/blob/d3c9cf3ddc1f414387948fa40e5f6a24f68bf045/ext/polyphony/backend_io_uring.c#L28-L47) is to make sure `O_NONBLOCK` is not set before attempting I/O operations on any fd.
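
The linked backend does this in C. The same idea at the Ruby level might look roughly like the sketch below, using the standard `IO#fcntl` method and the `Fcntl` constants. `clear_nonblock` is a hypothetical helper name, not something Ruby or Polyphony provides.

```ruby
require 'fcntl'
require 'socket'

# Hypothetical helper: clears O_NONBLOCK on an IO if it is set.
# Returns true if the flag was cleared, false if it was not set.
def clear_nonblock(io)
  flags = io.fcntl(Fcntl::F_GETFL)
  return false if (flags & Fcntl::O_NONBLOCK).zero?

  io.fcntl(Fcntl::F_SETFL, flags & ~Fcntl::O_NONBLOCK)
  true
end

left, right = UNIXSocket.pair

# Force the flag on so the example does not depend on the Ruby version's default.
left.fcntl(Fcntl::F_SETFL, left.fcntl(Fcntl::F_GETFL) | Fcntl::O_NONBLOCK)

p clear_nonblock(left) # the flag was set, so it gets cleared
p clear_nonblock(left) # nothing left to clear on the second call
```

Calling something like this before submitting each operation costs an extra `fcntl` syscall or two per fd, which is part of why the default matters.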


