On Fri, Apr 1, 2011 at 4:57 PM, Eric Wong <normalperson / yhbt.net> wrote:
> Charles Nutter <headius / headius.com> wrote:
>> I wonder, though, if depending on this behavior is leading Ruby more
>> and more down the GVL path. The designers of the JVM's core IO
>> libraries, for example, were unable to reconcile concurrent native
>> threads with interruptible IO, due to the impossibility of knowing
>> what state all IO-related data structures are in when the thread is
>> interrupted.
>
> I don't think so, even if threads are interrupted they're resumed after
> the signal handler is done (or the process is dying anyways and we don't
> care). If the interrupt is to raise an exception then that could get
> messy[1], but for the general case of signal handlers it's not an issue.

I'm speaking specifically of Thread#raise and Thread#kill, which if
used to interrupt a thread could potentially leave the IO channel in
an unknown state (due to interrupting during a system call). On the
JVM, all process-level signals are handled by a separate thread, so
they are never run on user threads and that's not a concern for us.
JRuby has real concurrent threads, so regardless of what blocking
calls we make, other threads will continue to run (i.e. we have no need
for BLOCKING_REGION-style GVL logic). So ultimately it's only being
able to kill or raise in an arbitrary thread that led us to make JRuby
IO logic use selection to get around the effects of interrupting
blocking IO calls I mentioned below.

Long story short, how does MRI guarantee that the underlying IO is in
a reliable state when the thread accessing it can be interrupted
permanently? It seems like doing most blocking at a consistent point
(like a select call) is safer.

And I am mostly just trying to understand how it's consistently safe
to interrupt a system-level IO call.
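
To make the question concrete, here is a minimal sketch of the scenario
I mean. It is an illustration only, not taken from MRI or JRuby
sources; the pipe, the queues, and the variable names are all
hypothetical. One thread blocks in a read, another thread uses
Thread#raise on it, and the same IO object is then used again. It only
covers the easy case where no data was in flight, and I'd assume the
result could differ across implementations and platforms.

```ruby
# Sketch only: interrupt a thread blocked in IO#read, then reuse the IO.
r, w = IO.pipe
started = Queue.new
outcome = Queue.new

reader = Thread.new do
  begin
    started << true
    r.read(5)                  # blocks: nothing has been written yet
    outcome << "read completed"
  rescue RuntimeError => e
    outcome << e.message       # the cross-thread raise lands here
  end
end

started.pop                    # reader is inside its begin block now
sleep 0.1                      # give it a chance to enter the blocking read
reader.raise(RuntimeError, "interrupted")
reader.join
result = outcome.pop

# The open question: what state is r in now? Try using it again.
w.write("hello")
after = r.read(5)
```

With an empty pipe there is nothing half-consumed to lose, so this does
not exercise the case I'm worried about, where the interrupt arrives
partway through a transfer.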

>> As a result, IO channels performing blocking operations
>> are explicitly closed when the thread they block is interrupted.
>
> That is terrible. I'd never touch a platform that does that.

Well, I tend not to touch platforms that expose or depend on specific
platform details, like MRI does in *many* places (and now more places
with your patch, I think). I like my code to work the same on all
platforms.

That said, I admit it's inconvenient, but I understand the reasoning.
You have to understand the JVM is trying to smooth over the
platform-specific details of IO across lots of platforms, many of them
not POSIX. If you can't guarantee to user code what the state of an IO
channel will be when interrupting system-level code, it's a pretty
clean option to say "don't do that, or we'll close the stream" and
point users toward a safely interruptible option like select.

We've managed to work with that situation and mostly emulate MRI's IO
behavior, so in practice it's more a nuisance than anything else.

> If there are cross-platform concerns, the functions that wrap select()
> should be made no-op on platforms where select() is not needed (on
> all POSIX-like ones, I expect) and not interfere with platforms where
> they're not needed.
>
> Regardless, there'll always be a set of IO operations that can never be
> interrupted. That doesn't bother me at all since the rest of the VM
> still runs. I'd rather just not use select()/poll() at all for
> "blocking" I/O calls.

That seems good on the surface, but it's depending on those blocking
operations having consistent state after being interrupted across
platforms. That seems like it would be easier to guarantee at a
"select" level, but I admit I'm trying to understand if that's true.
If you can't guarantee that the underlying IO channels are in a
consistent state (ideally the *same* state regardless of platform)
then writing to IO becomes a bunch of platform-specific checks in user
code just like you'd have to write in C. The structure of Ruby's APIs
has always been to provide a reasonably consistent view of
system-level APIs so you don't have to do that.

>> I also wonder if there's a race condition here; is it not possible
>> that the interrupt of a thread would fire immediately after the GVL
>> has been released but before the blocking IO operation has fired?
>> Perhaps I'm birdwalking too deep into the vagaries of MRI's IO logic.
>
> So a signal handler might fire and the syscall would just continue and
> not fail with EINTR. No big deal, it'll just finish the syscall before
> checking for interrupts.

Except that you've now fired your Thread#kill or Thread#raise and the
thread is never going to see it. If the contract of kill and raise is
that "we'll try to kill or raise in the target thread, but no
guarantees if it will do anything at all" I'm fine with that, but that
hasn't been the expectation of Ruby users up to now. I'm not sure if
this is actually a problem or not...MRI's cross-thread event behavior
is rather involved.

> The real race condition is relying on select()/poll() at all for
> readability. select()/poll() returning success _never_ guarantees an
> operation won't block due to spurious wakeups and shared IO across
> multiple threads/processes.

That's certainly true, but any code using select would not just
blindly proceed to a blocking call after wakeup...it would check that
the IO channel is actually ready, and if not go into select again. I
don't see how that makes the consistency and reliability of blocking
on selection less attractive than interrupting arbitrary kernel-level
calls.
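
The "check again and go back into select" pattern I have in mind looks
roughly like the sketch below. The helper name read_when_ready and its
parameters are hypothetical, made up for this example; IO.select,
IO#read_nonblock and IO::WaitReadable are the standard Ruby calls. It
assumes a single readable IO and does not try to track the remaining
timeout across retries.

```ruby
# Hypothetical helper: block in select, then attempt a non-blocking read.
# If the wakeup was spurious (or another reader got there first), go back
# into select instead of falling into a blocking read.
def read_when_ready(io, maxlen, timeout = nil)
  loop do
    ready = IO.select([io], nil, nil, timeout)
    return nil if ready.nil?          # select timed out, nothing readable
    begin
      return io.read_nonblock(maxlen) # raises EOFError at end of stream
    rescue IO::WaitReadable
      next                            # not actually ready; select again
    end
  end
end

r, w = IO.pipe
idle = read_when_ready(r, 4, 0.05)    # nothing written yet
w.write("ping")
data = read_when_ready(r, 4)
```

The only place the thread ever blocks is in select, so that is the only
place an interrupt has to be handled.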

- Charlie