On Fri, Sep 25, 2009 at 6:43 PM, Brian Mitchell <binary42 / gmail.com> wrote:
> Now it has been pointed out that Fibers are currently really slow. It
> is kind of sad that the current implementation has these limitations
> but there is no reason that certain platforms couldn't use much more
> efficient code paths for faster fiber operation. Examples of how big
> of a difference this can make can be seen in projects like LuaJIT [1].

Note that the "Coco" project you mention appears to use setjmp and
save the C stack, similar to Continuations and
Enumerator#next/Generators in 1.8 and 1.9 and Fiber in 1.9.

The only implementations of fast coroutines I know of are those
that are greatly simplified (Python's) or that pass all state along so
there's no stack-hopping (several FP languages).
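
To illustrate what "pass all state along" can look like, here is a
minimal sketch in Ruby. The names count_from and take are made up for
this example; it is not taken from any of the languages mentioned.
Each step returns the current value plus a lambda for the next step,
so nothing has to be saved on or restored from a stack:

```ruby
# Hypothetical state-passing generator: every step returns
# [value, next_step], where next_step is a lambda producing the next pair.
def count_from(n)
  [n, lambda { count_from(n + 1) }]
end

# Pulls `count` values by calling each step in turn. The only state
# carried between steps is the lambda itself.
def take(step, count)
  values = []
  count.times do
    value, step = step.call
    values << value
  end
  values
end

first_three = take(lambda { count_from(5) }, 3)
```

The cost is that the producer has to be written in this style from the
start; an arbitrary #each method cannot be turned into it mechanically.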

> The fact that garbage may be referenced is really a bad side effect of
> keeping the iterator around far too long in some scope. I think this
> is a problem of both iterations, though dealing with native threads is
> certainly going to make thread management a harder problem.

The problem is a little more involved, unfortunately. We need to make
sure that the thread running the enumeration doesn't strongly
reference the enumerator itself. If we can guarantee that, then when
the enumerator is dereferenced and collected the thread can be
canceled and shut down. But relying on GC to shut down those threads
is going to be problematic in its own right, even if we can avoid
accidentally rooting the enumerator.
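
A rough sketch of that constraint, in plain Ruby. This is not JRuby's
implementation; ThreadedGenerator, start_worker, finalizer, close and
alive? are names invented for the example. The worker thread and the
finalizer are both built in class methods, so their blocks capture the
class as self rather than the generator instance:

```ruby
require 'thread'

# Hypothetical thread-backed generator (illustration only).
class ThreadedGenerator
  DONE = Object.new.freeze

  # Built in a class method so the thread's block closes over the queues
  # and the producer only, never over a generator instance.
  def self.start_worker(producer, requests, results)
    Thread.new do
      requests.pop                     # wait for the first #next
      producer.call(lambda do |value|
        results.push(value)            # hand one value to the consumer
        requests.pop                   # park until the next #next
      end)
      results.push(DONE)
    end
  end

  # Same trick for the finalizer: the proc holds the worker, not the generator.
  def self.finalizer(worker)
    proc { |_object_id| worker.kill }
  end

  def initialize(&producer)
    @requests = Queue.new
    @results = Queue.new
    @done = false
    @worker = self.class.start_worker(producer, @requests, @results)
    ObjectSpace.define_finalizer(self, self.class.finalizer(@worker))
  end

  def next
    raise StopIteration, "generator finished" if @done
    @requests.push(:next)
    value = @results.pop
    if value.equal?(DONE)
      @done = true
      raise StopIteration, "generator finished"
    end
    value
  end

  # Explicit shutdown, for callers that do not want to wait for GC.
  def close
    @worker.kill
    @worker.join
    @done = true
  end

  def alive?
    @worker.alive?
  end
end
```

Error propagation from the producer is left out. Note also that if the
producer block itself refers to the generator, the worker roots it
anyway and the finalizer never fires.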

But you are right about one aspect...if the enumerator stays
referenced, the "fiber-like" entity remains referenced and should stay
alive, and anything it references on its execution stacks must remain
referenced. I smell lots of opportunities for leaks :(

> For JRuby I would consider using a combination of the noted options
> rather than just one. First, I would keep optimized Enumerator objects
> for native types. This should be doable with most common collections and
> avoids threads and expensive context switching. We all love speed for
> the common case. Next, I would consider having a thread pool around
> for spinning up new iteration fibers for the cases of non-native
> streams. I have the feeling that enumerators will become more
> commonplace in Ruby code so this might be a good pattern to support
> (generator - consumer pairs are quite useful).

We can create specialized enumerators for every implementation in
JRuby, and will likely do so to avoid an explosion of threads if
people start using Enumerator#next a lot. But we have to do that work,
and it will take some time. Right now if you run JRuby and use
Enumerator#next you'll see threads spin up...and never go away. It's a
known issue that will probably remain for 1.4RC1, but it has to be
fixed before 1.4 final.
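
As a sketch of what a specialized enumerator could look like for an
array-like collection (ArrayCursor is a made-up name for this example,
not a JRuby class), an index is all the state that #next needs, so no
thread or fiber is involved:

```ruby
# Hypothetical index-based cursor over an Array (illustration only).
class ArrayCursor
  def initialize(array)
    @array = array
    @index = 0
  end

  # Mirrors Enumerator#next: returns the next element, or raises
  # StopIteration once the end has been reached.
  def next
    raise StopIteration, "iteration reached an end" if @index >= @array.size
    value = @array[@index]
    @index += 1
    value
  end

  # Mirrors Enumerator#rewind: start over from the first element.
  def rewind
    @index = 0
    self
  end
end

cursor = ArrayCursor.new([10, 20])
first = cursor.next
second = cursor.next
```

This only works because the collection can be indexed directly. A
user-defined #each with arbitrary logic gives no such handle, which is
where the thread-backed fallback comes in.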

As far as Fibers go: in 1.9 mode we do have a pool, and will probably
do something similar for Enumerator#next generators soon. But a pool
limits how many you can have active at the same time, and we're at the
mercy of the GC as to when we'll shut down threads and release
resources associated with them.

> Longer term, I would find it interesting to discuss ways we can
> express multi-stream operations in a safe way that allow the
> implementation to perform intelligent fetching and fusion operations.
> One thought is a list comprehension form which can help avoid common
> problems in reducing dynamically dispatched calls.

I'm open to having this discussion, certainly. And I'll say it
again...I agree that the generator/enumerator/coroutine model is very
useful, but allowing it over arbitrarily complex iteration logic and
requiring all implementations to support or emulate delimited
continuations is a serious problem today.

- Charlie