On Fri, Sep 25, 2009 at 8:57 PM, Tanaka Akira <akr / fsij.org> wrote:
> I thought about this slowness some time ago.
>
> STDIN.lines.next
> is basically the same as
> STDIN.gets or raise StopIteration
> but it has overhead of Fiber.
>
> IO already has a method for external iteration (gets),
> but IO#lines cannot use it.
>
> I think Enumerator.new (and possibly to_enum, enum_for)
> should be able to take an array consisting of two method names
> for internal and external iteration as:
> Enumerator.new(obj, [method1, method2], *args)
> method1 is for internal iteration and method2 is for
> external iteration. The enumerator created by
> Enumerator.new uses method2 when next method is called if
> method2 is provided. obj.method2 should return an object
> for external iteration.

Yes, this is along the lines of what Evan Phoenix and I
discussed...providing a way to opt out of Fiber/Generator-based
external iteration.

I like your ideas...more below.

> If Enumerator.new provides such way to avoid Fiber, IO can
> use it as:
>
> class IO
> def lines
>  Enumerator.new self, [:each_line, :external_iterator_for_lines]
> end
>
> def to_enum(meth=:each, *args)
>  if meth == :each || meth == :each_line
>   super [:each_line, :external_iterator_for_lines], *args
>  else
>   super
>  end
> end

So if doing internal iteration, the first method (each_line) will be
called with a block. If doing external iteration, the second method
(external_iterator_for_lines) will be called to produce an object
that knows how to do a lightweight external iteration of lines.

>  def external_iterator_for_lines
>    o = Object.new
>    o.instance_variable_set(:@io, self)
>    def o.next
>      @io.gets or raise StopIteration
>    end
>    o
>  end
> end
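
To make sure I'm reading the proposal right, here is a rough sketch of
the dispatch as I understand it. DualEnumerator and LineCursor are names
I made up for illustration; nothing here is an existing core API, and the
real change would live inside Enumerator itself. I use StringIO in place
of STDIN so it can be run as-is.

```ruby
require "stringio"

# Hypothetical stand-in for the proposed two-method Enumerator.new form.
class DualEnumerator
  include Enumerable

  def initialize(obj, internal, external, *args)
    @obj = obj
    @internal = internal   # block-taking method, e.g. :each_line
    @external = external   # method returning an object that responds to #next
    @args = args
    @cursor = nil
  end

  # Internal iteration: delegate to the block-taking method.
  def each(&block)
    return self unless block
    @obj.public_send(@internal, *@args, &block)
    self
  end

  # External iteration: fetch the lightweight cursor once, then forward
  # every #next call to it. No Fiber is created on this path.
  def next
    @cursor ||= @obj.public_send(@external, *@args)
    @cursor.next
  end
end

# Same behavior as the quoted external_iterator_for_lines, as a small class.
class LineCursor
  def initialize(io)
    @io = io
  end

  def next
    @io.gets or raise StopIteration
  end
end

io = StringIO.new("first\nsecond\n")
# Attach the external-iteration method to this one object only,
# rather than reopening a core class for the sake of a demo.
io.define_singleton_method(:external_iterator_for_lines) { LineCursor.new(self) }

lines = DualEnumerator.new(io, :each_line, :external_iterator_for_lines)
puts lines.next
puts lines.next
```

A third call to lines.next raises StopIteration, the same way
Enumerator#next does at the end of the sequence.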

I guess there would be an analog you could use in custom #each
implementations...

def each
  if block_given?
    # ... normal each logic ...
  else
    Enumerator.new(self, [:each, :each_external])
  end
end
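
For a concrete (if contrived) picture of what each_external might return
for an array-backed collection, something like the following would do.
Playlist and Cursor are hypothetical names. Since the two-method
Enumerator.new form doesn't exist, the blockless #each here falls back to
plain to_enum, which is the Fiber-backed path on 1.9.

```ruby
# Hypothetical collection showing an index-based external cursor.
class Playlist
  include Enumerable

  def initialize(*tracks)
    @tracks = tracks
  end

  def each
    # Existing API; under the proposal this branch would instead pass
    # [:each, :each_external] to Enumerator.new.
    return to_enum(:each) unless block_given?
    @tracks.each { |track| yield track }
    self
  end

  # Lightweight external iteration: just an index into the array.
  def each_external
    Cursor.new(@tracks)
  end

  class Cursor
    def initialize(items)
      @items = items
      @index = 0
    end

    def next
      raise StopIteration if @index >= @items.size
      item = @items[@index]
      @index += 1
      item
    end
  end
end

playlist = Playlist.new("intro", "verse", "outro")
cursor = playlist.each_external
loop { puts cursor.next }   # Kernel#loop rescues StopIteration and exits
```

The cursor yields the same elements in the same order as each, without
any stack switching.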

Yes, this is definitely a step in the right direction. But I wonder if
allowing people to do arbitrarily complex external iteration is worth
it? It's going to be slow as long as switching stacks is slow, and
it's always going to be a heavy operation on platforms where
stack-juggling isn't permitted (or is too hard to bother trying).

External enumeration is a feature Ruby should have;
arbitrarily-complex continuation-driven external iteration is more of
a problem than a solution.

>
> This idea may solve part of the JRuby problem if a class
> provides a custom to_enum. If a class doesn't provide one, or
> the user specifies an unexpected method name for to_enum, it is
> still a problem though.

I think it's a problem for any implementation since the performance
characteristics of a continuation-based #next are atrocious. And yet
there it is, luring you to use it, even though it's slow on 1.8.7 and
1.9 and spins up a whole native thread on JRuby (and eventually on
IronRuby and maybe others).

Just so it's clear: I think external enumeration is a great
feature, but I think there should be a clearly-defined protocol for
opting *in* to Enumerator#next, not a protocol to have to follow to
opt *out* of poor performance or heavyweight enumerators as in JRuby.
Put simply, I think collections should be required to *choose* to use
a fiber to implement external iteration, rather than using fibers
automatically.

So I'm thinking that if you implement "each" but not your own
"enum_for" or "to_enum", #next shouldn't work. You can then choose to
implement it with a fiber or not.
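
A very rough sketch of what I mean by opting in, with every class name
invented for the example. PlainBag#to_enum simulates the proposed default
(an enumerator that only does internal iteration); FiberBag is a
collection that explicitly chooses a Fiber to get #next.

```ruby
# Simulates the proposed default: each works, next does not.
class InternalOnlyEnumerator
  include Enumerable

  def initialize(obj, meth)
    @obj = obj
    @meth = meth
  end

  def each(&block)
    @obj.public_send(@meth, &block)
  end

  def next
    raise NotImplementedError,
          "#{@obj.class} has not opted in to external iteration"
  end
end

# Explicit opt-in: the collection author accepts the cost of a Fiber.
class FiberCursor
  def initialize(obj, meth)
    @done = false
    @fiber = Fiber.new do
      obj.public_send(meth) { |item| Fiber.yield item }
      @done = true   # reached only after the internal iteration finishes
      nil
    end
  end

  def next
    raise StopIteration if @done          # never resume a finished fiber
    value = @fiber.resume
    raise StopIteration if @done          # the last resume ran off the end
    value
  end
end

class PlainBag
  def initialize(*items)
    @items = items
  end

  def each(&block)
    @items.each(&block)
    self
  end

  # Stand-in for what the default to_enum would do under this idea.
  def to_enum(meth = :each)
    InternalOnlyEnumerator.new(self, meth)
  end
end

class FiberBag < PlainBag
  # The author chooses fibers here; a non-fiber cursor would work as well.
  def to_enum(meth = :each)
    FiberCursor.new(self, meth)
  end
end

opted_in = FiberBag.new(1, 2).to_enum
puts opted_in.next
puts opted_in.next
```

Calling next on PlainBag.new(1, 2).to_enum raises NotImplementedError
instead, while to_a and the rest of Enumerable keep working through each.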

(But at the very least we need a clearly-defined protocol for opting
out, since I suspect *most* users of external iteration will *have to*
opt out to get reasonable performance.)

- Charlie