Issue #18262 has been updated by knu (Akinori MUSHA).


I agree this would be a good addition, and I think existing users of `lazy` would understand that the incompatibility it brings is a necessary step toward making `partition` more useful.

However, the buffering could be a pitfall for new users. In today's developer meeting, Matz and I agreed to suggest that the behavior be well documented. If you divide a huge (or infinite) list into two where one enumerator is far less likely to yield a value than the other, the buffer holding the other side's items could become huge. That is not what you would normally expect from "lazy", so it should be noted in the documentation.
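To put a rough number on this concern, the following sketch does not implement `partition` at all; it only counts how many items a buffering implementation would have to hold for the common side while a consumer asks the rare side for three items. The predicate (one match in every 1000) and the count of three are arbitrary assumptions chosen for illustration.

```ruby
# Rough illustration, not an implementation: how many "common" items must be
# buffered while the consumer pulls only a few "rare" items?
rare = ->(i) { (i % 1000).zero? } # hypothetical predicate: 1 in 1000 matches
wanted = 3                        # items requested from the rare side

found = 0
buffered = 0
(1..).each do |i|
  if rare.call(i)
    found += 1
    break if found == wanted
  else
    buffered += 1 # would have to be kept until the other enumerator reads it
  end
end

puts buffered # prints 2997
```

The buffer grows with the gap between matches, so with a sparser predicate (or a side that is never consumed) it grows without bound.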

----------------------------------------
Feature #18262: Enumerator::Lazy#partition
https://bugs.ruby-lang.org/issues/18262#change-94707

* Author: zverok (Victor Shepelev)
* Status: Open
* Priority: Normal
----------------------------------------
(Part of my set of proposals about making `.lazy` more useful/popular.)

Currently:
```ruby
file = File.open('very-large-file.txt')
lines_with_errors, lines_without_errors = file.lazy.partition { _1.start_with?('E:') }
lines_with_errors.class
# => Array; the whole file has been read by this point
```
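A smaller, self-contained way to observe the current behavior, assuming a Ruby version in which `Enumerator::Lazy` still inherits the eager `Enumerable#partition` (the situation this issue describes): a counter inside `map` shows that every element is evaluated as soon as `partition` is called.

```ruby
evaluated = 0
numbers = (1..10).lazy.map { |i| evaluated += 1; i }

odd, even = numbers.partition(&:odd?)
p odd.class # prints Array
p evaluated # prints 10: the whole source was consumed up front
p odd       # prints [1, 3, 5, 7, 9]
```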
This might not be very practical, either performance-wise or memory-wise.

I am thinking that returning a pair of lazy enumerators might be a good addition to `Enumerator::Lazy`.

Naive prototype:

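```ruby
class Enumerator::Lazy
  def partition(&block)
    # Items fetched by one enumerator that belong to the other one wait here.
    buffer1 = [] # pending items for which the block returned truthy
    buffer2 = [] # pending items for which the block returned falsy
    source = self # shared external iterator, advanced with #next

    [
      Enumerator.new { |y|
        loop do
          if buffer1.empty?
            begin
              item = source.next
              if block.call(item)
                y.yield(item)
              else
                buffer2.push(item) # keep it for the second enumerator
              end
            rescue StopIteration
              break
            end
          else
            y.yield buffer1.shift # serve items buffered by the other side first
          end
        end
      }.lazy,
      Enumerator.new { |y|
        loop do
          if buffer2.empty?
            begin
              item = source.next
              if !block.call(item)
                y.yield(item)
              else
                buffer1.push(item) # keep it for the first enumerator
              end
            rescue StopIteration
              break
            end
          else
            y.yield buffer2.shift # serve items buffered by the other side first
          end
        end
      }.lazy
    ]
  end
end
```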
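The two branches of the prototype differ only in which buffer they read and which block result they keep. A condensed sketch of the same buffering idea, written as a standalone method so it can be tried without reopening `Enumerator::Lazy`, might look like this. The name `lazy_partition` is hypothetical, not a Ruby API, and the sketch relies on `loop` rescuing the `StopIteration` raised by `Enumerator#next`.

```ruby
# Hypothetical helper (not part of Ruby): returns [truthy_side, falsy_side]
# as two lazy enumerators that share one source and two buffers.
def lazy_partition(enum, &pred)
  source = enum.to_enum                 # fresh external iterator over the input
  buffers = { true => [], false => [] } # items fetched on behalf of each side

  side_enum = lambda do |side|
    Enumerator.new do |y|
      loop do
        if buffers[side].empty?
          item = source.next # StopIteration here ends `loop` quietly
          if !!pred.call(item) == side
            y << item
          else
            buffers[!side] << item # park it for the other enumerator
          end
        else
          y << buffers[side].shift # drain what the other side buffered
        end
      end
    end.lazy
  end

  [side_enum.call(true), side_enum.call(false)]
end

odd, even = lazy_partition(1..) { |i| i.odd? }
p odd.first(3)  # prints [1, 3, 5]
p even.first(3) # prints [2, 4, 6]
```

Like the prototype, both enumerators share one source and never rewind it, so calling `odd.first(3)` a second time continues from where the source stopped rather than starting over.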
Testing it:
```ruby
Enumerator.produce(1) { |i| puts "processing #{i}"; i + 1 }.lazy
  .take(30)
  .partition(&:odd?)
  .then { |odd, even|
    p odd.first(3), even.first(3)
  }
# Prints:
# processing 1
# processing 2
# processing 3
# processing 4
# processing 5
# [1, 3, 5]
# [2, 4, 6]
```
As the "processing" log shows, it fetched only as many entries as the produced enumerators required.
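The prototype stays lazy because it advances its source with `Enumerator#next`, i.e. external iteration, which evaluates the source only as far as it is pulled. A minimal check of that mechanism on its own (the counter and variable names are illustrative):

```ruby
pulled = 0
source = Enumerator.produce(1) { |i| i + 1 }.lazy.map { |i| pulled += 1; i }

first_three = 3.times.map { source.next }
p first_three # prints [1, 2, 3]
p pulled       # prints 3: nothing beyond the third element was evaluated
```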

The **drawback** would be, as my prototype implementation shows, the need for internal "buffering" (I don't think it is possible to implement a lazy partition without it), but it still might be worth a shot?



-- 
https://bugs.ruby-lang.org/