Issue #15485 has been updated by zverok (Victor Shepelev).


> Personally I like this style because it is very clear and explicit. Anyway.

Well, on its own, as just two lines of code, it probably is.
But every realistic use case I can think of, for example "take all lines before the first empty line (extract the header)", ends up looking like this:

```ruby
# old split:
def extract_header(body)
  body.split("\n")
      .take_while { |ln| !ln.empty? }
      .map { |ln| ln.split(': ', 2) } # ....and so on
end

# new split:
def extract_header(body)
  header = []
  body.split("\n") { |ln| 
    break if ln.empty?
    header << ln
  }
  header.map { |ln| ln.split(': ', 2) }
end

# my proposal (or, of course, #to_enum)
def extract_header(body)
  body.split("\n", enumerator: true)
      .take_while { |ln| !ln.empty? }
      .map { |ln| ln.split(': ', 2) }
end
```

I understand that not everybody is fond of the "functional-first"/"chaining-first" approach, but Ruby's evolution seems to be clearly heading that way, so the recent introduction of a method that "just yields" seems a bit off track to me.


> There is `Object#to_enum` for your use case.

Yeah, of course, that's nice! Though it seems a bit like a "fix" for an unfortunate API.
On the other hand, it could also be seen as the natural approach: implement minimal "yielding" in a method, and use `to_enum` for more complicated cases...
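
A minimal sketch of the `to_enum` workaround applied to the header example, assuming Ruby 2.6 or later (where `String#split` yields each piece when given a block). `to_enum(:split, "\n")` wraps that block form in an Enumerator, so `take_while` can stop iterating at the first empty line without building the full Array first:

```ruby
# Requires Ruby >= 2.6, where String#split accepts a block.
def extract_header(body)
  body.to_enum(:split, "\n")          # Enumerator over the split("\n") pieces
      .take_while { |ln| !ln.empty? } # stop at the first empty line
      .map { |ln| ln.split(': ', 2) } # "Name: value" => ["Name", "value"]
end

p extract_header("Host: example.com\nAccept: */*\n\nbody text")
```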

----------------------------------------
Feature #15485: Refactor String#split
https://bugs.ruby-lang.org/issues/15485#change-76007

* Author: zverok (Victor Shepelev)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
In #4780, a new "block form" of `#split` was introduced. It behaves like this:

```ruby
"several\nlong\nlines".split("\n") { |part| puts part if part.start_with?('l') }
# prints:
#   long
#   lines
# => "several\nlong\nlines"
```

The stated justification is: "If the string is very long, and I only need to play with the split string one by one, this will not create a useless expensive array."

I understand the justification, but strongly believe that the **implementation is unfortunate**. In the current implementation, the only way to "play with the split string one by one" is through side effects, like this:

```ruby
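PATTERN = /\Al/                  # placeholder pattern, for illustration only
lines = "several\nlong\nlines"   # sample input, for illustration only
result = []
lines.split("\n") { |ln| result << ln if ln.match?(PATTERN) }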
```

This is very unidiomatic and unlike most other methods that accept both block and no-block forms (which is understandable: the original ticket is 7 years old, and community practices were quite different back then).

The typical modern solution to this problem is **enumerators**.
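
For comparison, this is the convention most block-accepting core methods (such as `each_line` or `each_char`) follow: yield when a block is given, otherwise return an Enumerator over the same values. The `Splitter.each_part` helper below is hypothetical, written only to illustrate that idiom; unlike `String#split` it treats the separator as a plain string and keeps trailing empty pieces.

```ruby
# Hypothetical helper illustrating the "block or Enumerator" idiom.
module Splitter
  def self.each_part(str, sep)
    raise ArgumentError, "empty separator" if sep.empty?
    # Without a block, return an Enumerator that re-calls this method with one.
    return enum_for(__method__, str, sep) unless block_given?

    start = 0
    while (idx = str.index(sep, start))
      yield str[start...idx]          # piece before the next separator
      start = idx + sep.size
    end
    yield str[start..-1]              # remainder after the last separator
    str                               # like split's block form, return the receiver
  end
end

Splitter.each_part("several\nlong\nlines", "\n") { |part| puts part }
enum = Splitter.each_part("several\nlong\nlines", "\n")
p enum.grep(/\Al/)                    # only the pieces starting with "l"
```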

I propose redefining the method as follows:

```ruby
lines.split("\n") # => Array, calculated immediately
lines.split("\n", enumerator: true) # => Enumerator, yielding split results one by one
```

This would allow all kinds of idiomatic processing without creating an intermediate Array, for example:

```ruby
lines.split("\n", enumerator: true).take_while { |ln| ln != '__END__' }
lines.split("\n", enumerator: true).grep(PATTERN)
# ...and so on...
```

Note also that this call sequence underlines the "just an optimization" nature of the change: when you have a string that is too large to process comfortably, you just add `enumerator: true` to your code without changing anything else.
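
Until something like the proposed keyword exists, its behaviour can be approximated in user code on Ruby 2.6+. The `split_parts` method and its `enumerator:` keyword below are hypothetical: a sketch of the proposed call sequence built on `to_enum`, not an actual `String#split` option.

```ruby
# Hypothetical emulation of the proposed `split(sep, enumerator: true)`.
# Requires Ruby >= 2.6 for the block form of String#split.
def split_parts(str, sep, enumerator: false)
  if enumerator
    str.to_enum(:split, sep)   # pieces are yielded one by one while iterating
  else
    str.split(sep)             # Array, calculated immediately
  end
end

text  = "alpha\nbeta\n__END__\ndata"
eager = split_parts(text, "\n").take_while { |ln| ln != '__END__' }
lazy  = split_parts(text, "\n", enumerator: true).take_while { |ln| ln != '__END__' }
p eager == lazy                # flipping the flag does not change the result
```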

PS: We can't change `split` to **always** return an enumerator, because that would break a lot of sane code like `lines.split("\n").join("\r\n")`.
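
The breakage comes from Enumerator including `Enumerable` but not Array-only methods such as `join`. A quick check, again using `to_enum` on Ruby 2.6+ as a stand-in for a hypothetical Enumerator-returning `split`:

```ruby
lines = "one\ntwo\nthree"
enum  = lines.to_enum(:split, "\n")   # stand-in for an Enumerator-returning split

p lines.split("\n").join("\r\n")      # works today: Array#join
p enum.respond_to?(:join)             # Enumerator has no #join
p enum.to_a.join("\r\n")              # existing code would need an explicit #to_a
```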



