Issue #15446 has been updated by sawa (Tsuyoshi Sawada).


I would rather propose that `String#scan` take an optional second argument, comparable to the optional `capture` argument that `String#[]` accepts after a regexp argument:

```ruby
r = /\b([a-z]|([a-z])[a-z]*\2)\b/i
str[r] # => "Viv"
str[r, 0] # => "Viv"
str[r, 1] # => "Viv"
str[r, 2] # => "V"
```

so that it would work like this:

```ruby
str.scan(r) # => [["Viv", "V"], ["Bob", "B"], ["Bob", "B"], ["Eve", "E"], ["a", nil], ["Eve", "E"], ["Bob", "B"], ["a", nil], ["regular", "r"]]
str.scan(r, 0) # => ["Viv", "Bob", "Bob", "Eve", "a", "Eve", "Bob", "a", "regular"]
str.scan(r, 1) # => ["Viv", "Bob", "Bob", "Eve", "a", "Eve", "Bob", "a", "regular"]
str.scan(r, 2) # => ["V", "B", "B", "E", nil, "E", "B", nil, "r"]
```
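
The proposed behaviour can be approximated today, because the block form of `String#scan` sets the last-match data on every iteration. The following is only a sketch under that assumption: `scan_capture` is a hypothetical helper name, not an existing or proposed core method, and the pattern is written with `\2` because the repeated letter is the second group.

```ruby
# Hypothetical helper emulating the proposed `str.scan(pattern, capture)`.
def scan_capture(str, pattern, capture)
  results = []
  # With a block, String#scan updates the last MatchData for each match,
  # so Regexp.last_match(capture) returns the requested group (or nil).
  str.scan(pattern) { results << Regexp.last_match(capture) }
  results
end

str = "Viv and Bob are party animals. Bob and Eve are a couple who met on Christmas Eve. Bob is a regular guy."
r = /\b([a-z]|([a-z])[a-z]*\2)\b/i

scan_capture(str, r, 0) # => ["Viv", "Bob", "Bob", "Eve", "a", "Eve", "Bob", "a", "regular"]
scan_capture(str, r, 2) # => ["V", "B", "B", "E", nil, "E", "B", nil, "r"]
```

A built-in second argument would avoid both the helper and the reliance on implicit match state.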


----------------------------------------
Feature #15446: Add a method `String#each_match` to the Ruby core
https://bugs.ruby-lang.org/issues/15446#change-76679

* Author: CaryInVictoria (Cary Swoveland)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
`String#each_match` would have two forms:

*each_match(pattern) { |match| block } → str*
*each_match(pattern) → an_enumerator*

The latter would be identical to the form *gsub(pattern) → enumerator* of [String#gsub](http://ruby-doc.org/core-2.5.1/String.html#method-i-gsub). The former would simply yield the matches to a block and return the receiver.
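
To make the two forms concrete, here is a rough pure-Ruby sketch of how such a method might behave. It is an illustration only: reopening `String` and building on `gsub` and `scan` are assumptions of this sketch, not a statement of how a core implementation would be written.

```ruby
# Hypothetical sketch of the proposed method; not part of Ruby core.
class String
  def each_match(pattern)
    # Without a block, reuse the enumerator that gsub(pattern) already returns.
    return gsub(pattern) unless block_given?

    # With a block, yield each whole match (group 0) and return the receiver.
    scan(pattern) { yield Regexp.last_match(0) }
    self
  end
end

words = []
text = "Viv and Bob are a couple. Bob"
text.each_match(/\b(?:[a-z]|([a-z])[a-z]*\1)\b/i) { |word| words << word }
words # => ["Viv", "Bob", "a", "Bob"]
```

Because the block-less form simply delegates to `gsub`, the enumerator it returns inspects as a `gsub` enumerator.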

I frequently use the form of `gsub` that returns an enumerator instead of `scan` when chaining to Enumerable methods. That's because `scan` returns an unneeded temporary array. This use of `gsub` can also be useful when the pattern contains capture groups, which can be a complication when using `scan`, as in the following example.

Suppose we are given a string and wish to count the number of occurrences of each word that begins and ends with the same letter (case-insensitive).

     str = "Viv and Bob are party animals. Bob and Eve are a couple who met on Christmas Eve. Bob is a regular guy."

     r = /\b(?:[a-z]|([a-z])[a-z]*\1)\b/i

This regular expression reads, "match a word break, followed by one letter or by two or more letters with the last matching the first (case insensitive), all followed by a word break".

     enum = str.each_match(r)
        #=> #<Enumerator: "Viv and Bob are party...a regular guy.":gsub(/\b(?:[a-z]|([a-z])[a-z]*\1)\b/i)> 
 
We can convert `enum` to an array to see the words that will be generated by the enumerator and passed to the block.

    enum.to_a
        #=> ["Viv", "Bob", "Bob", "Eve", "a", "Eve", "Bob", "a", "regular"] 

Continuing, 

    enum.each_with_object(Hash.new(0)) { |word, h| h[word] += 1 }
       #=> {"Viv"=>1, "Bob"=>3, "Eve"=>2, "a"=>2, "regular"=>1} 

We could alternatively use `each_match` with a block.

     h = Hash.new(0)
     str.each_match(r) { |word| h[word] += 1 }
        #=> "Viv and Bob are party animals. Bob and Eve are a couple who met on Christmas Eve. Bob is a regular guy."
     h #=> {"Viv"=>1, "Bob"=>3, "Eve"=>2, "a"=>2, "regular"=>1} 

This form of `each_match` has no counterpart with `gsub`.

Consider now how `scan` would be used here. When the pattern contains capture groups, `scan` returns the captures rather than the whole matches, so we cannot simply write

    str.scan(r)
       #=> [["V"], ["B"], ["B"], ["E"], [nil], ["E"], ["B"], [nil], ["r"]] 

Instead, we must wrap the whole pattern in an additional capture group, which turns the backreference into `\2`.

    arr = str.scan(/\b((?:[a-z]|([a-z])[a-z]*\2))\b/i)
       #=> [["Viv", "V"], ["Bob", "B"], ["Bob", "B"], ["Eve", "E"], ["a", nil], ["Eve", "E"], ["Bob", "B"], ["a", nil], ["regular", "r"]]

Then

    arr.each_with_object(Hash.new(0)) { |(word,_),h| h[word] += 1 }
       #=> {"Viv"=>1, "Bob"=>3, "Eve"=>2, "a"=>2, "regular"=>1}

This works but it's a bit of a [dog's breakfast](https://dictionary.cambridge.org/us/dictionary/english/a-dog-s-breakfast) when compared to the use of the proposed method.
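
For comparison, the block form of `scan` can read the whole match from the last-match data without the extra group. This is a minimal sketch using only existing core methods; `counts` is just an illustrative name.

```ruby
str = "Viv and Bob are party animals. Bob and Eve are a couple who met on Christmas Eve. Bob is a regular guy."
r = /\b(?:[a-z]|([a-z])[a-z]*\1)\b/i

counts = Hash.new(0)
# The block receives the captures, but the last MatchData still holds
# the whole match, available as Regexp.last_match(0).
str.scan(r) { counts[Regexp.last_match(0)] += 1 }
counts # => {"Viv"=>1, "Bob"=>3, "Eve"=>2, "a"=>2, "regular"=>1}
```

This avoids the temporary array, but it depends on implicit match state instead of a yielded value, so it is arguably no easier to read than the `gsub` enumerator.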

The problem with using `gsub` in this way is that it is confusing to readers who are expecting character substitutions to be performed. I also believe that the name of this method (the "sub" in `gsub`) has resulted in the form of the method that returns an enumerator being under-appreciated and under-used.

Some comments below propose that this suggestion be adopted and, in time, the form of `gsub` that returns an enumerator be deprecated.



