Issue #14363 has been updated by hugopeixoto (Hugo Peixoto).

File each_grapheme_cluster_size_nil.patch added
File each_grapheme_cluster_size_real.patch added

Calculating the enumerator size here requires iterating through the whole string and running grapheme cluster detection on every byte, so I'm not sure what the right approach is.

I'm attaching two patches, one that makes it return `nil` and one that does the actual count. Both patches have tests attached.
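
To illustrate the difference between the two behaviours, here is a rough sketch in plain Ruby. It uses `Object#enum_for` rather than the C code from the patches, so it only imitates what each patch would return. The variable names are made up for the example.

```ruby
str = "a\u0300i\u0301"  # two grapheme clusters, four code points

# Roughly what the "nil" patch does: the size is reported as unknown.
unknown = str.enum_for(:each_grapheme_cluster)
unknown_size = unknown.size  # nil, because no size block was given

# Roughly what the "real" patch does: the size block scans the whole
# string, but only when #size is actually called.
counted = str.enum_for(:each_grapheme_cluster) { str.each_grapheme_cluster.count }
counted_size = counted.size  # 2
```

In both cases iteration itself is unaffected; only the cost and the result of calling `size` differ.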

----------------------------------------
Bug #14363: each_grapheme_cluster.size returns the wrong size
https://bugs.ruby-lang.org/issues/14363#change-71152

* Author: sos4nt (Stefan Schüßler)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-darwin15]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
Ruby 2.5 adds `String#each_grapheme_cluster` to enumerate the string's grapheme clusters:

```ruby
str = "a\u0300i\u0301"          #=> "àí"
str.each_grapheme_cluster.to_a  #=> ["à", "í"]
```

Unfortunately, the enumerator's `size` doesn't work as expected:

```ruby
str.each_grapheme_cluster.size  #=> 4
```

The source code reveals that it invokes `rb_str_each_char_size`, so it is equivalent to `each_char.size`:

```c
static VALUE
rb_str_each_grapheme_cluster(VALUE str)
{
    RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
    return rb_str_enumerate_grapheme_clusters(str, 0);
}
```
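
As a quick check of where the `4` comes from, the snippet below compares the character count with the grapheme cluster count for the same string. It is only an illustration; `count` is used for the clusters because it iterates and so does not depend on the enumerator's `size`.

```ruby
str = "a\u0300i\u0301"

# each_char.size is the string length in characters: a, U+0300, i, U+0301
char_size = str.each_char.size                   # 4

# count walks the enumerator, so it does not rely on #size
cluster_count = str.each_grapheme_cluster.count  # 2
```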

If the grapheme enumerator's size cannot be calculated lazily, `each_grapheme_cluster.size` should return `nil` to indicate that.

---Files--------------------------------
each_grapheme_cluster_size_nil.patch (921 Bytes)
each_grapheme_cluster_size_real.patch (3.03 KB)


-- 
https://bugs.ruby-lang.org/
