Issue #13780 has been updated by rbjl (Jan Lelis).


Great to see this implemented!

One tiny thing I've noticed:
- For non-Unicode strings, `\X` will still match "\r\n" as a single grapheme. This should probably also be the case with `String#each_grapheme`, or the difference should be clearly documented.
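
To illustrate the point, here is a minimal sketch that uses only existing methods. ISO-8859-1 is just an arbitrary pick for a non-Unicode encoding, and the variable names are made up for the example:

```ruby
# A CR LF pair in a non-Unicode encoding (ISO-8859-1 chosen arbitrarily).
crlf = "\r\n".encode("ISO-8859-1")

# \X treats the pair as one unit, even without Unicode semantics.
by_regex = crlf.scan(/\X/)

# A plain per-character split yields CR and LF separately.
by_char = crlf.each_char.to_a

puts by_regex.length  # 1
puts by_char.length   # 2
```

If `String#each_grapheme` falls back to per-character splitting for such encodings, it would give the second result rather than the first.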

----------------------------------------
Feature #13780: String#each_grapheme
https://bugs.ruby-lang.org/issues/13780#change-66171

* Author: rbjl (Jan Lelis)
* Status: Assigned
* Priority: Normal
* Assignee: naruse (Yui NARUSE)
* Target version: 2.5
----------------------------------------
Ruby's regex engine has support for graphemes via `\X`:

https://github.com/k-takata/Onigmo/blob/791140951eefcf17db4e762e789eb046ea8a114c/doc/RE#L117-L124

This is really useful when working with Unicode strings. However, code like `string.scan(/\X/)` is not very readable, which might lead people to use `String#each_char` when they really should split by graphemes.
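
A small example of the difference, using only methods that exist today (the sample string and variable names are just for illustration):

```ruby
# "e" followed by U+0301 COMBINING ACUTE ACCENT:
# two code points that render as a single user-perceived character.
s = "e\u0301"

chars     = s.each_char.to_a  # splits per code point
graphemes = s.scan(/\X/)      # splits per grapheme cluster

puts chars.length      # 2
puts graphemes.length  # 1
```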

What I propose is two new methods:

- `String#each_grapheme`, which returns an Enumerator of graphemes (in the same way as `\X`)

and

- `String#graphemes`, which returns an Array of graphemes (in the same way as `\X`)
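
As a rough sketch of the intended behavior only (not a proposed implementation; a real one would presumably live in C), the two methods could be emulated on top of `scan(/\X/)`. Yielding to a block when one is given is my assumption, mirroring `String#each_char`:

```ruby
# Hypothetical pure-Ruby emulation of the proposed methods.
class String
  # Yields each grapheme to the block, or returns an Enumerator
  # when no block is given (assumed to mirror String#each_char).
  def each_grapheme
    return enum_for(:each_grapheme) unless block_given?
    scan(/\X/) { |grapheme| yield grapheme }
    self
  end

  # Returns an Array of graphemes.
  def graphemes
    scan(/\X/)
  end
end

word = "cafe\u0301"  # the final "e" carries a combining acute accent

puts word.each_char.count      # 5 code points
puts word.graphemes.length     # 4 graphemes
puts word.each_grapheme.class  # Enumerator
```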

What do you think?

Resources

- Unicode Standard Annex #29: Unicode Text Segmentation: http://unicode.org/reports/tr29/
- Related issue: https://bugs.ruby-lang.org/issues/12831




-- 
https://bugs.ruby-lang.org/
