Issue #13220 has been updated by mihao (Micha Kosek).


Most of these test failures are caused by Ruby operating on code points, not grapheme clusters. There are more and more characters that are only expressed by several code points, and they are not limited to obscure cases, such as "q". For example, country flags use two code points, which leads to unexpected results:

"".reverse # => 

Normalisation won't help here; we need grapheme clusters:

"".grapheme_clusters.reverse.join # => 
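To make the difference concrete without relying on emoji surviving plain-text mail, here is a minimal sketch that writes a flag as two escaped regional-indicator code points. It assumes Ruby 2.5 or later (for `String#grapheme_clusters`) and UTF-8 strings; the helper name `reverse_by_grapheme` is my own, not an existing method.

```ruby
# Hypothetical helper: reverse a string cluster by cluster.
def reverse_by_grapheme(str)
  str.grapheme_clusters.reverse.join
end

# One flag = two regional-indicator code points (here: C followed by A).
flag = "\u{1F1E8}\u{1F1E6}"
text = "a" + flag

# String#reverse works on code points, so the two indicators are swapped.
by_code_point = text.reverse

# Reversing by grapheme cluster keeps the indicator pair together.
by_cluster = reverse_by_grapheme(text)

# Normalisation leaves the pair untouched; it has no composed form.
normalised = flag.unicode_normalize(:nfc)

puts by_code_point == "\u{1F1E6}\u{1F1E8}a"  # indicators swapped
puts by_cluster == flag + "a"                # flag intact
puts normalised == flag                      # unchanged by NFC
```

On Ruby 2.4, where `grapheme_clusters` is not available, `str.scan(/\X/)` should give the same clusters.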

Please consider making string functions and regexes operate on grapheme clusters by default. That's what users want 99% of the time. Code points are hardly ever a useful unit. For example, a user may want to know the number of grapheme clusters or the number of bytes in a string, but it's hard to find a scenario where it's important to know that "" consists of four code points.
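As a rough illustration of the three units mentioned above, the sketch below counts bytes, code points, and grapheme clusters for the same string, and compares `.` with `\X` in a regex. It again assumes Ruby 2.5 or later and UTF-8 strings; the variable names are only for illustration.

```ruby
# "a" followed by one flag (two regional-indicator code points).
text = "a" + "\u{1F1E8}\u{1F1E6}"

bytes       = text.bytesize                  # storage size in UTF-8
code_points = text.length                    # what String#length counts
clusters    = text.grapheme_clusters.length  # what a user perceives as characters

# In a regex, `.` matches one code point, while `\X` matches one grapheme cluster.
dot_matches     = text.scan(/./).length
cluster_matches = text.scan(/\X/).length

puts "bytes: #{bytes}, code points: #{code_points}, clusters: #{clusters}"
puts "matches for . : #{dot_matches}, matches for \\X: #{cluster_matches}"
```

Here the byte count (9) and the cluster count (2) both answer questions a user might actually ask, while the code point count (3) sits in between.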

By the way, string operations in Swift don't produce such surprises: String("".reversed()) # => 


----------------------------------------
Bug #13220: Enhance support of Unicode strings manipulation
https://bugs.ruby-lang.org/issues/13220#change-78574

* Author: r.smitala (Radovan Smitala)
* Status: Feedback
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-darwin16]
* Backport: 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN
----------------------------------------
Hi,

A few days ago, Starr Horne posted very interesting test results about manipulating Unicode strings in Ruby 2.4.
Many methods don't work as expected.

Article:

http://blog.honeybadger.io/ruby-s-unicode-support/



-- 
https://bugs.ruby-lang.org/
