Issue #10084 has been updated by Martin Dürst.

Assignee changed from Yukihiro Matsumoto to Martin Dürst

Not getting any feedback on implementation details, I'm assuming that nobody cares too much, and will therefore proceed. I have tried a refinement (proposal 5); I didn't see any effects on performance. But using a refinement would make it more difficult to backport this to earlier versions or make it available as a gem.

I'm therefore going to take the easiest way forward and use solution 1), with a module name of UnicodeNormalize (exactly corresponding to the primary method name on String). If anybody still has comments, please don't hesitate to add them here, so that we can discuss them.

----------------------------------------
Feature #10084: Add Unicode String Normalization to String class
https://bugs.ruby-lang.org/issues/10084#change-49367

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee: Martin Dürst
* Category: 
* Target version: Ruby 2.2.0
----------------------------------------
Unicode string normalization is a frequent operation when comparing or normalizing strings.

This should be available directly on the String class.

The proposed syntax is:

   'string'.normalize       # normalize 'string' according to NFC (most frequent on the Web)
   'string'.normalize :nfc  # normalize 'string' according to NFC; :nfd, :nfkc, :nfkd also usable
   'string'.nfc             # shorter variant, but maybe too many methods
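
As a minimal sketch of what these calls would do, the following assumes Ruby 2.2 or later, where String#unicode_normalize is built in. The NormalizeSketch module and its normalize method are hypothetical names used only to mirror the syntax proposed above; they are not part of this proposal's implementation.

```ruby
# Hypothetical helper mirroring the proposed 'string'.normalize syntax.
# Assumption: Ruby 2.2 or later, which provides String#unicode_normalize.
module NormalizeSketch
  FORMS = [:nfc, :nfd, :nfkc, :nfkd].freeze

  # Normalize +string+ to the given form, defaulting to NFC as proposed.
  def self.normalize(string, form = :nfc)
    unless FORMS.include?(form)
      raise ArgumentError, "unknown normalization form: #{form.inspect}"
    end
    string.unicode_normalize(form)
  end
end

composed   = "\u00E9"    # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The two strings look identical but differ as code point sequences.
puts composed == decomposed

# NFC composes, NFD decomposes; after normalization the strings compare equal.
puts NormalizeSketch.normalize(decomposed) == composed
puts NormalizeSketch.normalize(composed, :nfd) == decomposed

# The compatibility forms also fold characters such as the "fi" ligature (U+FB01).
puts NormalizeSketch.normalize("\uFB01", :nfkc) == "fi"
```

The first comparison is false and the remaining three are true, which is why normalizing both sides before comparing strings matters.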

There are several "unofficial" but convenient normalization variants that could be offered, e.g.:

   'string'.normalize :mac  # use Macintosh file system normalization variant

Implementations are already available in pure Ruby (easy for other Ruby implementations; e.g. eprun: https://github.com/duerst/eprun) and in C (unf, http://bibwild.wordpress.com/2013/11/19/benchmarking-ruby-unicode-normalization-alternatives/).

---Files--------------------------------
Normalization.pdf (576 KB)


-- 
https://bugs.ruby-lang.org/