Issue #10084 has been updated by Benoit Daloze.


Yui NARUSE wrote:
> >   class Unicode < self
> >     def self.download(name, *rest)
> >       super("http://www.unicode.org/Public/UCD/latest/ucd/#{name}", name, *rest)
> >     end
> >   end
> 
> "latest" is not acceptable because released Ruby's table must be a specific version.
> 
> Moreover generated lib/unicode_normalize/tables.rb is only 200MB. How about committing it to the repo like other conversion tables?

You probably meant 200 KB.

----------------------------------------
Feature #10084: Add Unicode String Normalization to String class
https://bugs.ruby-lang.org/issues/10084#change-49567

* Author: Martin Dürst
* Status: Assigned
* Priority: Normal
* Assignee: Martin Dürst
* Category: 
* Target version: Ruby 2.2.0
----------------------------------------
Unicode string normalization is frequently needed when comparing strings or converting them to a canonical form.

This should be available directly on the String class.

The proposed syntax is:

   'string'.normalize       # normalize 'string' according to NFC (most frequent on the Web)
   'string'.normalize :nfc  # normalize 'string' according to NFC; :nfd, :nfkc, :nfkd also usable
   'string'.nfc             # shorter variant, but maybe too many methods
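
To make the proposal concrete, here is a minimal sketch of how the calls above might behave. It is not the proposed implementation: the helper names normalize_sketch and nfc_sketch are hypothetical stand-ins for the proposed String#normalize and String#nfc, and they simply delegate to String#unicode_normalize, which ships with Ruby 2.2 and later.

```ruby
# Hypothetical stand-in for the proposed 'string'.normalize(form).
# Assumption: delegates to String#unicode_normalize (Ruby 2.2+),
# which accepts :nfc (default), :nfd, :nfkc and :nfkd.
def normalize_sketch(str, form = :nfc)
  str.unicode_normalize(form)
end

# Hypothetical stand-in for the shorter 'string'.nfc variant.
def nfc_sketch(str)
  normalize_sketch(str, :nfc)
end

composed   = "\u00E9"   # "é" as a single precomposed code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

# Without normalization the two spellings compare as different strings.
puts composed == decomposed

# After normalizing both sides to the same form they compare as equal.
puts nfc_sketch(composed) == nfc_sketch(decomposed)

# NFD splits the precomposed character into base letter plus combining mark.
p normalize_sketch(composed, :nfd).codepoints

# NFKC additionally folds compatibility characters such as the "fi" ligature.
p normalize_sketch("\uFB01", :nfkc)
```

The sketch shows why a default of NFC is convenient for comparison: both spellings of "é" collapse to the same single code point, so callers only need to normalize each side once before using ==.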

There are several "unofficial" but convenient normalization variants that could be offered, e.g.:

   'string'.normalize :mac  # use Macintosh file system normalization variant

Implementations are already available in pure Ruby (easy for other Ruby implementations; e.g. eprun: https://github.com/duerst/eprun) and in C (unf; see the benchmarks at http://bibwild.wordpress.com/2013/11/19/benchmarking-ruby-unicode-normalization-alternatives/).

---Files--------------------------------
Normalization.pdf (576 KB)


-- 
https://bugs.ruby-lang.org/