Issue #10084 has been updated by Martin Dürst.


 Hello Yui,
 
 On 2014/10/21 16:34, naruse / airemix.jp wrote:
 > Issue #10084 has been updated by Yui NARUSE.
 >
 >
 >>    class Unicode < self
 >>      def self.download(name, *rest)
 >>        super("http://www.unicode.org/Public/UCD/latest/ucd/#{name}", name, *rest)
 >>      end
 >>    end
 >
 > "latest" is not acceptable because released Ruby's table must be a specific version.
 
 [I disagree with this policy, but I will of course respect it until I 
 can convince others that a more dynamic policy is better.]
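
For reference, pinning the version only changes the path component of the URL. The following is a minimal sketch, not the committed code: the constant name, the helper name, and the choice of 7.0.0 as the pinned version are assumptions made for illustration.

```ruby
# Assumed pinned version; the real value would be whatever release Ruby ships with.
UNICODE_VERSION = "7.0.0"

# Hypothetical helper: build the URL of a UCD file for one fixed Unicode
# version instead of the moving "latest" directory.
def unicode_data_url(name, version = UNICODE_VERSION)
  "http://www.unicode.org/Public/#{version}/ucd/#{name}"
end

puts unicode_data_url("UnicodeData.txt")
```

With a scheme like this, moving to a newer Unicode version is an explicit, reviewable change to a single constant.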
 
 > Moreover, the generated lib/unicode_normalize/tables.rb is only 200KB. How about committing it to the repo like the other conversion tables?
 
 I came to the same conclusion, and I have just done so at r48072.
 
 Nobu and I have tried to make the update of the Unicode data files 
 automatic and unobtrusive, but we found that it is difficult to get 
 all of the following at once:
 - Use already downloaded Unicode data files if there is no network
    connection.
 - Check for updates dynamically.
 - Make sure that this happens regularly (I think currently it is done
    with "make up", but not everybody packaging Ruby is using "make up").
 
 I hope we can try to keep the makefile logic for automatic update of 
 Unicode data files and lib/unicode_normalize/tables.rb but change it so 
 that it is triggered only on request.
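
To make the "only on request" behaviour concrete, here is a minimal sketch in Ruby. It is not the actual makefile logic: `fetch_unicode_file`, its `update:` flag, and the injected `fetcher` callable are all names made up for illustration, and the real download step would replace the lambda.

```ruby
require "fileutils"
require "socket" # defines SocketError

# Hypothetical helper: return the local path of a Unicode data file.
# - An existing copy is reused unless an update is explicitly requested.
# - If the download fails and a copy exists, the old copy is kept.
# `fetcher` is any callable taking the file name and returning its contents.
def fetch_unicode_file(name, dir:, fetcher:, update: false)
  path = File.join(dir, name)
  return path if File.exist?(path) && !update

  begin
    data = fetcher.call(name)
  rescue SocketError, SystemCallError => e
    # No network: fall back to the existing copy, or give up if there is none.
    raise unless File.exist?(path)
    warn "could not update #{name} (#{e.message}); using existing copy"
    return path
  end

  FileUtils.mkdir_p(dir)
  File.binwrite(path, data)
  path
end
```

A makefile target that is run only on request would then call this with `update: true`, while a normal build would use the default and never touch the network once the files are present.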
 
 Regards,   Martin.
 
 > ----------------------------------------
 > Feature #10084: Add Unicode String Normalization to String class
 > https://bugs.ruby-lang.org/issues/10084#change-49562
 >
 > * Author: Martin Dürst
 > * Status: Assigned
 > * Priority: Normal
 > * Assignee: Martin Dürst
 > * Category:
 > * Target version: Ruby 2.2.0
 > ----------------------------------------
 > Unicode string normalization is a frequent operation when comparing or normalizing strings.
 >
 > This should be available directly on the String class.
 >
 > The proposed syntax is:
 >
 >     'string'.normalize       # normalize 'string' according to NFC (most frequent on the Web)
 >     'string'.normalize :nfc  # normalize 'string' according to NFC; :nfd, :nfkc, :nfkd also usable
 >     'string'.nfc             # shorter variant, but maybe too many methods
 >
 > There are several "unofficial" but convenient normalization variants that could be offered, e.g.:
 >
 >     'string'.normalize :mac  # use Macintosh file system normalization variant
 >
 > Implementations are already available in pure Ruby (easy for other Ruby implementations; e.g. eprun: https://github.com/duerst/eprun) and in C (unf, http://bibwild.wordpress.com/2013/11/19/benchmarking-ruby-unicode-normalization-alternatives/)
 >
 > ---Files--------------------------------
 > Normalization.pdf (576 KB)
 >
 >

----------------------------------------
Feature #10084: Add Unicode String Normalization to String class
https://bugs.ruby-lang.org/issues/10084#change-49565

* Author: Martin Dürst
* Status: Assigned
* Priority: Normal
* Assignee: Martin Dürst
* Category: 
* Target version: Ruby 2.2.0
----------------------------------------

-- 
https://bugs.ruby-lang.org/