Issue #1889 has been updated by Run Paint Run Run.


How does http://gist.github.com/170542 look? It contains the categories from UnicodeData.txt, the scripts from Scripts.txt, and the POSIX character classes. (The new parser script is still at http://github.com/runpaint/onig/tree/master.)

I have used http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt for the definitions of the POSIX classes. One that stands out as wrong is [[:Cntrl:]]. By the definition in RE.txt it encompasses all members of the C category, but the current CR_C constant is markedly different from the CR_Cntrl one. This is what I have at the moment:

    # TODO: Double check this definition. It appears to encompass the entire C
    # category, but currently the CR blocks for C and Cntrl are markedly different
    # cntrl    Control | Format | Unassigned | Private_Use | Surrogate
    data['Cntrl'] = data['Cc'] + data['Cf'] + data['Cn'] + data['Co'] +
                    data['Cs']

I'm defining Cn as any character in the Unicode range that does not appear in UnicodeData.txt. Any insights into how this class is defined?
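To make the Cn definition above concrete, here is a rough sketch of computing it as the complement of the assigned code points. This is not the code from the parser script: `complement`, `UNICODE_MAX` and `sample_assigned` are hypothetical names, and the sample ranges are toy data rather than real UnicodeData.txt contents. It also assumes the assigned code points have already been collapsed into sorted, non-overlapping ranges (including the First/Last range pairs that UnicodeData.txt uses for large blocks).

```ruby
# Hypothetical sketch: derive Cn as every code point in the Unicode range
# that is not covered by the assigned ranges.
UNICODE_MAX = 0x10FFFF

# Takes sorted, non-overlapping ranges of assigned code points and returns
# the ranges within 0..max that they leave uncovered.
def complement(assigned_ranges, max = UNICODE_MAX)
  gaps = []
  next_free = 0
  assigned_ranges.each do |r|
    # Anything between the end of the previous range and the start of this
    # one is unassigned.
    gaps << (next_free..r.first - 1) if r.first > next_free
    next_free = r.last + 1
  end
  # Trailing gap up to the top of the Unicode range, if any.
  gaps << (next_free..max) if next_free <= max
  gaps
end

# Toy input for illustration only.
sample_assigned = [0x0000..0x0377, 0x037A..0x037E]
cn = complement(sample_assigned)
```

If Cn is built this way, the Cntrl definition above is only as accurate as the set of assigned ranges fed into it, which may be one reason the result differs from the existing CR_Cntrl block.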

There are 15 new scripts there, e.g. 'Vai'. These will need to be added to the '#ifdef USE_UNICODE_PROPERTIES' section, starting on line 10632, and the similar section starting on line 10507. For the former, what does the final digit in the row signify? For example, in the following what does 8 mean?

 { (UChar* )"Ethiopic",              69,  8 },
----------------------------------------
http://redmine.ruby-lang.org/issues/show/1889

----------------------------------------
http://redmine.ruby-lang.org