Thanks, Martin. I was looking for "Oniguruma".

2008/1/12 Martin Duerst <duerst / it.aoyama.ac.jp>:
> At 02:38 08/01/11, Yukihiro Matsumoto wrote:
> >Hi,
> >
> >In message "Re: Draft of some pages about encoding in Ruby 1.9"
> >    on Fri, 11 Jan 2008 02:22:08 +0900, "Ujwal Reddy Malipeddi"
> ><ujwalic / gmail.com> writes:
> >
> >|I think the document assumes that the terminal/console supports
> >|various encodings and that the current terminal font has glyphs to
> >|represent the characters.
> >
> >Ruby handles neither glyphs nor fonts.
> >
> >|does 1.9 support other BOMs?
> >
> >No, Ruby does not handle BOMs.  They are evil.  The only exception is
> >a UTF-8 BOM at the beginning of Ruby programs.
> >
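Since BOMs in data are left to the program, here is a minimal sketch of stripping a leading UTF-8 BOM by hand. The helper name strip_utf8_bom is my own (hypothetical, not part of Ruby), and it assumes a Ruby recent enough to have String#b and String#byteslice.

```ruby
# UTF-8 encoding of U+FEFF, kept as a binary (ASCII-8BIT) string.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: returns a UTF-8 copy of +bytes+ without a leading BOM.
def strip_utf8_bom(bytes)
  data = bytes.b                       # binary copy, so we compare raw bytes
  if data.start_with?(UTF8_BOM)
    data = data.byteslice(3, data.bytesize - 3)
  end
  data.force_encoding(Encoding::UTF_8) # relabel only, no transcoding
end

p strip_utf8_bom("\xEF\xBB\xBFabc")    # => "abc"
p strip_utf8_bom("abc")                # => "abc"
```

Only the UTF-8 BOM is handled here; UTF-16/32 BOMs would need their own checks.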
> >|Encoding
> >|* UTF-8
> >
> >Yes.
> >
> >|* UTF-16 Big Endian
> >|* UTF-16 Little Endian
> >|* UTF-32 Big Endian
> >|* UTF-32 Little Endian
> >
> >Yes, in the trunk.
>
> Conversions (String#encode) should be added over the weekend.
> [well, after I have sorted through all the recent emails :-(]
>
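For reference, a small sketch of what those conversions look like with String#encode once it is available. The helper hex_bytes is a hypothetical name of mine; it just shows the byte order for each of the encodings listed above.

```ruby
# Hypothetical helper: transcode +str+ to +encoding+ and show its bytes in hex.
def hex_bytes(str, encoding)
  str.encode(encoding).bytes.map { |b| format("%02X", b) }.join(" ")
end

s = "\u3042" # HIRAGANA LETTER A, a UTF-8 string here

%w[UTF-8 UTF-16BE UTF-16LE UTF-32BE UTF-32LE].each do |enc|
  puts "#{enc}: #{hex_bytes(s, enc)}"
end
# UTF-8: E3 81 82
# UTF-16BE: 30 42
# UTF-16LE: 42 30
# UTF-32BE: 00 00 30 42
# UTF-32LE: 42 30 00 00
```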
> >|* UTF-7
> >
> >Not yet, but possible.  Ruby allows user-defined encodings.
>
> This one has been discouraged for quite a while, because it's
> not really a character encoding, more something like base64.
> But I guess eventually, somebody will implement at least
> conversion from and to this beast.
>
> >|* UTF-EBCDIC
> >
> >Ruby programs must be in an ASCII-compatible encoding.  The encoding
> >itself can be supported, I guess.  We've never tried a
> >non-ASCII-compatible encoding before.
>
> Same here, conversion might be implemented in a few months or years,
> but don't expect that soon, and don't expect anything else.
>
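To make the "ASCII compatible" distinction concrete, here is a short sketch using Encoding#ascii_compatible?. The choice of encodings is mine, and the helper name ascii_compatible_map is hypothetical.

```ruby
# Hypothetical helper: map encoding names to their ASCII-compatibility flag.
def ascii_compatible_map(encodings)
  encodings.each_with_object({}) do |enc, result|
    result[enc.name] = enc.ascii_compatible?
  end
end

candidates = [Encoding::UTF_8, Encoding::Shift_JIS,
              Encoding::UTF_16BE, Encoding::UTF_32LE]

ascii_compatible_map(candidates).each do |name, flag|
  puts "#{name}: #{flag}"
end
# UTF-8: true
# Shift_JIS: true
# UTF-16BE: false
# UTF-32LE: false
```

UTF-16 and UTF-32 fail the test because even plain ASCII characters occupy two or four bytes there, which is why they can be string encodings but not source encodings.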
> >|* SCSU
> >|* BOCU-1
> >
> >I don't know these.
>
> Both are in some way closer to compression methods than to
> character encodings, but tailored for Unicode. They are very
> definitely not suited for internal processing. Same answer as
> just above.
>
> >|which version of Unicode is supported in 1.9?
> >
> >Ruby does not cover version-sensitive areas of Unicode (character
> >repertoire etc.) yet.  They should be handled by an external library,
> >e.g. the unicode gem.
>
> Not exactly true. Oniguruma supports a lot of Unicode properties.
> All the data is in unicode.c (currently enc/unicode.c).
> Something like the following should actually work, independent
> of your local settings:
> > ruby -e 'puts "\u3042" =~ /\p{Hiragana}/u'
> 0
>
> (U+3042 is Hiragana a (あ)).
>
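The one-liner can be expanded a little to show what the returned index means. This is only a sketch: first_hiragana_index is a hypothetical helper name, and the script is assumed to be saved as UTF-8.

```ruby
HIRAGANA = /\p{Hiragana}/

# Hypothetical helper: character index of the first Hiragana, or nil.
def first_hiragana_index(text)
  text =~ HIRAGANA
end

p first_hiragana_index("\u3042")     # => 0   (HIRAGANA LETTER A)
p first_hiragana_index("abc\u3042")  # => 3   (index counts characters, not bytes)
p first_hiragana_index("\u30A2")     # => nil (KATAKANA LETTER A does not match)
```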
> The tables in unicode.c are in a derived form that makes it rather
> difficult to figure out which version they are based on, but a
> rough comparison between
> http://www.unicode.org/Public/UNIDATA/DerivedAge.txt
> and init_code_range_array in enc/unicode.c makes Version 4.1.0
> the best guess.
>
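As far as I know, later Ruby releases record the Unicode version of the bundled tables in RbConfig, which avoids the guesswork. Treat the "UNICODE_VERSION" key as an assumption: it is not there in 1.9-era builds, so the sketch below falls back to nil. The helper name bundled_unicode_version is hypothetical.

```ruby
require "rbconfig"

# Hypothetical helper: Unicode version string of this Ruby build, or nil
# when the build does not record it (assumed for older versions).
def bundled_unicode_version
  RbConfig::CONFIG.fetch("UNICODE_VERSION", nil)
end

version = bundled_unicode_version
if version
  puts "Unicode tables: #{version}"
else
  puts "This Ruby does not report a Unicode version"
end
```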
>
> Regards,    Martin.
>
> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> #-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst / it.aoyama.ac.jp
>



-- 
~// Work is Worship. Work Smart :) //~