Thanks Martin .. I was looking for "Oniguruma"

2008/1/12 Martin Duerst <duerst / it.aoyama.ac.jp>:
> At 02:38 08/01/11, Yukihiro Matsumoto wrote:
> >Hi,
> >
> >In message "Re: Draft of some pages about encoding in Ruby 1.9"
> >    on Fri, 11 Jan 2008 02:22:08 +0900, "Ujwal Reddy Malipeddi"
> ><ujwalic / gmail.com> writes:
> >
> >|I think the document assumes that the terminal/console supports
> >|various encoding and the current terminal font has glyphs to represent
> >|the characters
> >
> >Ruby does not handle glyphs nor fonts.
> >
> >|does 1.9 support other BOMs?
> >
> >No, Ruby does not handle BOM. They are evil. The only exception is
> >UTF-8 BOM at the beginning of Ruby programs.
> >
> >|Encoding
> >|* UTF-8
> >
> >Yes.
> >
> >|* UTF-16 Big Endian
> >|* UTF-16 Little Endian
> >|* UTF-32 Big Endian
> >|* UTF-32 Little Endian
> >
> >Yes, in the trunk.
>
> Conversions (String#encode) should be added over the weekend.
> [well, after I have sorted through all the recent emails :-(]
>
> >|* UTF-7
> >
> >Not yet, but possible. Ruby allows user defined encoding.
>
> This one is discouraged for quite a while, because it's
> not really a character encoding, more something like base64.
> But I guess eventually, somebody will implement at least
> conversion from and to this beast.
>
> >|* UTF-EBCDIC
> >
> >Ruby programs must be in ASCII compatible encoding. The encoding
> >itself can be supported, I guess. We've never tried non ASCII
> >compatible encoding before.
>
> Same here, conversion might be implemented in a few months or years,
> but don't expect that soon, and don't expect anything else.
>
> >|* SCSU
> >|* BOCU-1
> >
> >I don't know these.
>
> Both are in some way closer to compression methods than to
> character encodings, but tailored for Unicode. They are very
> definitely not suited for internal processing. Same answer as
> just above.
>
> >|which version of Unicode is supported in 1.9?
> >
> >Ruby does not cover version sensitive area of Unicode (character
> >repertoire etc) yet. It should be handled by external library,
> >e.g. unicode gem.
>
> Not exactly true. Oniguruma supports a lot of Unicode properties.
> All the data is in unicode.c (currently enc/unicode.c).
> Something like the following should actually work, independent
> of your local settings:
>
> ruby -e 'puts "\u3042" =~ /\p{Hiragana}/u'
> 0
>
> (U+3042 is Hiragana a (あ)).
>
> The tables in unicode.c are in a derived form that makes it rather
> difficult to figure out which version they are based on, but a
> rough comparison between
> http://www.unicode.org/Public/UNIDATA/DerivedAge.txt
> and init_code_range_array in enc/unicode.c makes Version 4.1.0
> the best guess.
>
>
> Regards, Martin.
>
>
> #-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> #-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst / it.aoyama.ac.jp
>

--
~// Work is Worship. Work Smart :) //~