Bob Hutchison wrote:


>>IMHO, the default encoding of an XML parser in Ruby should be UTF-8,
>>because XML lives in the Unicode world, not the ISO-8859-* or EUC world
>>(unfortunately for me), and Ruby's regex doesn't support UTF-16.
>>So, if the parser supports only one encoding, it should be UTF-8,
>>and documents in other encodings should be converted to UTF-8.
>>
>>Is this a good solution?
>>
> 
> No, I don't think so. How you represent the character stream internally is
> entirely up to you (immediate *internal* conversion to UTF-8 by your parser
> is OK). But restricting input to UTF-8 would place an unworkable constraint
> on the use of your parser. Presumably the point of having an XML parser is
> to allow Ruby programs to participate in a larger context, and this larger
> context isn't going to provide encoding conversions.


http://www.w3.org/TR/REC-xml.html :
http://www.w3.org/TR/REC-xml.html#charencoding :
"All XML processors must be able to read entities in both the UTF-8 and 
UTF-16 encodings."
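
To illustrate how a parser could accept both encodings and still work in
UTF-8 internally, here is a minimal sketch. It assumes a Ruby version that
has String#encode and String#b; xml_to_utf8 is a hypothetical helper name,
not part of any existing parser. It only looks at the byte order mark and
ignores the encoding declaration and every other encoding, so it is not a
complete implementation of the detection the spec describes.

```ruby
# Hypothetical helper: normalize the raw bytes of an XML entity to UTF-8
# for internal processing, using only the byte order mark (BOM).
def xml_to_utf8(bytes)
  # Work on a binary copy so the prefix checks compare raw bytes.
  data = bytes.dup.force_encoding(Encoding::BINARY)

  if data.start_with?("\xEF\xBB\xBF".b)
    # UTF-8 BOM: drop it and tag the rest as UTF-8.
    data[3..-1].force_encoding(Encoding::UTF_8)
  elsif data.start_with?("\xFE\xFF".b)
    # UTF-16 big-endian BOM: drop it and transcode to UTF-8.
    data[2..-1].force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)
  elsif data.start_with?("\xFF\xFE".b)
    # UTF-16 little-endian BOM (assumption: UTF-32 input is not handled).
    data[2..-1].force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
  else
    # No BOM: assume UTF-8, the default when no encoding is declared.
    data.force_encoding(Encoding::UTF_8)
  end
end
```

With something like this in front of the parser, the regex-based code only
ever sees UTF-8, while callers can still hand over UTF-16 documents.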

Tobi



-- 
Tobias Reif
http://www.pinkjuice.com/myDigitalProfile.xhtml

go_to('www.ruby-lang.org').get(ruby).play.create.have_fun
http://www.pinkjuice.com/ruby/