Hi,

On Thu, 02 Oct 2008 01:03:13 +1000, Yukihiro Matsumoto <matz / ruby-lang.org> wrote:

> Hi,
>
> In message "Re: [ruby-core:19013] Re: Encodings::default_internal patch"
>     on Mon, 29 Sep 2008 15:03:34 +0900, "Michael Selig" <michael.selig / fs.com.au> writes:
>
> |To assist further, I'd also like to make a few suggestions for Ruby itself:
> |
> |1. default_internal should always be set (like "default_external"). If not
> |specified, I suggest it default to the same value as "default_external" -
> |Note: not 100% backward compatible with 1.9.0. You can use mode "ext:-" in
> |IO if you really need to suppress transcoding on input.
>
> I believe default_internal should not be set (or set to nil) by
> default. The nil for default_internal means no conversion from
> external encoding, so I think it's quite similar to what you intended
> above. I will add -U command line option to the interpreter which set
> UTF-8 as default_internal.
>
> |2. default_internal should not be able to be set to a non-ASCII compatible
> |encoding (ensures compatability with ASCII string literals);
>
> Fair enough. I'd like to add this restriction and check.
>
> |3. IO#write and friends should be changed so that when writing a file with
> |an external encoding of ASCII-8BIT, that no transcoding be attempted - ie:
> |just write out the raw bytes. This will help with writing a file
> |containing multiple or arbitrary encodings (you won't have to use
> |force_encoding("ASCII-8BIT") all the time).
>
> Agreed.
>

Thank you for considering my suggestions, Matz. I hope you will also have time to look at the patch I posted last weekend. Other than the extra flags (-U & -L?) I think it implements what you are suggesting.

I think it is perfectly reasonable to leave default_internal unset by default, which is what happens in the patch. That way there is no transcoding overhead when doing I/O in simple Ruby programs.

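
To illustrate the "unset means no conversion" behaviour, here is a minimal sketch. It assumes a Ruby in which Encoding.default_internal can be assigned nil; the method name demo_no_default_internal and the file name are made up for the example and are not part of the patch.

```ruby
require "tmpdir"

# Hypothetical demo: with default_internal unset (nil), input is only
# tagged with the external encoding; it is transcoded only when an
# internal encoding is named explicitly in the mode string.
def demo_no_default_internal
  saved = Encoding.default_internal
  Encoding.default_internal = nil          # "unset": no conversion on input

  Dir.mktmpdir do |dir|
    path = File.join(dir, "latin1.txt")
    File.binwrite(path, "caf\xE9".b)       # "cafe" with e-acute, as ISO-8859-1 bytes

    # External encoding only: bytes are left as they are.
    raw = File.open(path, "r:ISO-8859-1") { |f| f.read }

    # External and internal encoding: transcoded to UTF-8 on read.
    converted = File.open(path, "r:ISO-8859-1:UTF-8") { |f| f.read }

    [raw, converted]
  end
ensure
  Encoding.default_internal = saved        # restore the previous setting
end

raw, converted = demo_no_default_internal
puts "#{raw.encoding} (#{raw.bytesize} bytes)"
puts "#{converted.encoding} (#{converted.bytesize} bytes)"
```

The first read keeps the four original bytes and only labels them; the second one pays for a conversion, which is the overhead that simple programs avoid when default_internal is left unset.
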
However, I think there has to be a way for the programmer to specify it in the code: you can't expect the user to supply the right flags. That is why I suggested the "internal_encoding" magic comment extension (and being able to say -E:XXX on the shebang line).

The only problem I see with allowing default_internal to be unset is in libraries. What encoding should they expect in their parameters? Can they at least expect it to be ASCII-compatible? Do they need to have code to check compatibility? I was hoping to avoid that and put the responsibility on the program using the library.

My further suggestion of setting default_internal to the same value as default_external should at least ensure that there is no transcoding overhead in "locale-only" programs, plus it gives writers of libraries confidence that their inputs are ASCII-compatible and probably compatible with each other.

Also, I think that if default_internal *is* set, there should be a less ugly way of overriding it when opening a file than mode "r:ENC:ENC". That's why I suggested "r:ENC:-". This is implemented in the patch.

Cheers,
Mike
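
For reference, a minimal sketch of the "r:ENC:-" override, next to the "r:ENC:ENC" form it is meant to replace. It assumes a Ruby whose IO mode strings accept "-" as the internal encoding; the helper names with_default_internal and read_encodings_with_utf8_internal are invented for the example.

```ruby
require "tmpdir"

# Hypothetical helper: set Encoding.default_internal for the duration
# of the block, then restore whatever was there before.
def with_default_internal(enc)
  saved = Encoding.default_internal
  Encoding.default_internal = enc
  yield
ensure
  Encoding.default_internal = saved
end

# Returns a hash mapping each mode string to the encoding of the
# string read with it, while default_internal is UTF-8.
def read_encodings_with_utf8_internal
  Dir.mktmpdir do |dir|
    path = File.join(dir, "latin1.txt")
    File.binwrite(path, "caf\xE9".b)   # ISO-8859-1 bytes

    with_default_internal(Encoding::UTF_8) do
      modes = [
        "r:ISO-8859-1",                # default_internal applies
        "r:ISO-8859-1:ISO-8859-1",     # the verbose way to suppress it
        "r:ISO-8859-1:-"               # the shorter form suggested above
      ]
      modes.each_with_object({}) do |mode, result|
        result[mode] = File.open(path, mode) { |f| f.read.encoding }
      end
    end
  end
end

read_encodings_with_utf8_internal.each do |mode, enc|
  puts "#{mode.ljust(26)} #{enc}"
end
```

With default_internal set, only the first mode should hand back a transcoded string; the other two leave the data in its external encoding.
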