Hi,

On Thu, 02 Oct 2008 01:03:13 +1000, Yukihiro Matsumoto <matz / ruby-lang.org> wrote:

> Hi,
>
> In message "Re: [ruby-core:19013] Re: Encodings::default_internal patch"
>     on Mon, 29 Sep 2008 15:03:34 +0900, "Michael Selig" <michael.selig / fs.com.au> writes:
>
> |To assist further, I'd also like to make a few suggestions for Ruby itself:
> |
> |1. default_internal should always be set (like "default_external"). If not
> |specified, I suggest it default to the same value as "default_external".
> |Note: not 100% backward compatible with 1.9.0. You can use mode "ext:-" in
> |IO if you really need to suppress transcoding on input.
>
> I believe default_internal should not be set (or set to nil) by
> default.  The nil for default_internal means no conversion from
> external encoding, so I think it's quite similar to what you intended
> above.  I will add a -U command line option to the interpreter, which sets
> UTF-8 as default_internal.
>
> |2. default_internal should not be able to be set to a non-ASCII-compatible
> |encoding (this ensures compatibility with ASCII string literals);
>
> Fair enough.  I'd like to add this restriction and check.
>
> |3. IO#write and friends should be changed so that when writing a file with
> |an external encoding of ASCII-8BIT, no transcoding is attempted, i.e. the
> |raw bytes are simply written out. This will help with writing a file
> |containing multiple or arbitrary encodings (you won't have to use
> |force_encoding("ASCII-8BIT") all the time).
>
> Agreed.
>
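A minimal sketch of what suggestion 3 means in practice, assuming a current MRI Ruby in which an IO whose external encoding is ASCII-8BIT writes Strings out byte for byte: a UTF-8 string and an ISO-8859-1 string go into the same file with no force_encoding calls. The method name write_mixed_bytes and the file name are hypothetical.

```ruby
require "tmpdir"

# Hypothetical demo: writes two Strings with different encodings to a
# file opened with external encoding ASCII-8BIT and returns the raw
# bytes that reached disk.
def write_mixed_bytes
  Dir.mktmpdir do |dir|
    path = File.join(dir, "mixed.bin")
    File.open(path, "w:ASCII-8BIT") do |f|
      f.write("caf\u00E9")                      # UTF-8: "é" is C3 A9
      f.write("caf\u00E9".encode("ISO-8859-1")) # Latin-1: "é" is E9
    end
    File.binread(path).bytes
  end
end
```

Both strings keep their own byte sequences in the output; nothing is converted to a common encoding.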

Thank you for considering my suggestions, Matz.
I hope you will also have time to look at the patch I posted last weekend.
Other than the extra flags (-U & -L?) I think it implements what you are  
suggesting.

I think it is perfectly reasonable to leave default_internal unset by  
default, which is what happens in the patch. That way there is no  
transcoding overhead when doing I/O in simple Ruby programs. However, I
think there has to be a way for the programmer to specify it in the code;
you can't expect the user to supply the right flags. That is why I
suggested the "internal_encoding" magic comment extension (and being able
to say -E:XXX on the shebang line).

The only problem I see with allowing default_internal to be unset is in
libraries. What encoding should they expect in their parameters? Can they
at least expect it to be ASCII-compatible? Do they need code to check
compatibility? I was hoping to avoid that and put the responsibility
on the program using the library. My further suggestion of setting
default_internal to the same value as default_external should at least
ensure that there is no transcoding overhead in "locale-only" programs,
plus it gives writers of libraries confidence that their inputs are
ASCII-compatible and probably compatible with each other.
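The kind of defensive check a library would otherwise need can be sketched as follows. check_string_params! is a hypothetical helper, not a Ruby API; it relies only on Encoding#ascii_compatible? and Encoding.compatible?, and it checks each pair of arguments, assuming that once every argument is ASCII-compatible, a conflict can only show up between two of them.

```ruby
# Hypothetical library-side guard (not part of Ruby). Raises
# Encoding::CompatibilityError unless every String argument uses an
# ASCII-compatible encoding and every pair of them can be combined.
def check_string_params!(*strings)
  strings.each do |s|
    next if s.encoding.ascii_compatible?
    raise Encoding::CompatibilityError,
          "#{s.encoding} is not ASCII-compatible"
  end

  # Encoding.compatible? returns the combined Encoding, or nil if the
  # two Strings cannot be concatenated.
  strings.combination(2) do |a, b|
    next if Encoding.compatible?(a, b)
    raise Encoding::CompatibilityError,
          "#{a.encoding} and #{b.encoding} are incompatible"
  end

  true
end
```

For example, check_string_params!("abc", "caf\u00E9") passes, while mixing a UTF-8 "café" with its ISO-8859-1 encoding, or passing a UTF-16LE string, raises.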

Also, I think that if default_internal *is* set, there should be a less ugly
way of overriding it when opening a file than mode "r:ENC:ENC". That's
why I suggested "r:ENC:-". This is implemented in the patch.
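A rough sketch of the three reading cases discussed above, assuming a current MRI Ruby in which Encoding.default_internal is nil unless set: with it unset, input keeps its external encoding; with it set to UTF-8, input is transcoded; and the "r:ENC:ENC" form suppresses transcoding for a single file. read_encodings and the file name are hypothetical, and the method restores the previous default_internal when it finishes.

```ruby
require "tmpdir"

# Hypothetical demo: reads the same ISO-8859-1 bytes three ways and
# returns the resulting Strings keyed by case.
def read_encodings
  saved = Encoding.default_internal
  Dir.mktmpdir do |dir|
    path = File.join(dir, "latin1.txt")
    # "café" encoded in ISO-8859-1 (0xE9 is "é").
    File.binwrite(path, [0x63, 0x61, 0x66, 0xE9].pack("C*"))

    results = {}

    # default_internal unset: the String keeps its external encoding.
    Encoding.default_internal = nil
    results[:unset] = File.read(path, mode: "r:ISO-8859-1")

    # default_internal set: input is transcoded to UTF-8.
    Encoding.default_internal = Encoding::UTF_8
    results[:transcoded] = File.read(path, mode: "r:ISO-8859-1")

    # "r:ENC:ENC" overrides default_internal for this one file.
    results[:overridden] = File.read(path, mode: "r:ISO-8859-1:ISO-8859-1")

    results
  end
ensure
  # Put the process-wide setting back the way we found it.
  Encoding.default_internal = saved
end
```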

Cheers
Mike