Martin Bosslet <Martin.Bosslet / googlemail.com> wrote:
> >  I use an "-*- encoding: binary -*-" comment at the top of all Ruby
> >  source files where I initialize string literals for storing binary data.
> >  It's cleaner than setting Encoding::BINARY on every string I create
> >  (and nearly all my code works exclusively on binary data).
> 
> I'm afraid this had no effect, or I did it wrong, or I might also have 
> misunderstood you. The incoming string s already has UTF-8 encoding, so 
> 
>     @wbuffer << s
> 
> ends up as UTF-8 regardless of the encoding I set for the .rb file. I
> figured this was because "<<" calls rb_str_append, which in turn calls
> rb_enc_check, which determines a compatible encoding (in this case
> UTF-8) for @wbuffer. But again, I might have misunderstood you.

You're right.  rb_str_append() changes the encoding of the empty
@wbuffer to that of "s" above  :(

I suppose calling @wbuffer.force_encoding(Encoding::BINARY) after
appending to @wbuffer is necessary (unless you write the buffering code
in C like io.c does).
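
A minimal sketch of that workaround, assuming an empty binary buffer and
a UTF-8 string; append_and_force is a hypothetical helper, not the
actual buffering code:

```ruby
# Hypothetical stand-in for the buffering code discussed above.
def append_and_force(buffer, s)
  buffer << s                               # an empty buffer takes on the encoding of s
  seen = buffer.encoding                    # remember what "<<" left behind
  buffer.force_encoding(Encoding::BINARY)   # relabel only; the bytes are unchanged
  seen
end

wbuffer = "".b            # empty ASCII-8BIT (binary) buffer
s = "\u00e4" * 3          # UTF-8 string: three a-umlauts, six bytes
seen = append_and_force(wbuffer, s)
puts "after <<: #{seen}, after force_encoding: #{wbuffer.encoding}"
```

One caveat: once the buffer holds non-ASCII bytes labelled as binary,
appending another non-ASCII UTF-8 string raises
Encoding::CompatibilityError.  Appending s.b (a binary copy of s) instead
should avoid both problems.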

> >  Also, all of the Ruby (non-SSL) *Socket objects have Encoding::BINARY by
> >  default anyways, so I think SSLSocket should be the same.
> 
> I'm sorry, I don't understand what you mean by the *Socket objects have
> binary encoding by default - do you mean it's binary data they are expecting
> to deal with for input and output? So a user would have to make sure to only
> pass already BINARY-encoded strings to any *Socket?

For all newly-created *Socket objects, external_encoding is already
ASCII-8BIT (binary), and the sockets should just pass along the byte
buffer underlying any String objects given to them.
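
A quick way to check this, as a sketch: UNIXSocket.pair is assumed here
as a stand-in for a freshly connected TCPSocket (so no network setup is
needed), and socket_encoding_and_bytes is a hypothetical helper.

```ruby
require "socket"

# Hypothetical check: report the socket's external encoding and the raw
# bytes that arrive on the other end after writing a String.
def socket_encoding_and_bytes(text)
  writer, reader = UNIXSocket.pair
  writer.write(text)                       # the String's bytes go out as-is
  [writer.external_encoding, reader.read(text.bytesize).bytes]
ensure
  writer&.close
  reader&.close
end

enc, bytes = socket_encoding_and_bytes("\u00e4")   # one UTF-8 a-umlaut
puts enc
p bytes
```

The two bytes that arrive are the UTF-8 encoding of the character; the
socket does no transcoding.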

> I quickly checked with a TCPServer and a Net::HTTP client; there the aforementioned
> situation works: when sending 100000 a-umlauts you receive the same
> number back, after forcing the response to UTF-8 again, of course. That's why I
> thought that an SSLSocket should behave the same way.

Yes, underlying IO#read/read_nonblock/sysread for the TCPSocket objects
should return new ASCII-8BIT Strings.  You needed to force them to UTF-8
yourself upon receipt.
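
The round trip can be sketched like this, again assuming a local
UNIXSocket.pair in place of the TCPServer/Net::HTTP setup; round_trip is
a hypothetical helper and the payload is kept small so the write cannot
block:

```ruby
require "socket"

# Hypothetical helper: send text through a local socket pair and return
# the String the reading side gets back.
def round_trip(text)
  writer, reader = UNIXSocket.pair
  writer.write(text)
  writer.close_write          # signal EOF so that read returns
  reader.read                 # comes back tagged ASCII-8BIT
ensure
  writer&.close
  reader&.close
end

text = "\u00e4" * 100         # 100 a-umlauts, 200 bytes in UTF-8
received = round_trip(text)
puts received.encoding        # binary until relabelled
received.force_encoding(Encoding::UTF_8)
puts received == text         # same characters once relabelled
```

force_encoding only changes the label on the String; it does not touch
the bytes, so it is cheap even for large responses.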