Issue #9028 has been updated by MartinBosslet (Martin Bosslet).

Assignee changed from MartinBosslet (Martin Bosslet) to drbrain (Eric Hodel)

Sounds great. As long as there is no explicit encoding support for sockets, ASCII-8BIT also seems the most natural to me. Thank you and please go ahead, Eric!
----------------------------------------
Bug #9028: Make SSLSocket Support Encodings
https://bugs.ruby-lang.org/issues/9028#change-43375

Author: whitehat101 (Jeremy Ebler)
Status: Assigned
Priority: Normal
Assignee: drbrain (Eric Hodel)
Category: ext/openssl
Target version: 
ruby -v: 1.9.3, 2.0.0-p0
Backport: 1.9.3: DONTNEED, 2.0.0: REQUIRED


I was working on a bug in the xmpp4r project that caused REXML exceptions when receiving UTF-8 Strings.
https://github.com/xmpp4r/xmpp4r/issues/13

The issue ended up being that SSLSocket#readline didn't always return strings with the same encoding. It gave plain ASCII strings an encoding of UTF-8, and UTF-8 strings an encoding of ASCII-8BIT. We were passing the SSLSocket directly to REXML::Parsers::SAX2Parser and REXML throws exceptions when the input is not UTF-8.
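
The symptom can be reproduced without a socket. This is only a sketch under an assumption: that the buffered read path starts from an empty UTF-8 string and appends raw ASCII-8BIT chunks to it (the conversation below suggests this for @rbuffer, but it is not verified here). The helper name `buffered` is made up for illustration.

```ruby
# Sketch only: plain string appends stand in for the socket's buffered reads.
def buffered(chunk)
  # Assumed starting point: an empty buffer tagged as UTF-8.
  buffer = String.new.force_encoding(Encoding::UTF_8)
  # Raw bytes tagged ASCII-8BIT, as #sysread is reported to return them.
  buffer << chunk.dup.force_encoding(Encoding::ASCII_8BIT)
  buffer
end

ascii_line = buffered("<presence/>\n")
utf8_line  = buffered("<body>caf\u00e9</body>\n")

ascii_line.encoding == Encoding::UTF_8      # ASCII-only bytes: buffer stays UTF-8
utf8_line.encoding  == Encoding::ASCII_8BIT # non-ASCII bytes: buffer becomes ASCII-8BIT
```

Ruby keeps the receiver's encoding when the appended bytes are all ASCII and switches to the argument's encoding otherwise, which matches the behavior described above.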

Our solution was to subclass the socket so that it always returns consistently encoded strings:

require 'openssl'

class SSLSocketUtf8 < OpenSSL::SSL::SSLSocket
  # Tag every chunk read from the socket as UTF-8 (no transcoding is done).
  def sysread(*args)
    super.force_encoding(::Encoding::UTF_8)
  end
end
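
One caveat with this approach, sketched below: force_encoding only relabels the bytes, so a chunk that ends in the middle of a multibyte character is tagged UTF-8 but is not valid on its own. FakeBinarySocket and FakeUtf8Socket are hypothetical stand-ins so the example runs without a real SSL connection; they are not part of OpenSSL.

```ruby
# Hypothetical stand-in for a socket: hands out raw bytes in fixed-size chunks.
class FakeBinarySocket
  def initialize(bytes, chunk_size)
    @bytes = bytes.dup.force_encoding(Encoding::ASCII_8BIT)
    @chunk_size = chunk_size
  end

  def sysread(*)
    raise EOFError if @bytes.empty?
    @bytes.slice!(0, @chunk_size)
  end
end

# Same idea as the subclass above, applied to the stand-in.
class FakeUtf8Socket < FakeBinarySocket
  def sysread(*args)
    super.force_encoding(Encoding::UTF_8)
  end
end

# "caf\u00e9" is five bytes; with 4-byte chunks the two-byte character is split.
sock   = FakeUtf8Socket.new("caf\u00e9", 4)
first  = sock.sysread
second = sock.sysread

first.encoding                  # UTF-8, because of force_encoding
first.valid_encoding?           # false: the chunk ends with half a character
(first + second).valid_encoding? # true once the pieces are joined
```

Line-oriented reads are less exposed to this, since a complete line normally ends on a character boundary.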


<whitehat101> Hello, I'm investigating some strange behavior with OpenSSL::SSL::SSLSocket and string encodings
<whitehat101> #readline returns UTF-8 encoded strings, until the string actually contains UTF-8, then it claims that the encoding is ASCII-8BIT
<whitehat101> I've been reading through the source, and I'm not sure where to try to patch it
<drbrain> whitehat101: have an example script?
<drbrain> whitehat101: can you reproduce it with #sysread?
<drbrain> if you can, the problem lies in the C code
<drbrain> if you cannot, the problem lies in the OpenSSL::Buffering module
<whitehat101> I don't have a concise example, I'm working with the xmpp4r project
<drbrain> whitehat101: look at sample/openssl/echo_*
<drbrain> you can probably make a simple example out of that
<whitehat101> I found that #sysread always returns 8BIT, but #readline usually gives UTF-8
<whitehat101> Thank you, i'll look at those
<drbrain> whitehat101: then I imagine the problem is that OpenSSL::Buffering#initialize creates a UTF-8 buffer
<drbrain> (@rbuffer)
<drbrain> I bet that # encoding: ASCII-8BIT at the very top of the file will fix it
<whitehat101> in buffering.rb?
<drbrain> in ext/openssl/lib/openssl/buffering.rb
<whitehat101> My feeling is that these functions should be returning UTF-8
<whitehat101> A patch that works for my project:
    class SSLSocketUtf8 < OpenSSL::SSL::SSLSocket
      def sysread *args
        super.force_encoding ::Encoding::UTF_8
      end
    end
<drbrain> hrm
<drbrain> they should be returning the encoding of the SSLSocket
<whitehat101> It doesn't look like SSLSocket has any support for encodings
<whitehat101> I tried setting the encoding of the TCPSocket, but it had no effect
<drbrain> since SSLSocket wraps the TCPSocket, I don't know if that has an effect on SSLSocket#sysread
<whitehat101> I'm guessing that SSLSocket has no idea what the encoding is, it just deals with bytes
<whitehat101> We're passing the SSLSocket directly to REXML::Parsers::SAX2Parser
<whitehat101> and REXML throws exceptions when the input is not UTF-8
<drbrain> possibly, since it isn't an IO subclass and doesn't seem to respond to #set_encoding
<drbrain> setting the encoding on the TCPSocket probably has no effect because SSLSocket needs to read binary data off the TCPSocket
<drbrain> the ultimate solution would be "make SSLSocket support encodings"
<whitehat101> That sounds right to me
<drbrain> a short-term fix would be "make the SSLSocket methods return a consistent encoding, regardless of correctness"
<drbrain> whitehat101: if you file a bug, maybe I'll find the time to fix it for ruby 2.1
<drbrain> you can file one here: http://bugs.ruby-lang.org/projects/ruby-trunk/issues/new
<whitehat101> That would be excellent, thanks
<whitehat101> Should I try to make an example, or just include this conversation?
<drbrain> this conversation is enough
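
A rough sketch of the short-term fix mentioned in the conversation ("return a consistent encoding, regardless of correctness"), assuming the inconsistency comes from the read buffer's initial encoding. BinaryLineReader is a hypothetical, much simplified reader written for illustration; it is not the real OpenSSL::Buffering, and StringIO stands in for the socket.

```ruby
require "stringio"

# Hypothetical minimal line reader; not the real OpenSSL::Buffering.
class BinaryLineReader
  def initialize(io)
    @io = io
    @rbuffer = String.new # String.new with no arguments is ASCII-8BIT
  end

  def readline
    until (idx = @rbuffer.index("\n"))
      @rbuffer << @io.sysread(4096) # raises EOFError at end of input
    end
    @rbuffer.slice!(0, idx + 1)
  end
end

io     = StringIO.new("<presence/>\n<body>caf\u00e9</body>\n".b)
reader = BinaryLineReader.new(io)
lines  = [reader.readline, reader.readline]

# Both lines carry the same encoding, whether or not they contain non-ASCII bytes.
encodings = lines.map(&:encoding).uniq
```

With a binary buffer every line comes back as ASCII-8BIT, and a caller that knows the stream is UTF-8 can call force_encoding on each complete line.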



-- 
http://bugs.ruby-lang.org/