--4fa2a7e6_38a5d054_6574
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

On Thursday, May 3, 2012 at 9:16 AM, "Martin J. Dürst" wrote:
> On 2012/04/30 1:50, Joshua Ballanco wrote:
>  
> > I know it seems like this class is just wrapping String and always defaulting to byte-wise operations, but it's more fundamental than that. Because there is no encoding on the bytes, there will never be an encoding error when working with them. This could be extremely useful for applications that combine bytes from multiple sources (e.g. Socket data + a file on disk + immediate strings in code) that could potentially have different encodings. By operating on bytes, you can defer the encoding checks until later, if at all.
>  
> I'm not saying I'm totally against this, but "extremely useful" could  
> also mean "too useful". There are clearly cases where one needs to put  
> things together at the byte level. But there are also quite some cases  
> that seem to "just work" when using byte-wise operations, at least as  
> long as nothing else but US-ASCII gets used. Things then blow up  
> terribly once some other characters get into the mix.
>  
>  


So, as an addendum to the spec, what about adding a flag when doing a string conversion:

    d.string_with_encoding('UTF-8', reject_if_invalid: true)

That way we could ensure that the return value is always either nil or a string with a valid encoding.
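
A minimal sketch of how such a flag might behave. The `Blob` class here is hypothetical (it is the proposed type, not something Ruby ships), and the choice to return the possibly invalid string when the flag is off is an assumption made for illustration:

```ruby
# Hypothetical Blob wrapper; not an existing Ruby class.
class Blob
  def initialize(bytes)
    # Keep a private binary copy so no text encoding is attached to the data.
    @bytes = bytes.b
  end

  # Returns a String tagged with +encoding+. When reject_if_invalid is true
  # and the bytes are not valid in that encoding, returns nil instead.
  def string_with_encoding(encoding, reject_if_invalid: false)
    str = @bytes.dup.force_encoding(encoding)
    return nil if reject_if_invalid && !str.valid_encoding?
    str
  end
end

good = Blob.new("caf\xC3\xA9") # valid UTF-8 byte sequence
bad  = Blob.new("caf\xE9")     # a lone 0xE9 byte is not valid UTF-8

p good.string_with_encoding('UTF-8', reject_if_invalid: true)
p bad.string_with_encoding('UTF-8', reject_if_invalid: true)
```

With the flag set, callers only need a nil check rather than a later `valid_encoding?` call or a rescue around the first operation that happens to inspect the characters.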
  
> Actually, the binary/ASCII-8bit encoding is very close to a Blob. It was
> mostly Akira Tanaka who didn't want to distinguish between "true" binary  
> and ASCII-8bit, because that would have made the use of regular  
> expressions with binary impossible or convoluted.
>  
>  


My problem with String and ASCII-8BIT/BINARY encoding currently is that you *can't* just set a string's encoding to binary and forget about encodings. You will still run into issues working with binary data using Ruby 1.9 strings. I demonstrated the issue here: http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/40269 (where I, consequently, also made a plea for a Data/Blob type).
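
For illustration only (this is not necessarily the case shown in the linked post), here is one way a binary-tagged String can still raise an encoding error. The variable names are made up for the example:

```ruby
ascii_binary = "abc".b          # binary-tagged, but only ASCII bytes
raw_binary   = "\xFF\xFE".b     # binary-tagged, contains high bytes
text         = "r\u00E9sum\u00E9" # UTF-8 text with non-ASCII characters

# Works, because one side contains only 7-bit bytes.
ok = ascii_binary + text

# Fails, because both sides contain non-ASCII bytes in different encodings.
error = nil
begin
  raw_binary + text
rescue Encoding::CompatibilityError => e
  error = e
end

p ok.bytesize
p error.class
```

The first concatenation is the "just works as long as nothing but US-ASCII gets used" case; the second is where it blows up, even though the string was explicitly marked as binary.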
  
> Despite the title of this issue, I didn't see any *bit*wise operations
> (e.g. bitwise and/or/xor/not) proposed. Were you just taking them for  
> granted? What about adding these to String, maybe limiting them to  
> binary/ASCII-8bit?
>  
>  


I was taking bit-wise operations for granted. Ideally, a Data/Blob type would just represent N groupings of 8 1s and/or 0s, with byte-wise access and bit-wise manipulation. i.e. Less structured than an Array, less restrictive than a String. Just data.
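
A rough sketch of what byte-wise access plus bit-wise manipulation could look like. Everything here is hypothetical: the `Blob` name, the operator set, and the rule that both operands must have the same length are assumptions, not part of any agreed spec:

```ruby
# Hypothetical Data/Blob type: just bytes, no encoding.
class Blob
  def initialize(bytes)
    @bytes = bytes.b
  end

  def bytesize
    @bytes.bytesize
  end

  # Byte-wise access: the integer value (0..255) at +index+, or nil.
  def [](index)
    @bytes.getbyte(index)
  end

  # All bytes as an Array of integers.
  def to_a
    @bytes.bytes
  end

  def &(other)
    zip_bytes(other) { |a, b| a & b }
  end

  def |(other)
    zip_bytes(other) { |a, b| a | b }
  end

  def ^(other)
    zip_bytes(other) { |a, b| a ^ b }
  end

  # Bit-wise NOT, kept within a single byte by XOR-ing with 0xFF.
  def ~
    Blob.new(to_a.map { |b| b ^ 0xFF }.pack('C*'))
  end

  private

  # Applies the block to each pair of bytes and wraps the result in a Blob.
  def zip_bytes(other)
    raise ArgumentError, 'blobs differ in length' unless bytesize == other.bytesize
    Blob.new(to_a.zip(other.to_a).map { |a, b| yield(a, b) }.pack('C*'))
  end
end

a = Blob.new("\x0F\xF0")
b = Blob.new("\xFF\x00")

p((a ^ b).to_a)
p((~a).to_a)
```

None of these operations ever consults an encoding, so none of them can raise an encoding error.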


--4fa2a7e6_38a5d054_6574
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline


                <div><span style="color: rgb(160, 160, 168); ">On Thursday, May 3, 2012 at 9:16 AM, "Martin J. Dürst" wrote:</span></div>
                <blockquote type="cite" style="border-left-style:solid;border-width:1px;margin-left:0px;padding-left:10px;">
                    <span><div><div><div>On 2012/04/30 1:50, Joshua Ballanco wrote:</div><div><br></div><blockquote type="cite"><div>I know it seems like this class is just wrapping String and always defaulting to byte-wise operations, but it's more fundamental than that. Because there is no encoding on the bytes, there will never be an encoding error when working with them. This could be extremely useful for applications that combine bytes from multiple sources (e.g. Socket data + a file on disk + immediate strings in code) that could potentially have different encodings. By operating on bytes, you can defer the encoding checks until later, if at all.</div></blockquote><div><br></div><div>I'm not saying I'm totally against this, but "extremely useful" could </div><div>also mean "too useful". There are clearly cases where one needs to put </div><div>things together at the byte level. But there are also quite some cases </div><div>that seem to "just work" when using byte-wise operations, at least as </div><div>long as nothing else but US-ASCII gets used. Things then blow up </div><div>terribly once some other characters get into the mix.</div></div></div></span></blockquote><div><br></div><div>So, as an addendum to the spec, what about adding a flag when doing a string conversion:</div><div><br></div><div>&nbsp; &nbsp; d.string_with_encoding('UTF-8', reject_if_invalid: true)</div><div><br></div><div>That way we could ensure that the return value is always either nil or a string with a valid encoding.</div><div>&nbsp;</div><blockquote type="cite" style="border-left-style:solid;border-width:1px;margin-left:0px;padding-left:10px;">Actually, the binary/ASCII-8bit encoding is very close to a Blob. 
It was<span><div><div><div>mostly Akira Tanaka who didn't want to distinguish between "true" binary </div><div>and ASCII-8bit, because that would have made the use of regular </div><div>expressions with binary impossible or convoluted.</div></div></div></span></blockquote><div><br></div><div>My problem with String and ASCII-8BIT/BINARY encoding currently is that you *can't* just set a string's encoding to binary and forget about encodings. You will still run into issues working with binary data using Ruby 1.9 strings. I demonstrated the issue here:&nbsp;http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/40269&nbsp;(where I, consequently, also made a plea for a Data/Blob type).</div><div>&nbsp;</div><blockquote type="cite" style="border-left-style:solid;border-width:1px;margin-left:0px;padding-left:10px;">Despite the title of this issue, I didn't see any *bit*wise operations<span><div><div><div>(e.g. bitwise and/or/xor/not) proposed. Were you just taking them for </div><div>granted? What about adding these to String, maybe limiting them to </div><div>binary/ASCII-8bit?</div></div></div></span></blockquote><div><br></div><div>I was taking bit-wise operations for granted. Ideally, a Data/Blob type would just represent N groupings of 8 1s and/or 0s, with byte-wise access and bit-wise manipulation. i.e. Less structured than an Array, less restrictive than a String. Just data.</div><div><br>
                </div>
            
--4fa2a7e6_38a5d054_6574--