

On Oct 22, 2011, at 9:16 PM, Joshua Ballanco wrote:

> On Saturday, October 22, 2011 at 12:43 PM, Jon wrote:
>>  
>>> What Ruby needs (IMHO) is the equivalent of Obj-C's NSData class: something that can hold a contiguous span of raw bytes without an encoding, but with the ability to access ranges and iterate over the data like a String. I regret that I did not recall this desire of mine for the original Ruby 2.0 feature list (I originally encountered the need for this when writing the ControlTower server for MacRuby, which, consequently, does make use of NSData). I would, however, like to propose such a class for Ruby 2.0.
>>  
>>  
>> What's your view regarding both the `bytes` (immutable) and `bytearray` (mutable) abstractions from
>>  
>>   http://docs.python.org/py3k/library/functions.html#bytearray
> 
> 
> Yes, this sounds like a very similar idea (NSData is immutable and has an NSMutableData counterpart). I think the intro for the NSData documentation captures the motivation perfectly:
> 
>> NSData and its mutable subclass NSMutableData provide data objects, object-oriented wrappers for byte buffers. Data objects let simple allocated buffers (that is, data with no embedded pointers) take on the behavior of Foundation objects.
> 
> Basically, since the Array class in Ruby is designed to hold objects, there is an annoying amount of overhead required to use Ruby arrays to hold simple bytes (e.g. you have to manually decompose bytes on each append operation). On the other hand, since Ruby does its best to always do the right thing with encodings for String objects, it can get annoying to try and use Ruby strings to hold bytes (you never know when your BINARY string might be coerced into UTF-8). 
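
To make the quoted point concrete, here is a small sketch, assuming a current CRuby (2.x or later; details may differ on 1.9.3 or MacRuby). The first part shows a BINARY string being silently re-tagged as UTF-8 when UTF-8 text is appended. The second part is a hypothetical `ByteBuffer` class (the name and its methods are invented for illustration; this is not an existing or proposed Ruby API) that pins its bytes to BINARY and offers byte ranges and iteration, roughly in the spirit of NSData or Python's `bytearray`.

```ruby
# 1) A "binary" String does not necessarily stay binary.
buf = "GET ".b                      # tagged ASCII-8BIT (BINARY), ASCII-only so far
buf << "caf\u00E9"                  # append UTF-8 text containing a non-ASCII char
puts buf.encoding                   # UTF-8: the buffer was silently re-tagged

# 2) Hypothetical NSData/bytearray-like wrapper (NOT an existing Ruby class).
#    It keeps its bytes in a private BINARY String and only ever appends
#    BINARY copies, so the encoding tag can never change underneath it.
class ByteBuffer
  include Enumerable

  def initialize(bytes = "")
    @data = bytes.b                 # private BINARY copy of the initial bytes
  end

  # Append a String's raw bytes, or one Integer as a single byte (0..255).
  def <<(other)
    case other
    when Integer
      raise RangeError, "byte out of range: #{other}" unless (0..255).cover?(other)
      @data << [other].pack("C")    # pack("C") yields a one-byte BINARY String
    else
      @data << other.to_str.b       # re-tag as BINARY before appending
    end
    self                            # allow chaining: bb << "a" << 0x0A
  end

  # Byte-indexed access; accepts the same arguments as String#byteslice.
  def [](*args)
    @data.byteslice(*args)
  end

  # Yield each byte as an Integer, like String#each_byte.
  def each(&block)
    return enum_for(:each) unless block
    @data.each_byte(&block)
    self
  end

  def size
    @data.bytesize
  end

  # Return a BINARY copy so callers cannot mutate the internal buffer.
  def to_s
    @data.dup
  end
end

bb = ByteBuffer.new("GET ")
bb << "caf\u00E9" << 0x0A
bb.size            # => 10 ("GET " is 4 bytes, "café" is 5 in UTF-8, plus "\n")
bb.to_s.encoding   # => Encoding::BINARY (a.k.a. ASCII-8BIT), unchanged
bb[0, 3]           # => "GET"
bb.first(2)        # => [71, 69]
```

Wrapping a String rather than an Array of Integers keeps the storage compact and reuses String's byte-level methods (`byteslice`, `each_byte`, `bytesize`), while the wrapper's only job is to make sure nothing but BINARY data ever reaches it.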

I like and agree with this concept, but I wonder if we are talking about two (subtly) different things.

There really is a problem if I take a string encoded in UTF-8 and try to concatenate it with a string encoded in ISO-8859-1 (or one of the more exotic character sets).  What I have never understood (and the Ruby people have tried to educate me) is why, when I say "utf-8-string" + "8859-1-string", Ruby can't just convert the latter to the encoding of the first, do the concatenation, and be done with it.  So, there is a second problem.
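
For reference, a minimal sketch of what happens today and of the explicit conversion described above, assuming a current CRuby: when both strings contain non-ASCII characters in different encodings, `+` raises `Encoding::CompatibilityError` instead of converting. The helper `concat_in_first_encoding` is hypothetical (not part of Ruby); it simply transcodes every later piece into the first piece's encoding with `String#encode` before appending.

```ruby
utf8   = "caf\u00E9"                                # "café", tagged UTF-8
latin1 = "na\u00EFve".encode(Encoding::ISO_8859_1)  # "naïve", as ISO-8859-1 bytes

# Implicit concatenation: both sides contain non-ASCII bytes in different
# encodings, so Ruby raises instead of converting either side.
begin
  puts(utf8 + latin1)
rescue Encoding::CompatibilityError => e
  puts "implicit concat failed: #{e.message}"
end

# Hypothetical helper: convert each later part to the first part's encoding,
# then append. String#encode raises if a character has no equivalent in the
# target encoding (e.g. some UTF-8 text going into ISO-8859-1).
def concat_in_first_encoding(first, *rest)
  rest.reduce(first.dup) { |acc, part| acc << part.encode(first.encoding) }
end

joined = concat_in_first_encoding(utf8, " ", latin1)
puts joined.encoding                       # UTF-8
puts joined == "caf\u00E9 na\u00EFve"      # true
```

Note that if the ISO-8859-1 string happened to contain only ASCII characters, the plain `+` would succeed, so this kind of mismatch may only surface once non-ASCII data shows up.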

And there is a third problem (which is probably a set of problems).  In my application, all the data actually starts off in various EBCDIC code pages (http://bit.ly/rtTO8F).  Using ICU (http://site.icu-project.org/), I convert these to UTF-8 strings.  I store these in a PostgreSQL 9.0.4 database that is set up with UTF-8 encoding.  But STILL, frequently, something creates strings that are not UTF-8 strings.  As previously stated, I've set all my files to UTF-8 coding and also set -KU, but there are still ways for things to get botched.  And my whole point here is that what Ruby has ended up doing is making simple libraries damn near impossible to write if you really really really really want to do things properly.  Any library that concatenates any strings is open to mistakes.
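
One way to at least catch stray encodings early is to normalize every string where it enters the program (database, file, socket) rather than at each concatenation. The sketch below assumes a current CRuby; `ensure_utf8` is a hypothetical helper, and its policy of treating BINARY or US-ASCII input as UTF-8 bytes is an assumption that may not fit every data source.

```ruby
# Hypothetical boundary guard: returns a UTF-8 copy of +str+ or raises.
def ensure_utf8(str)
  s = str.dup
  case s.encoding
  when Encoding::UTF_8
    # Already tagged UTF-8; only the bytes still need checking.
  when Encoding::BINARY, Encoding::US_ASCII
    # Assumption: untagged or ASCII-tagged bytes are really UTF-8,
    # so re-tag them without touching the bytes.
    s.force_encoding(Encoding::UTF_8)
  else
    # A genuine other encoding (e.g. ISO-8859-1): transcode the characters.
    # String#encode raises if the source bytes are invalid in that encoding.
    s = s.encode(Encoding::UTF_8)
  end
  raise ArgumentError, "not valid UTF-8: #{str.inspect}" unless s.valid_encoding?
  s
end

from_db   = "r\xC3\xA9sum\xC3\xA9".b                 # UTF-8 bytes tagged BINARY
from_file = "r\u00E9sum\u00E9".encode("ISO-8859-1")  # a real Latin-1 string

puts ensure_utf8(from_db)   == "r\u00E9sum\u00E9"    # true
puts ensure_utf8(from_file) == "r\u00E9sum\u00E9"    # true

begin
  ensure_utf8("\xFF\xFE".b)                          # not UTF-8 at all
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

The guard works on a copy, so the caller's string keeps its original tag; whether invalid input should raise or be replaced (for example with `String#scrub`) is a policy choice left open here.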

pedz

