--4f9d713e_5d5babb3_103
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

On Saturday, April 28, 2012 at 8:52 AM, KOSAKI Motohiro wrote:
> On Fri, Apr 27, 2012 at 8:53 PM, MartinBosslet (Martin Bosslet)
> <Martin.Bosslet / googlemail.com> wrote:
> >  
> > Issue #6361 has been updated by MartinBosslet (Martin Bosslet).
> >  
> >  
> > nobu (Nobuyoshi Nakada) wrote:
> > > Then what kind of methods should Blob have?
> > >  
> > > And does it need to be built-in?
> >  
> > A real advantage of having it built-in could be
> > that this gives us the chance to fix #5741 at
> > the same time. I could imagine that we have two
> > kinds of "byte array" classes - one, mutable,
> > that shares COW semantics and all the other
> > optimizations with String, but with no notion of
> > encoding and a yet-to-be-defined interface.
> >  
> > And then a second class, which is basically the
> > immutable version of the first one. By sharing
> > only a reference we could ensure that the content
> > would not be proliferated and we could securely
> > erase its contents after use.
> >  
>  
>  
> I don't dislike the built-in idea. But you haven't shown a detailed spec
> and I don't think I clearly understand your idea. Can you spend a
> little time writing a spec? (Probably a rough explanation of a few lines
> is enough.)
>  
>  
>  


If I may intrude for a moment: I think the advantage of having a built-in Data/Blob library would be that it could be used in all places where a data class is more appropriate than a string. For example, the Socket library currently returns Strings for data read in from a socket. I think a Data class is more appropriate here since the socket itself does not contain encoding information (i.e. either an arbitrary default encoding needs to be set, a heuristic can be used to guess the encoding, or the encoding is set by a previously agreed-upon convention; but you cannot ask a socket for its encoding).
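
To make the socket point concrete, here is a minimal sketch (an illustration only, not part of the proposal). It assumes a Unix-like platform where `UNIXSocket.pair` is available; the variable names are made up for the example. The bytes come back as a String, and the receiver has to apply an encoding by convention:

```ruby
require "socket"

# A connected pair of local sockets stands in for a real network peer.
writer, reader = UNIXSocket.pair

writer.write("caf\u00e9") # the sender happens to write UTF-8 bytes
writer.close

raw = reader.recv(16)     # recv hands back the bytes as a String
reader.close

# The socket carries no encoding information, so the String is only
# tagged as binary; the receiver must know the convention and apply it.
text = raw.dup.force_encoding(Encoding::UTF_8)

puts raw.encoding
puts text
```

Nothing on the socket says "this is UTF-8"; the `force_encoding` call encodes an agreement made somewhere else.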

As for a spec, I think it should be kept relatively simple. The one interesting optimization from NSData that might be useful is the option of copying bytes on instantiation. Copying is the default, but it is also possible to create a Data object that merely points at the storage of another live object and allows byte-wise manipulation. This is particularly interesting for the case of strings, since I would guess that String and Data would have identical storage layout, allowing one to optimize the case of creating a Data from a String with no copying.

A quick attempt at a spec:

-----
Data.new #=> New, dynamically resizable container to store some bytes
Data.new('Test') #=> Can be created from any object that responds to #bytes with an enumerator
Data.new('Hello', copy_bytes: false) #=> Creates the Data from the String by merely pointing to the same storage

Data.open('./foo/test.txt') #=> Create a Data object from a File
Data.open('./bar/test.txt', copy_bytes: false) #=> Same as open above, but manipulates IO#pos for access
Data.write('./baz/test.txt') #=> Writes the bytes to disk.

d = Data.new(a_string)
d[2] #=> Returns the third byte, same as a_string.bytes.to_a[2]
d[2] = 42 #=> Same as a_string.setbyte(2, 42)
d.each #=> Equivalent to a_string.each_byte
d.length #=> Number of bytes currently being stored
d.slice(2, 4) #=> Similar to String#slice
d.slice(2, 4, copy_bytes: false) #=> New data object from slice shares storage with the original
d << other_data #=> Appends bytes from other_data
d.to_s #=> Returns a string using the default internal encoding
d.string_with_encoding('UTF-16') #=> Returns a string using the encoding passed
-----
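
A rough pure-Ruby sketch of part of the spec above, under several assumptions: the class is called `ByteData` here (a hypothetical name, chosen because Ruby 3.2 and later already define an unrelated core `Data` class), it always copies its input, and the `copy_bytes: false` and file-based variants are left out because sharing storage would need interpreter support. It is meant only to show that the interface is small, not how a built-in version would be implemented:

```ruby
# Hypothetical ByteData: an encoding-free byte container backed by a
# binary-tagged String. Always copies; no shared-storage mode.
class ByteData
  include Enumerable

  # Accepts nil (empty container) or any object that responds to #bytes.
  def initialize(source = nil)
    @buf = source.nil? ? "".b : source.bytes.to_a.pack("C*")
  end

  # Array of the stored bytes, so ByteData can itself be a source.
  def bytes
    @buf.bytes
  end

  def [](index)
    @buf.getbyte(index)
  end

  def []=(index, byte)
    @buf.setbyte(index, byte)
  end

  def each(&block)
    return enum_for(:each) unless block
    @buf.each_byte(&block)
    self
  end

  def length
    @buf.bytesize
  end

  # Copying slice; returns an empty ByteData when out of range.
  def slice(start, len)
    ByteData.new(@buf.byteslice(start, len).to_s)
  end

  def <<(other)
    @buf << other.bytes.to_a.pack("C*")
    self
  end

  def to_s
    string_with_encoding(Encoding.default_internal || Encoding.default_external)
  end

  # Tags a copy of the bytes with an encoding; no validation happens here.
  def string_with_encoding(encoding)
    @buf.dup.force_encoding(encoding)
  end
end

d = ByteData.new("Hello")
third = d[2]                 # byte value of "l"
d[2] = 42                    # overwrite it with "*"
d << ByteData.new("\xFF".b)  # append a byte that is not valid UTF-8
```

Note that appending `0xFF` never raises; whether the result is valid text only matters once `string_with_encoding` is called.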

I know it seems like this class is just wrapping String and always defaulting to byte-wise operations, but it's more fundamental than that. Because there is no encoding on the bytes, there will never be an encoding error when working with them. This could be extremely useful for applications that combine bytes from multiple sources (e.g. Socket data + a file on disk + immediate strings in code) that could potentially have different encodings. By operating on bytes, you can defer the encoding checks until later, if at all.
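
For reference, this is the kind of failure being described, shown with plain Strings in a small sketch (variable names are illustrative, and `String#b` requires Ruby 2.0 or later). Concatenating two Strings with different encodings raises when both contain non-ASCII bytes, while treating both sides as raw bytes defers the question:

```ruby
utf8_text = "caf\u00e9"   # UTF-8 text with a non-ASCII character
raw_bytes = "\xFF\xFE".b  # arbitrary bytes, e.g. read from a socket

# Mixing the two as Strings fails because the encodings are incompatible.
error = begin
  utf8_text + raw_bytes
  nil
rescue Encoding::CompatibilityError => e
  e
end

# Working on bytes only: no encoding check happens at this point.
combined = utf8_text.b + raw_bytes

puts error.class
puts combined.bytesize
```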

- Josh
--4f9d713e_5d5babb3_103--