On Thu, 27 Nov 2008 01:07:08 +0900, Emiel van de Laar wrote:

> Hello ruby-core,
> 
> Today I was playing around with manipulating strings containing binary
> data, i.e. "\xaa\xab\xac\xad\xae" and using the new String methods
> available in Ruby 1.9.
> 
> The exercise I was trying out was to extract out a range of bytes as
> Fixnums. Kind of like String#bytes but I was only interested in a
> subarray. Like so:
> 
> "\xaa\xab\xac\xad\xae".bytes.to_a[1,3] # => [171, 172, 173]
> 
> This works but operates on the entire data set which I imagine is fairly
> expensive... So I chopped it up beforehand like so:

In most cases, it probably isn't, so trying to change things around may 
be premature optimization.

> data = "\xc3\xa9\xc3\xa9" # => "\xC3\xA9\xC3\xA9"
> data.force_encoding("utf-8") # => "éé"
> data[0,2].bytes.to_a  # => [195, 169, 195, 169]
> 
> Here I get four bytes instead of the first two which I wanted.
> 
> data.bytes.to_a[0,2] # => [195, 169]

If you do need to optimize, try String#unpack, which works on the 
string's raw bytes regardless of its encoding.

data = "\xc3\xa9\xc3\xa9\xc3\xa9\xc3\xa9\xc3\xa9\xc3\xa9\xc3\xa9\xc3\xa9"
data.force_encoding("utf-8")
data.unpack("@5C2")  # => [169, 195]
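
In the template, "@5" skips to byte offset 5 and "C2" reads two unsigned 
bytes from there. As a rough sketch, the same idea can be wrapped in a 
small helper. Note that byte_range is a name I made up for illustration; 
it is not a core method.

```ruby
# byte_range is a hypothetical helper, not part of Ruby's core API.
# It returns `len` bytes starting at byte offset `start` as integers,
# ignoring the string's encoding.
def byte_range(str, start, len)
  # "@N" moves to absolute byte offset N; "CM" reads M unsigned 8-bit values.
  str.unpack("@#{start}C#{len}")
end

data = "\xc3\xa9\xc3\xa9".force_encoding("utf-8")
byte_range(data, 0, 2)  # => [195, 169]
byte_range(data, 1, 3)  # => [169, 195, 169]
```

With your four-byte example this returns the first two bytes rather than 
the first two characters, which is what you were after.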

-- 
Chanoch (Ken) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/