Hi,
On 1 March 2011 09:48, "Martin J. Dürst" <duerst / it.aoyama.ac.jp> wrote:
> I'm really surprised that the encoding is kept for an arbitrary byteslice.
>
> assert_equal("\x81\x82".force_encoding(Encoding::UTF_8),
> "\u3042".byteslice(1..2))
>
> really just doesn't make sense to me. In UTF-8, the string "\x81\x82" is
> just garbage, and will hit some exception or other problem sooner or later.
> The only reasonable result is to mark this as BINARY aka ASCII8BIT. I don't
> think anybody would expect something else.
>
> Regards, Martin.

I agree, this is weird.
I think the only use case for #byteslice is handling a part of a
binary String.
So it is fair to expect that you have to "force_encoding BINARY" it.
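
To make the discussion concrete, here is a minimal sketch of the behaviour
Martin quotes, plus the explicit retagging. The variable names are mine, for
illustration only:

```ruby
# "\u3042" (HIRAGANA LETTER A) is three bytes in UTF-8: E3 81 82.
str   = "\u3042"
slice = str.byteslice(1..2)            # a new String holding the two trailing bytes

slice_bytes    = slice.bytes           # the raw bytes, 0x81 and 0x82
slice_encoding = slice.encoding        # the source's encoding is kept (UTF-8 here)
slice_valid    = slice.valid_encoding? # those two bytes alone are not valid UTF-8

# Retag the slice as binary. #force_encoding mutates its receiver, but the
# receiver here is the new String from #byteslice, so str is untouched.
binary = slice.force_encoding(Encoding::BINARY)
```

After the last line, binary is tagged ASCII-8BIT and is valid in that
encoding, while str keeps its UTF-8 tag.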

About thread-safety issues, one can always #dup it and then call
#force_encoding.
This is not very efficient, but that's the only way to keep the
original String well tagged and not mutated at Ruby level.
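
A rough sketch of the #dup approach, assuming the caller wants to slice bytes
without ever retagging the shared String (names are illustrative):

```ruby
original = "\u3042"                      # shared String, tagged UTF-8

# Copy first, then retag the copy; the original is never mutated.
view = original.dup.force_encoding(Encoding::BINARY)

part = view.byteslice(1..2)              # slice of the binary copy

part_encoding     = part.encoding        # binary, inherited from the copy
original_encoding = original.encoding    # still UTF-8
```

The cost is the copy made by #dup, which is the inefficiency mentioned above.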

I imagine this method is useful for low-level String manipulation
libraries (because it does not dup the whole String), but that is a very
specific use case. Maybe these libraries should implement these
low-level methods themselves, for faster manipulation of String and
Encoding.

Suraj, can you show some use cases you have for #byteslice?

Just some morning thoughts,
B.D.

P.S.: Sorry for not reacting earlier; I did not expect a patch to be
written and merged so fast.