On 26-jun-2006, at 15:27, Michal Suchanek wrote:
>
> Why would you read 4096 bytes in the first place?
It is a common pattern. If a file has no line endings, just one (very
long) stream of characters, can you really use gets?
>
> If you knew the file is in some weird multibyte encoding you should
> have set it for the stream, and read something meaningful.

Or there should be a facility that protects you from reading
incomplete strings. But does setting IO.encoding = foo imply that
the IO objects will stop me from doing so? Will they go back to the
provider of the IO and fetch the missing bytes?
In the case of Unicode the absolute, rigorous minimum is to NEVER
EVER slice into a codepoint, and from there it can get as complex as
you like (because slicing between codepoints is not always safe
either).
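
To make the fixed-size read problem concrete, here is a minimal sketch.
It assumes a Ruby with encoding-aware strings (2.0 or later, so not the
Ruby this thread is about), and split_complete_utf8 is a hypothetical
helper, not an existing IO facility. It holds back the bytes of an
incomplete trailing character so they can be prepended to the next read.

```ruby
require 'stringio'

# Hypothetical helper: split a raw chunk into the longest valid UTF-8
# prefix and the leftover bytes of an incomplete trailing character.
def split_complete_utf8(chunk)
  bytes = chunk.dup.force_encoding(Encoding::BINARY)
  # A UTF-8 character is at most 4 bytes long, so at most 3 trailing
  # bytes can belong to a character that has not fully arrived yet.
  0.upto([3, bytes.bytesize].min) do |cut|
    keep = bytes.bytesize - cut
    head = bytes.byteslice(0, keep).force_encoding(Encoding::UTF_8)
    return [head, bytes.byteslice(keep, cut)] if head.valid_encoding?
  end
  raise ArgumentError, "chunk is not valid UTF-8"
end

# Raw bytes, as they would arrive from a file or socket.
io = StringIO.new("h\u00E9llo w\u00F6rld".b)

first = io.read(2)                       # "h" plus only half of the e-acute
text, rest = split_complete_utf8(first)  # rest holds the dangling byte

# Prepend the leftover byte to the next read to complete the character.
text2, rest2 = split_complete_utf8(rest + io.read(1))
```

This only protects against slicing into a codepoint; it does nothing
about cuts between codepoints that belong together.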
>
> If it is "ascii compatible" (ISO-8859-*, cp*, utf-8, .. ) you can  
> just use gets.
>
> Otherwise there is no meaningful string content.
>
> Note that 4096 bytes is always OK for UTF-32 (or similar plain wide
> character encodings),
Of which UTF-32 is the only one that is relevant for Unicode, and if
you investigated the subject a little you would know that slicing
Unicode strings at codepoint boundaries is often NOT enough. That way
you can remarkably easily cut off part of a compound character, a
modifier codepoint or an RTL override, which will give you a different
character altogether (or alter your string display in a particularly
nasty way: if you remove an RTL override terminator, the display of
the remaining output of your program is _reversed_).
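
A small sketch of the compound character case, assuming Ruby 2.5 or
later for String#grapheme_clusters (which did not exist when this was
written). The helper first_graphemes is hypothetical. Slicing by
codepoint silently drops the combining accent, while slicing by
grapheme cluster keeps it attached to its base letter.

```ruby
# "etude" with an accented first letter, written as a base letter "e"
# followed by U+0301 COMBINING ACUTE ACCENT.
s = "e\u0301tude"

codepoint_count = s.codepoints.length         # counts the accent separately
cluster_count   = s.grapheme_clusters.length  # counts user-visible characters

# Slicing one codepoint off the front loses the accent: a different character.
by_codepoint = s[0, 1]

# Hypothetical helper: take the first n grapheme clusters instead.
def first_graphemes(str, n)
  str.grapheme_clusters.first(n).join
end

by_cluster = first_graphemes(s, 1)            # base letter plus its accent
```

Grapheme clusters do not cover directional overrides, so the RTL case
above would still need separate handling.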

> and may at worst get you half of a surrogate
> character for UTF-16. And strings will have to handle incomplete
> characters anyway - they may result from some delays/buffering in
> network IO or such.

This is exactly why the notion of having strings serve as both byte
buffers and character vectors seems a little difficult. 90 percent of
my use cases for Ruby need characters, not bytes, and I would love to
hint that explicitly should it be needed. The problem is that Ruby
does not distinguish the two at the moment.
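
For illustration only, this is roughly what such a distinction can look
like. The sketch assumes a Ruby whose strings carry an encoding (2.0 or
later for String#b), which is not the Ruby discussed here. The same
data answers differently depending on whether it is treated as
characters or as a byte buffer.

```ruby
# "naive" with a diaeresis on the i, as UTF-8 text.
s = "na\u00EFve"

char_count = s.length     # characters: the i-diaeresis counts once
byte_count = s.bytesize   # bytes: the i-diaeresis takes two in UTF-8

# An explicit byte-buffer view of the same data.
raw = s.b
raw_count = raw.length    # a binary string counts bytes, not characters
```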


-- 
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl