On 26-jun-2006, at 10:07, Daniel DeLorme wrote:

> Dmitrii Dimandt wrote:
>> Substrings? Finding an occurrence of a string in another string?
>
> Those operations are precisely what regexes are best at.
>
>> Shouldn't str[0..3] work on characters (for a string with encoding
>> set)? Maybe I want to do something like
>> str[0] = Unicode::upcase(str[0])? :)
>
> What about
>   str.sub!(/^./){ |c| Unicode::upcase(c) }
> That hardly seems more cryptic to me.
It does seem unnatural, and it hints that you are working with an
encoding-incapable language, because people who are lucky enough to
live in ASCII can simply write

str[0] = str[0].upcase

but people who are not will have to invent silly workarounds.
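
To make the comparison concrete, here is a rough sketch of both spellings
side by side. It assumes an encoding-aware String where indexing is
character-based and String#upcase handles non-ASCII letters (true of Ruby
2.4 and later, not of the interpreter discussed in this thread). The sample
word and variable names are only illustrations.

```ruby
# Assumption: Ruby 2.4+, UTF-8 source, character-based String indexing.
word = "привет"                      # every letter is 2 bytes in UTF-8

# Index-based spelling, as an ASCII user would expect to write it.
by_index = word.dup
by_index[0] = by_index[0].upcase     # by_index[0] is one whole character

# Regex-based spelling from the quoted suggestion.
by_regex = word.dup
by_regex.sub!(/^./) { |c| c.upcase }

puts by_index   # Привет
puts by_regex   # Привет
```

Both give the same result; the argument is only about which one a
programmer should have to reach for.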

>
> It's not that I don't understand the attraction; it's just that I  
> think when handling char-strings it's best to change your mental  
> model to something further away from char/byte arrays.
>
> BTW, if str[0..3] returns the first 4 characters, then how do I get  
> the first 4 bytes?

str.bytes[0..3] seems OK to me. That is: for Strings the character-based
routines should be the primary ones and the byte routines secondary,
rather than the other way around, with a "chars" accessor bolted on as I
have to do right now. The point is that you have to PROTECT an ignorant
programmer from things like normalization and character integrity, and
NEVER allow him to cut into the middle of a character in a multibyte
string UNLESS he explicitly says that he wants it that way.
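
A small sketch of the split I have in mind, assuming a Ruby with
encoding-aware strings (String#bytes returning an Array and
String#byteslice exist in Ruby 2.0 and later, not in the version under
discussion). The sample string is arbitrary.

```ruby
# Assumption: Ruby 2.0+, UTF-8 source encoding.
str = "Größe"                 # "ö" and "ß" are 2 bytes each in UTF-8

# Default access is by character: nothing can be cut in half.
chars4 = str[0..3]            # first four characters
p chars4                      # "Größ"

# Byte access has to be requested explicitly.
bytes4 = str.bytes[0..3]      # first four bytes, as integers
p bytes4                      # [71, 114, 195, 182]

# An explicit byte slice can land inside a character; the result is
# then no longer valid UTF-8, which is exactly the hazard to guard against.
cut = str.byteslice(0, 3)     # "Gr" plus the first byte of "ö"
p cut.valid_encoding?         # false
```

The character slice is always safe, and a broken string only appears
when the programmer asks for bytes by name.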

-- 
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl