On 26-jun-2006, at 10:07, Daniel DeLorme wrote:

> Dmitrii Dimandt wrote:
>> Substrings? Finding occurrence of a string in another string?
>
> Those operations are precisely what regexes are best at.
>
>> shouldn't str[0..3] work on characters (for a string with encoding
>> set)? Maybe I want to do something like
>> str[0] = Unicode::upcase(str[0])? :)
>
> What about
>   str.sub!(/^./){ |c| Unicode::upcase(c) }
> That hardly seems more cryptic to me.

It does seem unnatural, and it hints that you are working with an
encoding-incapable language, because people who are lucky enough to be
working in ASCII will be able to do

  str[0] = str[0].upcase

but people who are not will have to invent silly workarounds.

> It's not that I don't understand the attraction; it's just that I
> think when handling char-strings it's best to change your mental
> model to something further away from char/byte arrays.
>
> BTW, if str[0..3] returns the first 4 characters, then how do I get
> the first 4 bytes?

str.bytes[0..3] seems OK to me. That is: for Strings the
character-based routines are the base ones, and the byte routines are
secondary. Not the other way around, with the "chars" accessor I have
had to bolt on for now.

The problem is that you have to PROTECT an ignorant programmer from
things like normalization and character unity, and NEVER allow him to
cut into a character of a multibyte string UNLESS he explicitly says
that he wants it that way.

-- 
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl
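
A minimal sketch of the character-first model argued for above. It assumes a Ruby whose String indexing is encoding-aware (Ruby 1.9 and later) and whose String#upcase handles non-ASCII letters (Ruby 2.4 and later); it is not how the Ruby of this thread behaves. The sample string and variable names are made up for illustration.

```ruby
# encoding: utf-8
# Assumption: encoding-aware Strings (Ruby 1.9+), Unicode-aware upcase (2.4+).
str = "привет мир"

# Character-based indexing is the default: four characters, never half of one.
first_four_chars = str[0..3]

# Bytes only when explicitly asked for: an Array of Integers.
first_four_bytes = str.bytes[0..3]

# Upcase the first character without cutting into a multibyte sequence.
capitalized = str.dup
capitalized[0] = capitalized[0].upcase

puts first_four_chars
p first_four_bytes
puts capitalized
```

Here the byte view is the one that has to be requested by name, so an index can never land in the middle of a character by accident.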