Charles Oliver Nutter wrote:
> Regular expressions for all character work would be a *terribly* slow 
> way to get things done. If you want to get the nth character, should you 
> do a match for n-1 characters and a group to grab the nth? Or would it 
> be better if you could just index into the string and have it do the 

Ok, I'm not very familiar with the internal workings of strings in 1.9, 
but it seems to me that for encodings with variable-width characters, it 
is logically *impossible* to index directly into the string. Unless 
there's some trick I'm unaware of, you *have* to count from the 
beginning of the string for UTF-8 strings.
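
To make the counting concrete, here is a rough sketch of what "index the
nth character" has to do for UTF-8. The helper name nth_utf8_char is
made up for illustration, it assumes valid UTF-8 input, and I am not
claiming this is how 1.9 does it internally; it only shows that the walk
is linear in n.

```ruby
# Hypothetical helper (not part of Ruby): return the nth (0-based)
# character of a UTF-8 string by walking its bytes from the start.
# Assumes the input is valid UTF-8.
def nth_utf8_char(str, n)
  bytes = str.dup.force_encoding(Encoding::BINARY)
  pos = 0     # current byte offset
  count = 0   # characters skipped so far
  while pos < bytes.bytesize
    lead = bytes.getbyte(pos)
    # The lead byte tells us how many bytes this character occupies.
    width = if    lead < 0x80 then 1
            elsif lead < 0xE0 then 2
            elsif lead < 0xF0 then 3
            else                   4
            end
    if count == n
      return bytes.byteslice(pos, width).force_encoding(Encoding::UTF_8)
    end
    pos += width
    count += 1
  end
  nil # n is past the end of the string
end
```

There is no way to know where character n starts without looking at the
lead bytes of every character before it.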

> right thing? How about if you want to iterate over all characters in a 
> string? Should the iterating code have to know about the encoding? 
> Should you use a regex to peel off one character at a time?

That is certainly one possible way of doing things...
   string.scan(/./){ |char| do_something_with(char) }
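
A slightly fuller sketch of the same idea, written for 1.9 and assuming
a UTF-8 string. The /m flag is there so that "." also matches newlines.
(On 1.8 you would additionally need the /u flag or $KCODE = 'u' for the
regex to see UTF-8 characters rather than bytes.)

```ruby
text = "na\u00EFve \u65E5\u672C"   # "naïve 日本"

# Peel off one character at a time with a regex.
via_regex = []
text.scan(/./m) { |char| via_regex << char }

# The 1.9 way, for comparison: the string knows its own encoding.
via_each_char = text.each_char.to_a
```

Both arrays should hold the same characters; the iterating code never
mentions the encoding in either case.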

> Regex for string access goes a long way, but's just about the heaviest 
> way to do it.

Heavy compared to what? Once compiled, a regex runs entirely in native 
code, which in my experience is far faster than jumping in and out of 
interpreted ruby code.

> Strings should be aware of their encoding and should be 
> able to provide you access to characters as easily as bytes. That's what 
> 1.9 (and upcoming changes in JRuby) fixes.

Overall I agree that the encoding stuff in 1.9 is very nice. 
Encapsulating the encoding with the string is very OO. Very intuitive. 
No need to think about encoding anymore, now it "just works" for 
encoding-ignorant programmers (at least until the abstraction leaks). It 
also silences one frequent complaint about ruby, a clear political 
victory. Overall it is more robust and less error-prone than the 1.8 way.

But my point was that there *is* a 1.8 way. What riled me up, and what 
I was responding to, was the claim that 1.8 did not have Unicode 
support AT ALL. Unequivocally, it does, and it works pretty well for me. 
IMHO there is a certain minimalist elegance in considering strings as 
encoding-agnostic and using regex to get encoding-specific views. I 
could do str[/./n] and str[/./u]; I can't do that anymore.
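
As far as I can tell, the closest 1.9 analogue of those two views is to
re-tag the bytes explicitly before matching. This is only a sketch,
assuming a UTF-8 string; it is not a drop-in replacement for the 1.8
flags.

```ruby
str = "\u00E9t\u00E9"   # "été", UTF-8

# Byte view: re-tag a copy as binary, so "." matches a single byte.
first_byte = str.dup.force_encoding(Encoding::BINARY)[/./m]

# Character view: the regex follows the string's own encoding.
first_char = str[/./m]
```

Here first_byte is the single byte 0xC3 and first_char is the two-byte
character "é". The view now comes from the string's encoding tag rather
than from a flag on the regex.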

1.9 makes encodings easier for the English-speaking masses not used to 
extended characters, but let's remember that ruby *always* had support 
for multibyte character sets; after all it *did* originate from a 
country with two gazillion "characters".

Daniel