[Johan Sørensen <johans / gmail.com>, 2004-12-17 16.42 CET]
> # this is in a UTF-8 encoded ERb template (a Rails "view" in my case)
> <% text = "Eftersom jag jobbar som kontruktör/ingenjör på dagarna och hackar cocoa" -%>
> <%= text[0..47] %>
> <br />
> <%= text[0..48] %>
> <br />
> # notice the 'o' in ingenjor instead of 'ö'
> <% othertext = "Eftersom jag jobbar som kontruktör/ingenjor på dagarna och hackar cocoa" -%>
> <%= othertext[0..47] %>
> 
> # produces this (the last character on the first line displays as
> a "funny character" in browsers)
> 
>  Eftersom jag jobbar som kontruktör/ingenjör p?
>  Eftersom jag jobbar som kontruktör/ingenjör på
>  Eftersom jag jobbar som kontruktör/ingenjor på
> 
> Is this a possible bug in Ruby (1.8.1), or could it be something in
> Rails that gets in the way? I can reproduce this on two servers
> and in WEBrick.

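It is a Ruby feature :). String indices count bytes, not characters, so
text[0..47] can stop in the middle of a multibyte UTF-8 character (here
the "å" in "på"). For the moment, you must write your own indexing
routines for UTF-8 strings (notice that String#[/regex/] works, because
regexes are UTF-8 aware when $KCODE is set to 'u' or the regex carries
the /u flag).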
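A small illustration of both points, assuming Ruby 1.8 semantics for the /u flag (first_chars is a hypothetical helper name, not a Ruby or Rails method): unpack shows that "på" is three bytes but two characters, and a /u regex takes the first n characters without splitting one.

```ruby
# "på" is two characters but three bytes in UTF-8 ("å" is 0xC3 0xA5).
p "på".unpack("C*").size   # => 3 (bytes)
p "på".unpack("U*").size   # => 2 (characters)

# Hypothetical helper: the first n characters of a UTF-8 string, via a
# regex. The /u flag makes "." match one whole UTF-8 character, and /m
# lets "." match newlines as well.
def first_chars(str, n)
  str[/\A.{0,#{n}}/mu]
end

text = "Eftersom jag jobbar som kontruktör/ingenjör på dagarna och hackar cocoa"
puts first_chars(text, 48)
```

In the view this would read <%= first_chars(text, 48) %>, assuming the helper is made available to the template.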

Here is something you can start from:

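# Mix into a String to make [] count UTF-8 characters instead of bytes.
module UTF8Str
  def [](*params)
    # Handle integer indices (a[1], a[1, 2]) and single ranges (a[1..2]);
    # regexes and substrings fall through to the normal String#[].
    if params.all? { |p| Integer === p } ||
       (params.size == 1 && Range === params[0])
      res = unpack("U*")[*params]    # index code points, not bytes
      return nil if res.nil?         # out of range, as String#[] does
      res = [res] unless Array === res
      return res.pack("U*")          # re-encode the selection as UTF-8
    end
    super
  end
end

a = "ingenjör på"    # any UTF-8 string containing multibyte characters
a.extend UTF8Str

puts a[0], a[1], a[2], a[3], a[4], a[1, 2], a[1..2], a[-1]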
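The module above counts characters. The breakage in the question, though, comes from slicing at a byte offset that lands inside "å". If a byte budget is what you actually need, here is a minimal sketch (utf8_byte_truncate is a hypothetical name, not a Ruby or Rails method) that backs up from the cut until it no longer splits a character:

```ruby
# Hypothetical helper (not part of Ruby or Rails): keep at most max_bytes
# bytes of a UTF-8 string, backing up so no multibyte character is cut.
def utf8_byte_truncate(str, max_bytes)
  bytes = str.unpack("C*")                 # raw byte values
  return str if bytes.size <= max_bytes
  cut = max_bytes
  # UTF-8 continuation bytes look like 10xxxxxx. If the first dropped byte
  # is one, the kept part would end mid-character, so move the cut back.
  cut -= 1 while cut > 0 && (bytes[cut] & 0xC0) == 0x80
  out = bytes[0, cut].pack("C*")
  # Ruby 1.9+ tags packed bytes as binary; relabel them as UTF-8 there.
  out.force_encoding("UTF-8") if out.respond_to?(:force_encoding)
  out
end

text = "Eftersom jag jobbar som kontruktör/ingenjör på dagarna"
puts utf8_byte_truncate(text, 48)   # ends with "p", not half of "å"
```

With a limit of 48 bytes this drops the whole "å" instead of emitting its first byte, which is what text[0..47] did in the template.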


Good luck.

--