[Johan Sörensen <johans / gmail.com>, 2004-12-17 16.42 CET]
> # this in an utf-8 encoded erb template (a rails "view" in my case)
> <% text = "Eftersom jag jobbar som kontruktör/ingenjör på dagarna och hackar cocoa" -%>
> <%= text[0..47] %>
> <br />
> <%= text[0..48] %>
> <br />
> # notice the 'o' in ingenjor instead of 'ö'
> <% othertext = "Eftersom jag jobbar som kontruktör/ingenjor på dagarna och hackar cocoa" -%>
> <%= othertext[0..47] %>
>
> #produces this (the last character on the first line will display as
> a "funny character" in browsers)
>
> Eftersom jag jobbar som kontruktör/ingenjör p?
> Eftersom jag jobbar som kontruktör/ingenjör på
> Eftersom jag jobbar som kontruktör/ingenjor på
>
> Is this a possible bug in Ruby (1.8.1) or could it be something with
> Rails that gets in the way, I can reproduce this across two servers
> and in webrick.

It is a Ruby feature :). Indices in strings are bytes, not chars, so a
range can end in the middle of a multi-byte character; that is what
happens with text[0..47] above. For the moment, you must develop your
own indexing routines for UTF-8 strings (notice that String#[/regex/]
works, because regexes are UTF-8 aware). Here is something you can
start from:

    module UTF8Str
      # Index by character (codepoint) instead of by byte.
      def [](*params)
        if params.all? { |p| Integer === p } ||
           (params.size == 1 && Range === params[0])
          res = unpack("U*")[*params]
          return nil if res.nil?            # out of range, like String#[]
          res = [res] unless Array === res  # a single index yields one codepoint
          return res.pack("U*")
        end
        super                               # regexes, substrings, etc.
      end
    end

    a = "áéióúü"
    a.extend UTF8Str
    puts a[0], a[1], a[2], a[3], a[4], a[1,2], a[1..2], a[-1]

Good luck.
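
As a smaller illustration of the two ideas above (decoding to codepoints with unpack("U*"), and letting a UTF-8 aware regex do the counting), here is a minimal sketch that takes the first N characters of a string without splitting a multi-byte sequence. The helper names utf8_prefix_by_codepoints and utf8_prefix_by_regex are hypothetical, and the /u flag is my assumption about how to make the regex UTF-8 aware; the post itself does not say which setting it relies on.

```ruby
# Hypothetical helpers, not part of the original post.

# Approach 1: decode to codepoints, slice the array, encode again.
def utf8_prefix_by_codepoints(str, count)
  codepoints = str.unpack("U*")     # one Integer per character
  codepoints[0, count].pack("U*")   # re-encode the slice as UTF-8
end

# Approach 2: let the regex engine count characters.
# /u requests UTF-8 matching (assumption); /m lets "." match newlines too.
def utf8_prefix_by_regex(str, count)
  str[/\A.{0,#{count}}/mu]
end

text = "ingenjör på dagarna"
puts utf8_prefix_by_codepoints(text, 11)   # ingenjör på
puts utf8_prefix_by_regex(text, 11)        # ingenjör på
```

Both versions count 'ö' and 'å' as one character each, so the prefix ends on a whole character where a byte index could end in the middle of one.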