Sorry I haven't responded earlier, but it seems I'm not being notified
by email.

The thing is, I do not control the input: it is the browscap file, which
is in ISO-8859-1 and can be found here:
http://browsers.garykeith.com/downloads.asp

\xdf (decimal 223) is a valid ISO-8859-1 code point
(http://en.wikipedia.org/wiki/ISO/IEC_8859-1).
It appears as '?' because my terminal is UTF-8, but the byte is there:

$ cat test.rb
a = "Der gro\xdfe BilderSauger"
a.each_byte { |b| puts b }

$ ruby test.rb
68
101
114
32
103
114
111
223 <- Here I am
101
32
66
105
108
100
101
114
83
97
117
103
101
114

You can also see that the length is 22, not the 25 it would be if
'\xdf' were four separate characters.

Also, if I run
puts a.encode('UTF-8', 'ISO-8859-1')
I see the proper character in my terminal.

But when the same text is read from a file, the escape is not interpreted:

$ cat test.rb
File.open('test.in', 'r:ISO-8859-1').each_line do |l|
  puts l
  puts '***'
  puts l.length
  puts '***'
  l.each_byte {|b| puts b}
end

$ ruby test.rb
Der gro\xdfe BilderSauger
***
25
***
68
101
114
32
103
114
111
92    <- Here
120  <- we
100  <- are
102  <- as 4 ASCII chars '\xdf'
101
32
66
105
108
100
101
114
83
97
117
103
101
114

I also tried putting UTF-8 code points in the file and reading it as
UTF-8, without luck. It seems escape sequences are not interpreted when
reading from a stream, which I can understand.

What I can't figure out is how to interpret these escape sequences when 
reading them from a file.
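
To make the goal concrete, here is a minimal sketch of the kind of
conversion I am after. It assumes the only escapes in the file are
two-digit \xNN hex sequences and that the resulting bytes are meant as
ISO-8859-1. The helper name unescape_hex is hypothetical, not something
Ruby provides, and this is only one possible approach:

```ruby
# Hypothetical helper: replace each literal "\xNN" (4 ASCII characters)
# with the single byte it names, then tag the result with an encoding.
def unescape_hex(line, encoding = 'ISO-8859-1')
  # Work on a binary copy so any byte value 0..255 can be substituted.
  bytes = line.dup.force_encoding('BINARY')
  bytes = bytes.gsub(/\\x([0-9a-fA-F]{2})/) { $1.hex.chr }
  bytes.force_encoding(encoding)
end

# Single quotes keep the backslash literal, like a line read from test.in.
raw = 'Der gro\xdfe BilderSauger'
s = unescape_hex(raw)

puts raw.length          # 25
puts s.length            # 22
puts s.encode('UTF-8')   # transcoded for a UTF-8 terminal
```

In the file-reading loop above this would be applied to each line, for
example unescape_hex(l.chomp). A real backslash followed by "x" and two
hex digits in the data would be converted as well, so it is only safe
if the file never contains that as plain text.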

--Gilles

-- 
Posted via http://www.ruby-forum.com/.