Iain Barnett wrote in post #1012004:
>     File.readlines(logfile, :encoding => "UTF-8" )
>
> Now spits out the error:
>
>   ArgumentError - invalid byte sequence in UTF-8

Are you sure it's that particular line which spits out the error?

There are no hard-and-fast rules, because of the whole incoherent design 
of ruby 1.9's encoding support, but in many cases you can *read* a string 
containing invalid byte sequences without complaint, and only get an error 
later on when you try to do something like a regexp match on it.

irb(main):002:0> File.open("zzz1","wb") { |f| f.write("\xdd\xdd") }
=> 2
irb(main):003:0> File.readlines("zzz1")
=> ["\xDD\xDD"]
irb(main):004:0> File.readlines("zzz1", :encoding=>"UTF-8")
=> ["\xDD\xDD"]
irb(main):005:0> File.readlines("zzz1", :encoding=>"UTF-8")[0] =~ /./
ArgumentError: invalid byte sequence in UTF-8
  from (irb):5
  from /usr/local/bin/irb192:12:in `<main>'
irb(main):006:0>
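If you want to find out which lines are the troublemakers before any regexp touches them, String#valid_encoding? (available in 1.9) will tell you without raising. This is a minimal sketch; the file name "sample.log" and its contents are made up for illustration.

```ruby
# Hypothetical sample file: line 2 holds bytes that are not valid UTF-8.
path = "sample.log"
File.open(path, "wb") { |f| f.write("first line\n\xDD\xDD\nthird line\n") }

# Reading with :encoding => "UTF-8" succeeds even for the bad line.
lines = File.readlines(path, :encoding => "UTF-8")

# Collect 1-based numbers of lines that would make =~ raise ArgumentError.
bad = []
lines.each_with_index { |line, i| bad << i + 1 unless line.valid_encoding? }
puts "invalid UTF-8 on line(s): #{bad.inspect}"  # prints: invalid UTF-8 on line(s): [2]

# Only run the regexp on lines whose bytes are valid UTF-8.
matches = lines.select { |line| line.valid_encoding? && line =~ /line/ }

File.delete(path)
```

Skipping (or separately logging) the invalid lines is just one choice; the point is that the check is cheap and never raises.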

You can of course set :encoding=>"BINARY" (or "ASCII-8BIT") when you 
read the file. Or you could open the file in binary mode ("rb"), which I 
don't think File.readlines supports directly, but File.open does. The 
two are not exactly the same; binary mode also prevents CR/CRLF 
translations on non-Unix platforms.
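Here's a sketch of both approaches side by side, assuming a throwaway file called "sample.log". Either way the lines come back tagged ASCII-8BIT, where every byte is a valid character, so a regexp match can't raise the "invalid byte sequence" error.

```ruby
# Hypothetical file containing valid UTF-8, invalid bytes and CRLF endings.
path = "sample.log"
File.open(path, "wb") { |f| f.write("caf\xC3\xA9\r\n\xDD\xDD\r\n") }

# Option 1: keep text mode, but declare the external encoding as BINARY.
a = File.readlines(path, :encoding => "BINARY")

# Option 2: binary mode via File.open; "rb" also implies ASCII-8BIT and
# turns off CRLF translation on platforms that would otherwise do it.
b = File.open(path, "rb") { |f| f.readlines }

a.first.encoding == Encoding::ASCII_8BIT  # => true
b.first.encoding == Encoding::ASCII_8BIT  # => true

# No ArgumentError here: ASCII-8BIT strings never contain "invalid" bytes.
a.each { |line| line =~ /./ }

File.delete(path)
```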

I'd suggest that BINARY mode is the way to go for you. If your objective 
is to read in some log lines, chomp them, and write them out again, 
whilst allowing arbitrary byte sequences, this will Just Work [TM], just 
like it would in ruby 1.8.
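A minimal sketch of that read/chomp/write round trip in binary mode. The file names "in.log" and "out.log" are hypothetical, and writing "\n" after each chomped line is an assumption about the output you want.

```ruby
# Hypothetical input: one CRLF line, one line with invalid UTF-8 bytes.
File.open("in.log", "wb") { |f| f.write("ok\r\n\xDD\xDD bad bytes\nlast\n") }

File.open("out.log", "wb") do |out|
  File.open("in.log", "rb") do |input|
    input.each_line do |line|   # each line is an ASCII-8BIT string
      out.write(line.chomp)     # chomp strips a trailing "\n" or "\r\n"
      out.write("\n")
    end
  end
end

# The CRLF has become LF; the \xDD bytes pass through untouched.
copied = File.open("out.log", "rb") { |f| f.read }

File.delete("in.log", "out.log")
```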

However, regexp matches will be against individual bytes of the string, 
rather than entire UTF-8 characters.
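To make the byte-versus-character difference concrete, here's a small sketch comparing the same five bytes ("café") tagged as UTF-8 and as BINARY:

```ruby
utf8   = "caf\u00e9"                        # "café": 4 characters, 5 bytes
binary = utf8.dup.force_encoding("BINARY")  # same bytes, one "character" per byte

utf8.scan(/./).size    # => 4  ('.' takes the two-byte é as one character)
binary.scan(/./).size  # => 5  ('.' takes one byte at a time)

utf8   =~ /\Acaf.\z/   # => 0
binary =~ /\Acaf.\z/   # => nil (two bytes remain after "caf", '.' takes one)
```

So patterns that only look at ASCII (timestamps, keywords, delimiters) behave the same in BINARY mode, but anything that counts or matches non-ASCII characters will see bytes.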

It's strange how in ruby 1.9, str[x] works just fine with invalid 
encodings, but str=~/./ does not. But that's only one of many strange 
things about ruby 1.9.
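For the record, a sketch of that asymmetry, using the same two bytes as the irb session above:

```ruby
s = "\xDD\xDD".dup.force_encoding("UTF-8")  # two bytes, not valid UTF-8

s.valid_encoding?  # => false
s[0]               # => "\xDD"  (indexing works, one byte per broken "character")
s.bytesize         # => 2

begin
  s =~ /./
rescue ArgumentError => e
  puts e.message   # prints: invalid byte sequence in UTF-8
end
```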

-- 
Posted via http://www.ruby-forum.com/.