On 21 Jul 2011, at 13:26, Brian Candler wrote:

> Iain Barnett wrote in post #1012004:
>>    File.readlines(logfile, :encoding => "UTF-8" )
>>
>> Now spits out the error:
>>
>>  ArgumentError - invalid byte sequence in UTF-8
>
> Are you sure it's that particular line which spits out the error?
>
> There are no hard-and-fast rules, because of the whole incoherent design
> of ruby 1.9, but in many cases you can *read* a string which has invalid
> encodings, but you get an error later on when you try to do things like
> regexp matches on it.
>
> irb(main):002:0> File.open("zzz1","wb") { |f| f.write("\xdd\xdd") }
> => 2
> irb(main):003:0> File.readlines("zzz1")
> => ["\xDD\xDD"]
> irb(main):004:0> File.readlines("zzz1", :encoding=>"UTF-8")
> => ["\xDD\xDD"]
> irb(main):005:0> File.readlines("zzz1", :encoding=>"UTF-8")[0] =~ /./
> ArgumentError: invalid byte sequence in UTF-8
>  from (irb):5
>  from /usr/local/bin/irb192:12:in `<main>'
> irb(main):006:0>
>
> You can of course set :encoding=>"BINARY" (or "ASCII-8BIT") when you
> read the file. Or you could open the file in binary mode ("rb"), which I
> don't think File.readlines supports directly, but File.open does. The
> two are not exactly the same; binary mode also prevents CR/CRLF
> translations on non-Unix platforms.
>
> I'd suggest that BINARY mode is the way to go for you. If your objective
> is to read in some log lines, chomp them, and write them out again,
> whilst allowing arbitrary byte sequences, this will Just Work [TM], just
> like it would in ruby 1.8.
>
> However, regexp matches will be against individual bytes of the string,
> rather than entire UTF-8 characters.
>
> It's strange how in ruby 1.9, str[x] works just fine with invalid
> encodings, but str=~/./ does not. But that's only one of many strange
> things about ruby 1.9.
>

Thanks. I am running some regex on the lines later, so that is where the
script is actually choking. I'll just have to put up with this I suppose.
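
For anyone finding this thread later, here is a minimal sketch of the
BINARY approach suggested in the quoted message, plus a variant that reads
as UTF-8 and only falls back to raw bytes for lines that fail
valid_encoding?. The sample file contents, the /GET/ pattern, and all
variable names are hypothetical and chosen for illustration; they are not
taken from the original script.

```ruby
require "tempfile"

# Hypothetical sample data: one clean line, and one line containing
# bytes (\xDD\xDD) that are not valid UTF-8.
sample = Tempfile.new("sample_log")
sample.binmode
sample.write("GET /index\n\xDD\xDD GET /broken\n")
sample.close

# Approach 1: read every line as raw bytes. The strings are tagged
# BINARY (ASCII-8BIT), so an ASCII-only regexp matches without any
# UTF-8 validation taking place.
binary_lines = File.readlines(sample.path, :encoding => "BINARY")
binary_hits  = binary_lines.select { |line| line =~ /GET/ }

# Approach 2: read as UTF-8, and retag only the invalid lines as BINARY
# before matching. The dup keeps the original string's encoding intact.
utf8_lines = File.readlines(sample.path, :encoding => "UTF-8")
safe_hits = utf8_lines.select do |line|
  line = line.dup.force_encoding("BINARY") unless line.valid_encoding?
  line =~ /GET/
end

puts binary_hits.size
puts safe_hits.size
```

The second variant keeps character-level matching for the lines that are
valid UTF-8, at the cost of one valid_encoding? check per line. In both
cases a regexp applied to a BINARY string sees bytes, so a multibyte
character counts as several matches of /./ rather than one.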

Again, many thanks.

Regards,
Iain