Sylvester T Cat wrote in post #967790:
> I'm reading a CSV file that has some non US-ASCII characters.  I want
> to parse each value in each row and strip out any leading/lagging
> potential whitespace.
> However, when I come across some unusual characters, I get invalid
> byte sequence in UTF-8

I guess it's not genuinely UTF-8.

If you think it *is* mostly UTF-8 that has been broken by stray FF bytes
for some reason (0xFF never occurs in valid UTF-8), then you could force
the encoding to binary, remove the FF bytes, then force it back to UTF-8.
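
Roughly like this, as an untested-on-your-data sketch. The method name
strip_ff_bytes is made up for illustration, and it assumes FF is the only
offending byte:

```ruby
# Hypothetical helper: drop stray 0xFF bytes from text that is otherwise UTF-8.
def strip_ff_bytes(str)
  ff  = [0xFF].pack("C")                           # a one-byte binary string
  raw = str.dup.force_encoding(Encoding::BINARY)   # relabel as bytes; nothing is transcoded
  raw.delete(ff).force_encoding(Encoding::UTF_8)   # remove the bytes, relabel as UTF-8
end
```

It is worth calling valid_encoding? on the result, since any other invalid
bytes would still be there.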

More likely, I'd have thought, it is a single-byte encoding (ISO-8859-1
perhaps). But in any case, if you're just doing CSV parsing, you can
quite legitimately treat the data as binary, even if it is UTF-8, since
all you need to do is recognise commas and double quotes, and the rest
just gets passed through.
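
To show what I mean, here is a hand-rolled sketch that splits one line
purely on bytes and strips each field. The name split_csv_line_binary is
made up, it does not handle newlines embedded inside quoted fields, and it
is not a replacement for a real CSV library:

```ruby
# Hypothetical helper: split one CSV line into stripped fields, byte by byte.
# Only the comma and double-quote bytes are interpreted; everything else is
# passed through untouched, so invalid UTF-8 cannot raise an error here.
def split_csv_line_binary(line)
  data = line.dup.force_encoding(Encoding::BINARY).chomp
  fields = []
  field = String.new            # String.new with no arguments is binary
  in_quotes = false
  i = 0
  while i < data.bytesize
    ch = data[i]                # in a binary string, one character is one byte
    if in_quotes
      if ch == '"' && data[i + 1] == '"'
        field << '"'            # a doubled quote is an escaped quote
        i += 1
      elsif ch == '"'
        in_quotes = false       # closing quote
      else
        field << ch
      end
    elsif ch == '"'
      in_quotes = true          # opening quote
    elsif ch == ","
      fields << field           # field separator
      field = String.new
    else
      field << ch
    end
    i += 1
  end
  fields << field
  fields.map { |f| f.strip }    # strip is safe on binary strings
end
```

The fields come back tagged as binary. If you later decide what the real
encoding is, you can force_encoding each one at that point.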

More info at
https://github.com/candlerb/string19/blob/master/string19.rb

Or just use Ruby 1.8, where strings are plain bytes with no encoding
checks.
