Code / Data samples?

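# Give the pattern and the text the same encoding before matching.
r_string = 'blah'.encode('UTF-8')
r_regex = /#{Regexp.escape(r_string)}/  # escape in case the string has regex metacharacters
text = "wahlahblahblahwahbalablah".encode("UTF-8")
text.gsub!(r_regex, '')                 # removes every occurrence of 'blah' in place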

That's a horrible example. Still, if you have ASCII in one place and
UTF-8 in another, it's conceivable that the matcher may just throw up
its hands. Force the encoding so that the pattern and the text agree
(e.g. with String#force_encoding) and try again. If it doesn't work,
please post more information (preferably with a Gist / pastie). If it
does work, please mention it so that Google can direct other poor
souls to this post.
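
To make "force the encoding" concrete, here is a minimal sketch of one way it could go. It is not necessarily what will fix your case. It assumes Ruby 2.1 or newer (for String#scrub), and the helper name strip_invalid_bytes and the sample bytes are made up for illustration.

```ruby
# Hypothetical helper (not from the original post): relabel raw bytes with the
# encoding the file is known to use, then drop any bytes that are invalid in it.
# Assumes Ruby 2.1+, where String#scrub is available.
def strip_invalid_bytes(raw, encoding = 'UTF-8')
  text = raw.dup.force_encoding(encoding)  # changes the label only, not the bytes
  text.valid_encoding? ? text : text.scrub('')
end

# Made-up sample data: the byte 0xFF can never occur in valid UTF-8.
raw = "wahlah\xFFblahblah".b               # binary copy, as if read from a file
clean = strip_invalid_bytes(raw)
clean.gsub!(/blah/, '')                    # the string is valid now, so gsub! can run
puts clean
```

Passing '?' (or any other string) to scrub instead of '' keeps a visible marker where the bad bytes were, which may be preferable if you need to audit the files later.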

Scott

On Mon, Oct 11, 2010 at 2:08 PM, Andreas S. <x-ruby-lang / andreas-s.net> wrote:
> I process a lot of text files of which I know the encoding, but that
> might contain a few bytes that are invalid (i.e., make gsub fail with
> "ArgumentError: invalid byte sequence in US-ASCII/UTF8"). What's the
> best way to handle this situation gracefully, by ignoring or removing
> the invalid characters?
> --
> Posted via http://www.ruby-forum.com/.