Hi, I'm having some encoding problems while parsing HTML with Nokogiri
in Ruby 1.9.

I was first getting errors on non-breaking space characters (code
160), but managed to resolve this by setting the encoding at the top
of my script file ('# coding: utf-8').

However, I'm now trying to do simple string substitution with gsub()
and am getting the error:

  invalid byte sequence in UTF-8

An example of where this is bombing is the word "PROT\xC9G" as parsed
by Nokogiri. Removing the encoding setting from my script brings back
the original problems, so I seem to be stuck.
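
For reference, here is a minimal sketch that reproduces the error without
Nokogiri. It rests on one assumption that nothing above confirms: that the page
is really ISO-8859-1 (Latin-1), where byte 0xC9 is "É", and that the text
arrives labelled as UTF-8. The variable names are made up for illustration.

```ruby
# coding: utf-8

# Hypothetical stand-in for the parsed text: the byte 0xC9 is not valid
# UTF-8 on its own, but the string is tagged as UTF-8 anyway.
raw = "PROT\xC9G"

puts raw.encoding          # the source encoding declared above
puts raw.valid_encoding?   # the lone 0xC9 byte makes this false

begin
  # Regexp matching refuses strings whose bytes do not fit their encoding.
  raw.gsub(/G/, "X")
rescue ArgumentError => e
  puts "gsub failed: #{e.message}"
end

# Assumption: the bytes are ISO-8859-1. Relabel a copy with that encoding
# (no bytes change), then transcode it to real UTF-8.
fixed = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts fixed.valid_encoding?
puts fixed.gsub(/G/, "X")  # substitution now works on the repaired string
```

If the assumption holds, it may be cleaner to tell Nokogiri the encoding up
front, e.g. Nokogiri::HTML(html, nil, 'ISO-8859-1'), rather than repairing
strings afterwards. I have not confirmed what encoding the page actually uses.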

Has anybody worked through these issues successfully? Google turns up
a number of discussions without many solutions.