I get untrusted input from some of our partners that should be
UTF-8 (or generally plain 7-bit ASCII), but isn't always. In some
cases it appears to be multiple incompatible string encodings
concatenated together, truncated strangely and then joined, or perhaps
just noise. I'd like to convert the string into something that's valid
UTF-8 so I can work with it, ideally keeping as much of the validly
encoded parts of the string as possible. I tried encode! but ran into
weirdness where it returns a string that claims to be valid but isn't
(which seems like a bug).
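
To make the goal concrete, here is a minimal sketch of the behaviour I'm after: keep every character that is valid UTF-8 and silently drop the rest. The helper name keep_valid_utf8 and the sample string are made up for illustration, and this assumes each_char hands back invalid bytes as separate chunks that report valid_encoding? as false.

```ruby
# Hypothetical helper (not part of any library): keep only the characters
# that are valid UTF-8 and drop everything else.
def keep_valid_utf8(str)
  tagged = str.dup.force_encoding(Encoding::UTF_8)
  # Invalid bytes come back from each_char as chunks that fail valid_encoding?.
  tagged.each_char.select { |c| c.valid_encoding? }.join
end

# Made-up sample: a valid "e-acute" (C3 A9) followed by a truncated sequence.
sample  = "caf\xC3\xA9 \xE7\xBE ok"
cleaned = keep_valid_utf8(sample)

puts cleaned.valid_encoding?
puts cleaned.squeeze(' ')
```

This walks the string one character at a time, so it is probably slow on large inputs, but it does not depend on the transcoder at all.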

# test strings
1.9.3p0> str1 = "ceramic
rollers1\x82/֧ۧ-ҧ/5059F\xAA\xB3\xF3\xC7\xF9)-\xB0\xA1\xB3\xAA\xB4\xF8.xls&tempFileName=1310611982277\xC1\xA6110
\xC7\xDD\xC0\xED\xB4\xDC(\xC1\xF6\xBF\xAA\xB3\xF3\xC7\xF9)-\xB0\xA1\xB3\xAA\xB4\xF8.xls"
1.9.3p0> str2 = "hydroxide+caustic ͳ\xE7\xBE"

# encode!
1.9.3p0> a = str1.dup
1.9.3p0> a.valid_encoding?
 => false
1.9.3p0> a.encode!(Encoding::UTF_8, Encoding::UTF_8, :invalid=>:replace,
:undef=>:replace, :replace=>'')
 => "ceramic
rollers1\x82/֧ۧ-ҧ/5059F\xAA\xB3\xF3\xC7\xF9)-\xB0\xA1\xB3\xAA\xB4\xF8.xls&tempFileName=1310611982277\xC1\xA6110
\xC7\xDD\xC0\xED\xB4\xDC(\xC1\xF6\xBF\xAA\xB3\xF3\xC7\xF9)-\xB0\xA1\xB3\xAA\xB4\xF8.xls"
1.9.3p0> a.valid_encoding?
 => true
# so far so good
1.9.3p0> a.squeeze(' ')
ArgumentError: invalid byte sequence in UTF-8
  from (irb):10:in `squeeze'
  from (irb):10
  from /home/tgarnett/.rvm/rubies/ruby-1.9.3-p0/bin/irb:16:in `<main>'
# !!! ruby just claimed the encoding was valid! BUG??
# a.dup.squeeze(' '), "#{a} ".squeeze(' ') both fail as well
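
My guess (and it is only a guess) is that encode! skips the conversion when the source and destination encodings are the same, since the returned string above is byte-for-byte unchanged. If that is right, one possible workaround is to force a real transcoding pass through a different encoding and back. The helper name scrub_via_utf16 and the sample input below are hypothetical, and the choice of UTF-16LE as the intermediate encoding is an assumption on my part.

```ruby
# Hypothetical helper: round-trip through UTF-16LE so that the :invalid and
# :undef options are actually applied, then convert back to UTF-8.
def scrub_via_utf16(str)
  tagged = str.dup.force_encoding(Encoding::UTF_8)
  utf16  = tagged.encode(Encoding::UTF_16LE,
                         :invalid => :replace, :undef => :replace, :replace => '')
  utf16.encode(Encoding::UTF_8)
end

# Made-up sample: valid multi-byte text mixed with stray bytes.
fixed = scrub_via_utf16("caf\xC3\xA9 \xE7\xBE ok")

puts fixed.valid_encoding?
puts fixed.squeeze(' ')
```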


I also tried iconv with //IGNORE, but it returns invalid strings on
some inputs and raises an exception on others. I've had better luck
with unpack/pack, but I was wondering if anyone knew a better way to
do this.

# iconv
1.9.3p0> require 'iconv'
1.9.3p0> a = str1.dup
1.9.3p0> a = Iconv.new('UTF-8//IGNORE', 'UTF-8').iconv(a)
 => "ceramic
rollers1/֧ۧ-ҧ/5059F)-.xls&tempFileName=1310611982277110
(\xF6\xBF\xAA\xB3)-.xls"
1.9.3p0> a.valid_encoding?
 => false
# no luck here either...
1.9.3p0> b = str2.dup
1.9.3p0> b = Iconv.new('UTF-8//IGNORE', 'UTF-8').iconv(b)
Iconv::InvalidCharacter: "\xE7\xBE"
  from (irb):22:in `iconv'
  from (irb):22
  from /home/tgarnett/.rvm/rubies/ruby-1.9.3-p0/bin/irb:16:in `<main>'
# ok, can crash too...


# unpack, pack
1.9.3p0> a = str2.dup
1.9.3p0> a = a.unpack('C*').pack('U*')
 => "hydroxide+caustic \u0094\u0094"
1.9.3p0> a.valid_encoding?
 => true
1.9.3p0> a.squeeze(' ')
 => "hydroxide+caustic \u0094\u0094"
# some success, also works for str1
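
One caveat with the unpack/pack approach: unpack('C*') yields raw bytes and pack('U*') treats each byte as its own code point, so the result is always valid, but any multi-byte character that was already correct gets split into one character per byte. A small sketch with a made-up sample string:

```ruby
# "caf" plus e-acute; the e-acute is the two bytes C3 A9 in UTF-8.
original = "caf\u00E9"

bytes    = original.unpack('C*')   # raw byte values
repacked = bytes.pack('U*')        # every byte becomes a separate code point

puts bytes.inspect                 # [99, 97, 102, 195, 169]
puts repacked.valid_encoding?      # true
puts repacked == original          # false: the e-acute is now two characters
puts repacked.length               # 5, versus 4 in the original
```

So this gives me something I can work with, but it does not keep the valid parts of the string intact.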