On 23/06/06, darren kirby <bulliver / badcomputer.org> wrote:
> This seems to be working quite nicely, after playing around for a bit. A few
> of my test files were throwing "Iconv::InvalidCharacter" errors on some
> strings, but when I change the "(s+' ')" to "(s)" it works fine. Then, of
> course, the strings that originally worked start throwing the error. So, I do
> this:

Sorry, I translated that code from somewhere else, but forgot that
UTF-16 needs an even number of bytes. The fact that it worked as
advertised was serendipity rather than good judgement! The trouble
with Iconv's //IGNORE flag is that it doesn't ignore trailing errors;
you can get around this by adding a valid codepoint at the end, and
removing it after conversion. Appending a single byte below 128 does
this for UTF-8 input, but for your UTF-16 example it only worked when
the string had an odd number of bytes, because the extra byte then
completed the last 2-byte unit. For UTF-16 (LE or BE) without
surrogates, dropping any odd trailing byte will work:

t = ic.iconv(s[0,s.length/2*2])

although a more general solution that should also handle surrogates is this:

t = ic.iconv(s[0,s.length/2*2]+"\000\000")[0..-2]
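
Spelled out step by step, the pad/convert/unpad idea looks roughly like
the sketch below. The helper name convert_with_sentinel is made up for
illustration, the converter is passed in as a block, and String#encode
is only a stand-in for ic.iconv so that the snippet runs without Iconv
installed. It also assumes UTF-8 output, where the sentinel comes out as
a single null byte.

```ruby
# Hypothetical helper (not from the thread): pad, convert, then unpad.
# The block receives the padded bytes and must return the converted string.
def convert_with_sentinel(bytes)
  # Drop an odd trailing byte so only whole 2-byte units remain.
  even = bytes.byteslice(0, bytes.bytesize / 2 * 2)
  # Append U+0000 as a sentinel so a bad final unit is no longer trailing.
  padded = even + "\x00\x00".b
  converted = yield(padded)
  # Assumes UTF-8 output, where the sentinel is exactly one byte long.
  converted.byteslice(0, converted.bytesize - 1)
end

# Stand-in converter (an assumption): String#encode instead of ic.iconv.
# With the converter from the thread, the block body would be ic.iconv(padded).
utf8 = convert_with_sentinel("h\x00i\x00!".b) do |padded|
  padded.force_encoding("UTF-16LE").encode("UTF-8", invalid: :replace, replace: "")
end
```

Here the five input bytes are cut back to four, so the stray "!" byte
never reaches the converter.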

Finally, your input string has a trailing null; a regexp-based
solution is probably the most reliable way to remove this:

t.sub!(/\x00\z/, '')
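
As a small sketch of that clean-up step (strip_trailing_nul is a
hypothetical name, not something from your code): anchoring with \z
matches only at the very end of the string, whereas Ruby's $ also
matches just before a line break, so \z is the stricter choice if the
text might contain newlines.

```ruby
# Hypothetical helper: remove a single trailing NUL byte, if there is one.
# Uses the non-destructive sub, so the caller's string is left untouched.
def strip_trailing_nul(text)
  text.sub(/\x00\z/, '')
end

cleaned = strip_trailing_nul("title\x00")
```

NUL bytes in the middle of the string are left alone, and a string
without a trailing NUL comes back unchanged.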

> I wonder though, the docs lead me to believe the iconv library is UNIX only.
> Is this true? I really need a cross-platform solution, but don't have a win32
> box to try on...

It's definitely possible to use iconv on Windows, but it wasn't in the
one-click installer until 1.8.4, I believe.

Paul.