Peter Fitzgibbons wrote:
> Second, the error messaging from the scanner tells me that I have a
> "Invalid byte 1 of 1-byte UTF-8 sequence."
> That's nice, but I have no way to tell _what_ byte is in violation.

Sorry, I can't answer any of your other questions, but as this is the Java
end barfing on Ruby (or other) UTF-8 data, the character might be a 0. Java
uses a modified UTF-8 in some places (DataInput/DataOutput, JNI) where
U+0000 is encoded as the two bytes 0xC0 0x80 instead of a single 0 byte
(for compatibility with C 0-terminated strings).

Otherwise, it may be the byte 0xFE or 0xFF. Neither can ever appear in valid
UTF-8, but together they make up the UTF-16 byte-order mark (0xFE 0xFF or
0xFF 0xFE), so they show up when UTF-16 data is read as UTF-8.

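So, I reckon the byte behind an "invalid 1-byte UTF-8 sequence" is most
likely 0xFE, 0xFF or 0x00 (but actually that last one is valid UTF-8).
Strictly, other bytes can't start a UTF-8 sequence either: a stray
continuation byte (0x80 to 0xBF), 0xC0, 0xC1, and anything from 0xF5 up, so
those are candidates too.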
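
If you want to find out which byte it actually is, something along these
lines should tell you. It's only a sketch: Utf8Check and firstInvalidOffset
are names I made up, and it checks against the JDK's standard UTF-8 decoder,
which may not reject exactly the same bytes as your XML scanner does.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any XML parser.
final class Utf8Check {

    /**
     * Returns the offset of the first byte sequence that the JDK's UTF-8
     * decoder rejects, or -1 if the whole array decodes cleanly.
     */
    static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);

        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 never decodes to more chars than it has bytes,
        // so an output buffer of this size cannot overflow.
        CharBuffer out = CharBuffer.allocate(data.length);

        // endOfInput = true, so a truncated sequence at the end is an error too.
        CoderResult result = decoder.decode(in, out, true);

        // On an error result the input buffer is left positioned at the
        // start of the offending sequence.
        return result.isError() ? in.position() : -1;
    }
}
```

Read the document into a byte[] first, then print the culprit with something
like String.format("0x%02X", data[offset] & 0xFF).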

Cheers,
Dave