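Peter Fitzgibbons wrote:
> Second, the error messaging from the scanner tells me that I have a
> "Invalid byte 1 of 1-byte UTF-8 sequence."
> That's nice, but I have no way to tell _what_ byte is in violation.

Sorry, I can't answer any of your other questions, but as this is the Java end barfing on Ruby (or other) UTF-8 data, the character might be a 0. Java uses a modified UTF-8 in some places, where U+0000 is encoded in 2 bytes (0xC0 0x80) so that the encoded string never contains a 0 byte (for compatibility with C 0-terminated strings).

Otherwise, it may be the bytes 0xFE or 0xFF. These never occur in valid UTF-8, but as a pair they are the UTF-16 byte-order mark, so they turn up at the start of data that was really saved as UTF-16.

It could also be a stray continuation byte (0x80 to 0xBF) sitting where a character should start. That usually means Latin-1 or Windows-1252 data labelled as UTF-8.

So, I reckon that an "invalid 1-byte UTF-8 sequence" is a byte that cannot start a character: 0xFE or 0xFF, a stray continuation byte, or possibly 0x00 (but actually that last one is valid UTF-8).

Cheers,
Dave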
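
P.S. To find out _what_ byte is in violation, one option is to scan the data on the Ruby side before handing it to the parser. The following is only a rough sketch, assuming a Ruby recent enough to have String encodings (1.9 or later); `first_invalid_utf8` is a name I made up for illustration, not part of any library. It walks the string character by character and reports the byte offset and byte values of the first piece that is not well-formed UTF-8.

```ruby
# Hypothetical helper (not from any library): returns [byte_offset, bytes]
# for the first malformed piece of UTF-8 in data, or nil if it is all valid.
def first_invalid_utf8(data)
  offset = 0
  # Work on a copy so the caller's string keeps its original encoding tag.
  data.dup.force_encoding(Encoding::UTF_8).each_char do |ch|
    return [offset, ch.bytes] unless ch.valid_encoding?
    offset += ch.bytesize
  end
  nil
end

p first_invalid_utf8("ab\xFFcd".b)   # => [2, [255]]
p first_invalid_utf8("plain ascii")  # => nil
```

The offset is counted in bytes from the start of the string, so it can be matched against a hex dump of the file being fed to the Java side.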