On Wed, Mar 23, 2011 at 4:53 AM, Stanley Xu <wenhao.xu / gmail.com> wrote:
> Yesterday, I sent a mail about letting split ignore invalid UTF-8 byte
> sequences. I checked the string I wanted to parse in Java and found out
> that the string is encoded in GBK and part of the string is encoded in
> UTF-8.
>
> I am wondering if I could find a way to still split the string with the
> split method, and then try to force_encoding the part of the string that
> might be encoded in GBK and resolve the problem.
>
> I am wondering if there is a way I could do so without the "invalid byte
> sequence" error?

A string with mixed encodings is difficult to handle. I think you have
these options:

1. Ensure that the string does *not* contain mixed encodings (this would
be the first and best choice IMHO).

2. If you can't because you get the data from somewhere else, use
encoding BINARY as a diversion:

mixed_content.force_encoding Encoding::BINARY
chunks = mixed_content.split(/\t/)
chunks[0].force_encoding Encoding::UTF_8
chunks[1].force_encoding Encoding::GBK

or

mixed_content.force_encoding Encoding::BINARY
a, b = mixed_content.split(/\t/)
a.force_encoding Encoding::UTF_8
b.force_encoding Encoding::GBK

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
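
PS: In case it helps, here is a minimal, self-contained sketch of the BINARY approach. The helper name split_mixed and the sample data are made up for illustration, and I am assuming the fields are tab-separated with a known encoding per field. String#b needs Ruby 2.0 or later; on 1.9 use dup.force_encoding(Encoding::BINARY) instead.

```ruby
# Hypothetical helper (not part of Ruby): split a tab-separated byte string
# whose fields use different encodings, then tag each field with the
# encoding we assume it has.
def split_mixed(raw, encodings)
  # dup so the caller's string is left alone; BINARY makes split work on
  # raw bytes, so no "invalid byte sequence" error can be raised.
  fields = raw.dup.force_encoding(Encoding::BINARY).split(/\t/)
  fields.zip(encodings).map do |field, enc|
    enc ? field.force_encoding(enc) : field
  end
end

# Made-up sample data: two characters in UTF-8, a tab, two characters in GBK.
utf8_part = "\u65E5\u672C"
gbk_part  = "\u4E2D\u6587".encode(Encoding::GBK)
mixed     = utf8_part.b + "\t".b + gbk_part.b

a, b = split_mixed(mixed, [Encoding::UTF_8, Encoding::GBK])

# force_encoding only relabels the bytes, so verify the guess before use.
puts a.valid_encoding?
puts b.valid_encoding?

# Once a field carries the right label it can be transcoded safely.
b_utf8 = b.encode(Encoding::UTF_8)
puts b_utf8.encoding
```

Splitting on a tab byte is safe here because neither UTF-8 nor GBK uses the byte 0x09 inside a multibyte character. If the encoding of a field is only a guess, checking valid_encoding? before calling encode avoids an exception later.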