Issue #8255 has been updated by arton (Akio Tajima).


OK, I've fixed my test code. It had some bugs, and I changed the 2nd arg of File#open to 'rb:UTF-16LE'.

Invoking String#rstrip is OK now, but I can't encode the line from UTF-16LE to another encoding.

First, I tried to encode the UTF-16LE line to UTF-8 using line.rstrip.encode('utf-8'), but it failed.

<"This is not a love song."> expected but was
<"\uFFFE\u5400\u6800\u6900\u7300\u2000\u6900\u7300\u2000\u6E00\u6F00\u7400\u2000
\u6100\u2000\u6C00\u6F00\u7600\u6500\u2000\u7300\u6F00\u6E00\u6700\u2E00\u0A00\u
5400\u6800\u6900\u7300\u2000\u6900\u7300\u2000\u6E00\u6F00\u7400\u2000\u6100\u20
00\u6C00\u6F00\u7600\u6500\u2000\u7300\u6F00\u6E00\u6700\u2E00\u0A00">.

Then I tried to encode the line to CP932 with line.rstrip.encode('cp932').
The result was an exception.

Encoding::UndefinedConversionError: U+FFFE to Windows-31J in conversion from UTF-16LE to UTF-8 to Windows-31J.
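
For reference, output like the above (U+FFFE first, then U+5400 where "T" is expected) is what a byte-order mismatch looks like. The following is a minimal sketch under the assumption that the bytes on disk are big-endian with a BOM (FE FF); that assumption is not confirmed by this report, and the variable names are only illustrative.

```ruby
# Assumption: the sample text is stored big-endian with a leading BOM (FE FF).
text  = "This is not a love song.\n"
bytes = "\uFEFF#{text}".encode('UTF-16BE').b   # raw bytes, BOM first

# Labeling big-endian bytes as UTF-16LE swaps the two bytes of every code unit.
as_le = bytes.dup.force_encoding('UTF-16LE')
p as_le[0].ord.to_s(16)   # code point of the first character under the LE label
p as_le[1].ord.to_s(16)   # code point of the second character under the LE label

# Labeling the same bytes as UTF-16BE lets the conversion go through.
as_be = bytes.dup.force_encoding('UTF-16BE')
utf8  = as_be.encode('UTF-8').delete_prefix("\uFEFF")
p utf8
```

If the file really is little-endian, this sketch does not apply and the cause lies elsewhere.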

Then I tried to remove the BOM from the original line with the code below:
        p line[0] #=> "\uFFFE"
        if line[0] == "\uFFFE"  # => false, why? (maybe the BOM is not treated as a character here, but ...)
          line = line[1..-1]
        end

But nothing changed, because the condition line[0] == "\uFFFE" evaluated to false: when I added an else clause, that clause ran.
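
A likely reason for the false result is that the literal "\uFFFE" is a UTF-8 string while line[0] is a UTF-16LE string, and String#== does not treat such strings as equal even when they hold the same code point. A minimal sketch, using a hand-built string `first` as a stand-in for line[0]:

```ruby
# Stand-in for line[0]: a one-character UTF-16LE string holding U+FFFE (bytes FE FF).
first = "\xFE\xFF".dup.force_encoding('UTF-16LE')

same_as_utf8_literal = (first == "\uFFFE")                     # literal is UTF-8
same_as_le_literal   = (first == "\uFFFE".encode('UTF-16LE'))  # literal converted first
same_code_point      = (first.ord == 0xFFFE)                   # compares code points only

p same_as_utf8_literal, same_as_le_literal, same_code_point
```

Comparing by code point with String#ord avoids depending on the encoding of the literal.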

Is there any way to convert UTF-16LE to UTF-8 or CP932?
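
As a point of comparison, String#encode does convert UTF-16LE to both targets when the string really is little-endian and the BOM has been dropped. The sketch below builds such a string in memory instead of reading the attached file, so it only shows the expected behavior, not the failing case:

```ruby
# Assumption: this string stands in for one line read with 'rb:UTF-16LE'
# from a file that really is little-endian and starts with a BOM (FF FE).
line = "\uFEFFThis is not a love song.\r\n".encode('UTF-16LE')

# Drop the BOM by code point, so no literal in another encoding is involved.
line = line[1..-1] if line[0].ord == 0xFEFF

utf8  = line.rstrip.encode('UTF-8')
cp932 = line.rstrip.encode('Windows-31J')   # 'cp932' names the same encoding

p utf8
p cp932.encoding
```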
----------------------------------------
Bug #8255: File#each_line omits last byte (==\0) if encoding is utf-16
https://bugs.ruby-lang.org/issues/8255#change-38453

Author: arton (Akio Tajima)
Status: Open
Priority: Normal
Assignee: 
Category: 
Target version: current: 2.1.0
ruby -v:  ruby 2.1.0dev (2013-04-11) [i386-mswin32_100]


If File#each_line is given a UTF-16 encoded file opened with 'rb:utf-16', each line lacks its last byte.
For example, if the line is "a\0\r\0\n\0" in binary, the read line contains "a\0\r\0\n".

See the attachment.
This issue appears in both the current 2.1.0 and 2.0.0.
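
The symptom can be imitated on a plain binary string. This is only a sketch of what the truncation looks like if lines are cut at the single byte 0x0A instead of the two-byte UTF-16LE newline; it is an assumption about the mechanism, not a confirmed diagnosis, and it does not call File#each_line itself.

```ruby
# Two CRLF-terminated lines as raw UTF-16LE bytes.
raw = "a\r\nb\r\n".encode('UTF-16LE').b

# Cutting at the single byte "\n" leaves the trailing "\0" behind.
by_byte = raw.each_line("\n".b).to_a

# Cutting at the two-byte UTF-16LE newline keeps each line whole.
by_unit = raw.each_line("\n".encode('UTF-16LE').b).to_a

p by_byte.first, by_byte.first.bytesize
p by_unit.first, by_unit.first.bytesize
```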


-- 
http://bugs.ruby-lang.org/