Issue #14127 has been updated by nobu (Nobuyoshi Nakada).


laykou (Ladislav Gallay) wrote:
> This file should contain BOM information so that it is properly detected as a UTF-16LE file.
> 
> How to generate such a file:
> 
> ~~~ruby
> file = CSV.generate(encoding: 'UTF-16LE') do |csv|
>     csv << ['something', '']
> end
> ~~~

csv.rb seems to have bugs in its support for ASCII-incompatible encodings.

> According to `file -I file.csv` this file is recognized as `application/octet-stream; charset=binary` because it is missing the BOM information.
> 
> According to Wikipedia https://en.wikipedia.org/wiki/UTF-16 it should contain "\xFF\xFE" at the beginning of the document so that everyone knows it's UTF-16LE.

`CSV.generate` just builds a CSV string; it doesn't create a file.
Writing the result to a file with a BOM is the application's responsibility.

```ruby
CSV.open("utf16.csv", "w:UTF-16LE:utf-8") do |csv|
  csv.to_io.write "\uFEFF"
  csv << ['something', '']
end
```
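
If the CSV text has already been built as a string, the same idea can be sketched without `CSV.open`. This is only an illustration: `write_utf16le_with_bom` is a hypothetical helper name, and `csv_text` is a stand-in for a UTF-8 string such as `CSV.generate` might return with its default encoding.

```ruby
require "tmpdir"

# Hypothetical helper: writes the UTF-8 string +text+ to +path+ as UTF-16LE,
# preceded by a BOM.
def write_utf16le_with_bom(path, text)
  # Prepend U+FEFF and transcode the whole string in one step; in UTF-16LE
  # the BOM character is encoded as the bytes FF FE.
  # binwrite avoids any further encoding or newline conversion on output.
  File.binwrite(path, ("\uFEFF" + text).encode("UTF-16LE"))
end

# Stand-in for a string built elsewhere (assumed to be UTF-8).
csv_text = "something,\"\"\n"

Dir.mktmpdir do |dir|
  path = File.join(dir, "utf16.csv")
  write_utf16le_with_bom(path, csv_text)
  # Show the first bytes of the file; the first two are the BOM.
  p File.binread(path).bytes.first(4)
end
```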

> Here is someone trying to fix this in a similar way: https://stackoverflow.com/a/22950912/1632815 I did the same: manually adding that BOM information.

```ruby
# The block form closes (and flushes) the file when done.
File.open("foo.txt", "w:UTF-16LE") do |new_html_file|
  new_html_file << "\uFEFF" << some_text
end
```
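
To check whether a file actually starts with the BOM, it may be enough to look at its first two bytes. A minimal sketch, assuming a hypothetical helper named `utf16le_bom?`; the temporary file names are made up for the demonstration.

```ruby
require "tmpdir"

# The UTF-16LE BOM as raw bytes.
UTF16LE_BOM = "\xFF\xFE".b

# Hypothetical helper: true if the file at +path+ begins with FF FE.
# This is only a heuristic; a UTF-32LE BOM (FF FE 00 00) begins with
# the same two bytes.
def utf16le_bom?(path)
  File.binread(path, 2) == UTF16LE_BOM
end

Dir.mktmpdir do |dir|
  with_bom = File.join(dir, "with_bom.txt")
  without_bom = File.join(dir, "without_bom.txt")

  # Write the same text with and without a leading U+FEFF.
  File.binwrite(with_bom, ("\uFEFF" + "abc").encode("UTF-16LE"))
  File.binwrite(without_bom, "abc".encode("UTF-16LE"))

  p utf16le_bom?(with_bom)
  p utf16le_bom?(without_bom)
end
```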


----------------------------------------
Bug #14127: (CSV) generating UTF-16LE encoded file without BOM
https://bugs.ruby-lang.org/issues/14127#change-67906

* Author: laykou (Ladislav Gallay)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: 2.4.1
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN
----------------------------------------
This file should contain BOM information so that it is properly detected as a UTF-16LE file.

How to generate such a file:

~~~ruby
file = CSV.generate(encoding: 'UTF-16LE') do |csv|
    csv << ['something', '']
end
~~~

According to `file -I file.csv` this file is recognized as `application/octet-stream; charset=binary` because it is missing the BOM information.

According to Wikipedia https://en.wikipedia.org/wiki/UTF-16 it should contain "\xFF\xFE" at the beginning of the document so that everyone knows it's UTF-16LE.

Here is someone trying to fix this in a similar way: https://stackoverflow.com/a/22950912/1632815 I did the same: manually adding that BOM information.

~~~ ruby
## Adds BOM, albeit in a somewhat hacky way.
new_html_file = File.open("foo.txt", "w:UTF-8")
new_html_file << "\xFF\xFE".force_encoding('utf-16le') + some_text.force_encoding('utf-8').encode('utf-16le')
~~~



