Issue #15908 has been updated by duerst (Martin Dürst).

Status changed from Open to Closed

Depending on usage, it may also be necessary to distinguish UTF-8 (with or
without BOM), UTF-16LE without BOM, UTF-16BE with or without BOM, and so on.
Also, for Japanese, it has traditionally been necessary to additionally
distinguish between EUC-JP, Shift_JIS, and ISO-2022-JP.

For more complex cases, heuristics are needed. On the other hand,
applications may not want to (or may not be allowed to, as e.g. for the
bootstrap phase of an XML parser) allow more than a well-defined subset.

This kind of processing is therefore better left to applications.
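
As a rough illustration of what such application-level processing might look like, here is a minimal sketch. `BOMS` and `sniff_bom` are hypothetical names, not part of Ruby, and the choice of which BOMs to accept and which fallback to use is exactly the per-application decision described above.

```ruby
# Hypothetical helper, not a Ruby API: maps leading bytes to an encoding.
# The UTF-32LE entry must come before UTF-16LE, because the UTF-32LE BOM
# (FF FE 00 00) begins with the UTF-16LE BOM (FF FE). This also means that
# UTF-16LE text starting with BOM + U+0000 is misread as UTF-32LE, which is
# one of the cases where a real application would need its own heuristics.
BOMS = [
  ["\xEF\xBB\xBF".b,     Encoding::UTF_8],
  ["\xFF\xFE\x00\x00".b, Encoding::UTF_32LE],
  ["\x00\x00\xFE\xFF".b, Encoding::UTF_32BE],
  ["\xFF\xFE".b,         Encoding::UTF_16LE],
  ["\xFE\xFF".b,         Encoding::UTF_16BE],
].freeze

# Returns [encoding, number_of_BOM_bytes_to_skip].
# `fallback` is whatever the application assumes when no BOM is present
# (e.g. Encoding::Windows_31J); it is an assumption of this sketch.
def sniff_bom(bytes, fallback)
  head = bytes.byteslice(0, 4).to_s.b
  BOMS.each do |bom, enc|
    return [enc, bom.bytesize] if head.start_with?(bom)
  end
  [fallback, 0]
end

enc, skip = sniff_bom("\xEF\xBB\xBFabc".b, Encoding::Windows_31J)
puts "#{enc} (skip #{skip} bytes)"
```

An application that only accepts a fixed subset would simply shorten the table, which is why a single built-in policy is hard to choose.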

I'm closing this issue to not leave it dangling, but please feel free to
reopen if you disagree.

----------------------------------------
Bug #15908: Detecting BOM with non-UTF encoding
https://bugs.ruby-lang.org/issues/15908#change-81251

* Author: nobu (Nobuyoshi Nakada)
* Status: Closed
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: 
* Backport: 2.4: UNKNOWN, 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Currently, the "bom|" encoding prefix to `File.open` is ignored if the encoding name is not a UTF.
But one usage of a BOM is to tell whether the stream is a UTF or not, and this is especially common on Windows, e.g. UTF-16LE or OEMCP.
So I think this restriction should be removed.
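
For context, a minimal sketch of how the prefix behaves today with a UTF name. `read_with_mode` is a hypothetical helper introduced only for this illustration, and the sample file content is made up.

```ruby
require "tempfile"

# Hypothetical helper: returns [external_encoding, content] for the given mode.
def read_with_mode(path, mode)
  File.open(path, mode) { |f| [f.external_encoding, f.read] }
end

Tempfile.create("bom-demo") do |tmp|
  tmp.binmode
  tmp.write("\xEF\xBB\xBFhello".b)  # UTF-8 BOM followed by ASCII text
  tmp.flush

  # With the prefix, the BOM is consumed and does not appear in the content.
  p read_with_mode(tmp.path, "r:BOM|UTF-8")

  # Without the prefix, the BOM stays in the data as a leading U+FEFF.
  p read_with_mode(tmp.path, "r:UTF-8")
end
```

The proposal concerns the case where the name after `BOM|` is not a UTF, such as a Windows code page, which this sketch deliberately does not exercise.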

---Files--------------------------------
0001-Enable-BOM-detection-with-non-UTF-encodings.patch (4.27 KB)


-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>