Issue #12052 has been updated by jeremyevans0 (Jeremy Evans).


duerst (Martin Dürst) wrote in #note-2:
> Sorry to @jeremyevans0, but I have to disagree. This is a bug. We can disagree about how important it is to fix this bug, but it's a bug nevertheless. First, xml: :text works correctly in other encodings even if the source and destination encodings match.
> ```Ruby
> "<q&".force_encoding("shift_JIS").encode("shift_JIS", xml: :text)
> => "&lt;q&amp;"
> ```
> 
> The bug is that we process UTF-16LE as if it consisted of 1-byte ASCII-based code units. I still have to identify exactly where and when that happens.

Ah. So you are saying that `"<\0>\0".encode("utf-16le", "utf-16le", xml: :text)` needs to have the same result as
`"<\0>\0".encode("utf-8", "utf-16le", xml: :text).encode("utf-16le")`. I agree, that makes more sense and this is a bug.

It looks like this issue occurs when both the source and destination encodings are ASCII-incompatible (such as UTF-16LE). If either the source or destination encoding is ASCII-compatible, the issue doesn't occur:

```ruby
# ASCII-incompatible source, ASCII-compatible destination
"<\0>\0".encode("utf-8", "utf-16le", xml: :text).bytes
# => [38, 108, 116, 59, 38, 103, 116, 59]

# ASCII-compatible source, ASCII-incompatible destination
"<>".encode("utf-16le", "utf-8", xml: :text).bytes
# => [38, 0, 108, 0, 116, 0, 59, 0, 38, 0, 103, 0, 116, 0, 59, 0]

# ASCII-incompatible source and destination
"<\0>\0".encode("utf-16le", "utf-16le", xml: :text).bytes
# => [38, 108, 116, 59, 0, 38, 103, 116, 59, 0]
```

So a possible way to work around the issue until it can be properly fixed would be to detect the case where both source and destination are ASCII-incompatible, switch the destination to UTF-8, then encode the result of that to the desired destination encoding.
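A minimal sketch of that workaround follows. The method name `xml_encode_via_utf8` is hypothetical and not part of Ruby. Treating the affected case as "neither encoding is ASCII-compatible" (checked with `Encoding#ascii_compatible?`) is an assumption based on the three observations above, not a confirmed description of the bug.

```ruby
# Hypothetical helper, not part of Ruby's API.
# Assumption: the bug only appears when neither encoding is ASCII-compatible.
def xml_encode_via_utf8(str, dst_name, src_name, xml: :text)
  src = Encoding.find(src_name)
  dst = Encoding.find(dst_name)

  if src.ascii_compatible? || dst.ascii_compatible?
    # At least one side is ASCII-compatible: encode directly.
    str.encode(dst, src, xml: xml)
  else
    # Apply the XML escaping while transcoding to UTF-8,
    # then transcode the escaped text to the requested destination.
    str.encode(Encoding::UTF_8, src, xml: xml).encode(dst)
  end
end

p xml_encode_via_utf8("<\0>\0", "utf-16le", "utf-16le").bytes
# => [38, 0, 108, 0, 116, 0, 59, 0, 38, 0, 103, 0, 116, 0, 59, 0]
```

The detour costs one extra transcoding pass, but it only goes through conversions that were observed to behave correctly.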

----------------------------------------
Bug #12052: String#encode with xml option returns wrong result for totally non-ASCII-compatible encodings
https://bugs.ruby-lang.org/issues/12052#change-92651

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
* Priority: Normal
* Assignee: akr (Akira Tanaka)
* Backport: 2.0.0: REQUIRED, 2.1: REQUIRED, 2.2: REQUIRED, 2.3: REQUIRED
----------------------------------------
Calling `String#encode` with the `xml:` option to convert from an ASCII-incompatible encoding to the same encoding returns a strange result.
It appears to be converting the string as binary.

```ruby
p "<\0>\0".encode("utf-16le", "utf-16le", xml: :text)
#=> "\u6C26\u3B74\u2600\u7467;"
```
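The reported string is consistent with the escaping having been applied byte by byte. As an illustration, the sketch below takes the ten bytes you get by escaping `<` and `>` as single-byte ASCII while leaving the NUL bytes in place, then reads them back as UTF-16LE. The variable names are illustrative only, and the code does not depend on the bug being present.

```ruby
# "&lt;" NUL "&gt;" NUL, i.e. the input bytes 3C 00 3E 00 with only
# the "<" and ">" bytes replaced by their ASCII entity text.
bytes = [38, 108, 116, 59, 0, 38, 103, 116, 59, 0]

# Reinterpret the same bytes as UTF-16LE: each pair of bytes becomes
# one code unit, low byte first.
misread = bytes.pack("C*").force_encoding("utf-16le")

p misread.codepoints.map { |c| format("U+%04X", c) }
# => ["U+6C26", "U+3B74", "U+2600", "U+7467", "U+003B"]
```

These are the same code points as in the output shown above, with U+003B being the trailing `;`.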



-- 
https://bugs.ruby-lang.org/