Issue #13292 has been updated by Yui NARUSE.

Backport changed from 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN to 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: DONE

ruby_2_4 r57935 merged revision(s) 57816,57817.

----------------------------------------
Bug #13292: Invalid encodings in UTF-32
https://bugs.ruby-lang.org/issues/13292#change-63497

* Author: Jan Lelis
* Status: Closed
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]
* Backport: 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: DONE
----------------------------------------
Ruby is very strict about valid UTF-8 encodings, which is great.

Strings that encode surrogates (U+D800..U+DFFF) or code points above U+10FFFF are not valid.
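
To illustrate the UTF-8 strictness described above, here is a minimal sketch. The byte sequences are the ones that would encode U+D800 and U+110000 if UTF-8 permitted them; the variable names are only for illustration.

```ruby
# Bytes that would decode to U+D800 (a surrogate) if UTF-8 allowed it.
surrogate = [0xED, 0xA0, 0x80].pack("C*").force_encoding("UTF-8")

# Bytes that would decode to U+110000 (above the Unicode range).
too_large = [0xF4, 0x90, 0x80, 0x80].pack("C*").force_encoding("UTF-8")

surrogate.valid_encoding? # => false
too_large.valid_encoding? # => false
```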

However, in UTF-32, it is possible to encode such values, and Ruby treats them as valid:

Example 1 (too large value)

```
a = [0, 0, 17, 0].pack("C*").force_encoding("UTF-32LE") # => "\u{110000}"
a.valid_encoding? # => true
```

Example 2 (surrogate)

```
b = [0, 216, 0, 0].pack("C*").force_encoding("UTF-32LE") # => "\uD800"
b.valid_encoding? # => true
```

The behaviour should be changed so that `String#valid_encoding?` reports `false` in both cases.
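
The rule being requested can be sketched in plain Ruby, independent of what `String#valid_encoding?` does in a given version. `valid_utf32le_bytes?` is a hypothetical helper written for this illustration, not part of Ruby, and it is not how the fix is implemented internally; it only checks that every 32-bit unit is a Unicode scalar value.

```ruby
# Hypothetical helper (not a Ruby API): treats the input as raw UTF-32LE bytes
# and accepts it only if every 32-bit unit is a Unicode scalar value.
def valid_utf32le_bytes?(bytes)
  # UTF-32 uses exactly four bytes per code point.
  return false unless bytes.bytesize % 4 == 0

  # "V*" unpacks little-endian unsigned 32-bit integers.
  bytes.unpack("V*").all? do |cp|
    cp <= 0x10FFFF && !(0xD800..0xDFFF).cover?(cp)
  end
end

valid_utf32le_bytes?([0, 0, 17, 0].pack("C*"))    # => false (0x110000, too large)
valid_utf32le_bytes?([0, 216, 0, 0].pack("C*"))   # => false (0xD800, surrogate)
valid_utf32le_bytes?([0x41, 0, 0, 0].pack("C*"))  # => true  (U+0041)
```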

For reference: http://unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (page 71)



-- 
https://bugs.ruby-lang.org/