Issue #14863 has been updated by xsimov (Xavier Sim).


I think it is inconsistent, because the most common case I have seen is an array of strings (or an array containing strings) being joined into one string.

In that case all of the strings happen to be in UTF-8 (it is my locale encoding, and the default on every machine I have worked with), so I get a different encoding depending on whether the array is filled or empty.

The fact that an array containing only numbers returns a US-ASCII string follows from two things: #join is well written and takes its encoding from the inner elements, and #to_s on numbers returns US-ASCII, which is also unexpected behaviour. But I think that does not fall within the scope of this change.

~~~
irb(main):001:0> 1.to_s.encoding
=> #<Encoding:US-ASCII>
irb(main):002:0> "a".encoding
=> #<Encoding:UTF-8>
~~~
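
The same comparison can be made on whole arrays. This is only a minimal sketch: the `samples` hash and its labels are made up for illustration, and the strings are explicitly encoded to UTF-8 so the result does not depend on the source or locale encoding. It prints whatever encoding `Array#join` reports on the Ruby version it runs on.

~~~ruby
# Hypothetical sample arrays; the labels are only for display.
utf8 = ->(s) { s.encode(Encoding::UTF_8) }

samples = {
  "empty array"   => [],
  "integers"      => [1, 2, 3],
  "UTF-8 strings" => [utf8.call("a"), utf8.call("b")],
}

# Collect the encoding of the joined string for each sample.
encodings = samples.transform_values { |array| array.join.encoding }

encodings.each { |label, encoding| puts "#{label}: #{encoding}" }
~~~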

So when the empty array suddenly returns a different (hardcoded) encoding, it can break users' programs with an `Unidentified byte character sequence in UTF-8` error, and it feels inconsistent to users working in an environment where the default encoding is anything other than US-ASCII.
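
Until the behaviour changes, a user-side workaround could look roughly like the sketch below. The helper name `join_with_encoding` and its `encoding:` keyword are hypothetical, not part of Ruby. It relies only on `Array#join` and `String#force_encoding`, and re-tags the result only when it is empty, since an empty string has no bytes that could be invalid in the target encoding.

~~~ruby
# Hypothetical helper: join an array, but make sure an empty result
# carries the encoding the caller expects instead of a hardcoded one.
def join_with_encoding(items, separator = nil, encoding: Encoding::UTF_8)
  joined = items.join(separator)
  # Re-tagging is safe here because an empty string contains no bytes.
  joined.force_encoding(encoding) if joined.empty?
  joined
end

puts join_with_encoding([]).encoding
puts join_with_encoding(["a", "b"], "-")
~~~

Non-empty results are left untouched, so the encoding still comes from the elements as #join already does.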

----------------------------------------
Bug #14863: Array#join with empty array returns empty string always in US-ASCII encoding
https://bugs.ruby-lang.org/issues/14863#change-72714

* Author: xsimov (Xavier Sim)
* Status: Feedback
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: 2.4.2
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
Calling
~~~
irb(main):001:0> [].join.encoding
=> #<Encoding:US-ASCII>
~~~ 
returns an empty string, and that empty string is always in US-ASCII encoding.

The expected result is that the returned empty string would be in UTF-8, since that seems to have been the default for Ruby strings since 2.0.




