Issue #14863 has been updated by xsimov (Xavier Sim).


Taking into account
jeremyevans0 (Jeremy Evans) wrote:
> UTF-8 is the default for literal strings, not the default for all strings.  Note that strings will automatically change their encoding from US-ASCII to UTF-8 if a UTF-8 string that uses non-ASCII characters is appended to them.
> 
> ~~~
> $ ruby -e 'p ([].join << "\u1234").encoding'
> #<Encoding:UTF-8>
> ~~~
And given that Array#join also takes on the encoding of any elements that contain non-ASCII characters:
~~~
$ ruby -e 'p (["\u1234"].join).encoding'
#<Encoding:UTF-8>
~~~
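
For comparison, here is a minimal sketch of the three cases side by side. The variable names are only illustrative, and the US-ASCII result for the empty case is what I see on 2.4.2; I have not checked other versions.

```ruby
# "\u1234" is a UTF-8 string literal containing a non-ASCII character.
empty    = [].join               # join of an empty array
filled   = ["\u1234"].join       # join of an array holding a non-ASCII string
appended = [].join << "\u1234"   # empty join result, then a UTF-8 append

p empty.encoding     # US-ASCII on 2.4.2, per this report
p filled.encoding    # UTF-8
p appended.encoding  # UTF-8, the receiver switched encodings on append
```

So the US-ASCII result only shows up when the array is empty, and it goes away as soon as non-ASCII UTF-8 content is appended.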

maybe it would make sense for UTF-8, being the default for literal strings, to also be the encoding of the empty string returned when joining an empty array.

My proposal to return the string in the locale encoding of the running Ruby is meant to make the encoding returned by #join consistent, since most of the time I see #join used, the array contains UTF-8 strings.
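
In the meantime, code that needs a fixed encoding can normalize the empty case itself. This is only a sketch of a workaround, not a proposed patch: join_utf8 is a hypothetical helper name, not part of Ruby, and it hard-codes UTF-8 rather than using the locale encoding.

```ruby
# Hypothetical helper (not part of Ruby): behaves like Array#join, except
# that an empty array yields an empty UTF-8 string instead of a US-ASCII one.
def join_utf8(parts, sep = nil)
  return String.new(encoding: Encoding::UTF_8) if parts.empty?
  parts.join(sep)
end

p join_utf8([]).encoding                     # UTF-8
p join_utf8(["\u1234", "a"], "-").encoding   # UTF-8
```

String.new(encoding: ...) needs Ruby 2.3 or later.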

Thanks for your feedback! ;)

----------------------------------------
Bug #14863: Array#join with empty array returns empty string always in US-ASCII encoding
https://bugs.ruby-lang.org/issues/14863#change-72586

* Author: xsimov (Xavier Sim)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: 2.4.2
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
Calling Array#join on an empty array:
~~~
irb(main):001:0> [].join.encoding
=> #<Encoding:US-ASCII>
~~~
returns an empty string, and that string is always in US-ASCII encoding.

The expected result is that the returned empty string would be in UTF-8, since that seems to have been the default for Ruby strings since 2.0.




