Issue #9712 has been updated by Yui NARUSE.

Backport changed from 2.0.0: UNKNOWN, 2.1: UNKNOWN to 2.0.0: DONTNEED, 2.1: DONTNEED

Thomas Thomassen wrote:
> Usaku NAKAMURA wrote:
> > check Dir.entries('Foo', encoding: 'utf-8')
> 
> Ah, well that worked. I'd been referring to the Ruby 2.0.0 docs where this argument is missing:
> http://www.ruby-doc.org/core-2.0/Dir.html#method-c-entries
> 
> But why is this needed?
> On my machine it returns the strings by default in Windows-1252 - which is the same as Encoding.find('filesystem'). I guess it returns them based on that?

yes.
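
A minimal sketch of the two calls discussed above, for anyone who wants to compare them on their own machine. The temporary directory and the variable names are only for illustration; the sub-folder name is the one from the report. What Encoding.find("filesystem") reports depends on the platform and the Ruby version, so no particular value is assumed here.

```ruby
require "tmpdir"

# Sub-folder name from the report, written with escapes so the
# source file encoding does not matter.
name = "\u3066\u3059\u3068"

# The encoding Ruby uses by default for names returned by Dir.entries.
fs_encoding = Encoding.find("filesystem")

# Create a scratch directory (illustrative only), add the sub-folder,
# and list it while explicitly asking for UTF-8 strings.
entries = Dir.mktmpdir do |dir|
  Dir.mkdir(File.join(dir, name))
  Dir.entries(dir, encoding: "utf-8")
end

puts "filesystem encoding: #{fs_encoding}"
entries.sort.each { |e| puts "#{e.inspect} (#{e.encoding})" }
```

With the encoding: option the returned strings are tagged as UTF-8 regardless of what the filesystem encoding is.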

> But for Windows this is really awkward. Windows-1252 is the compatibility codepage - but the file system itself is perfectly capable of handling Unicode characters.
> 
> I see Ruby explicitly calls the A versions of the Windows file functions instead of declaring the UNICODE flag - this makes all system calls treat Ruby with compatibility handling.
> 
> The Windows file system isn't actually Windows-1252 encoded - or any other encoding Ruby currently reports. It's all Unicode - I can use any character I like, so why isn't Ruby just returning results from file functions as Unicode?

* Ruby side: many parts of the Ruby implementation already use the W version of the API, but some parts do not. Therefore, for consistency, it is still ANSI based.
* User side: there is a lot of legacy code that assumes ANSI strings.

Ruby must migrate to Unicode some day in the future, but we haven't done so yet.

----------------------------------------
Bug #9712: Dir.entries replace Unicode character with questionmarks
https://bugs.ruby-lang.org/issues/9712#change-46156

* Author: Thomas Thomassen
* Status: Assigned
* Priority: Normal
* Assignee: Zachary Scott
* Category: doc
* Target version: current: 2.2.0
* ruby -v: ruby 2.2.0dev (2014-04-07 trunk 45528) [i386-mswin32_100]
* Backport: 2.0.0: DONTNEED, 2.1: DONTNEED
----------------------------------------
My basis when testing this is that I have a computer with an English OS - codepage Windows-1252. The tests might yield different results if the Windows codepage is different - so please pay attention to that if you are unable to reproduce.

Given a folder named "Foo" which contains a sub-folder "てすと" ("\u3066\u3059\u3068"), Dir.entries("Foo") will return:
[".", "..", "???"]

The characters that don't fit my filesystem codepage are translated into question marks.

I would have expected the strings returned to be in some Unicode format.
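
To illustrate where the question marks come from, here is a rough sketch that performs a similar lossy conversion inside Ruby with String#encode. This is an assumption made for illustration only: in the reported case the substitution presumably happens in the Windows ANSI layer, not in this Ruby call, but the effect on the name is the same.

```ruby
# Sub-folder name from the report.
name = "\u3066\u3059\u3068"

# None of these characters exist in Windows-1252, so each one is
# replaced by the default substitute for that encoding, "?".
lossy = name.encode("Windows-1252", undef: :replace)
puts lossy  # prints ???

# Without the replace option the conversion raises instead of
# silently losing information.
begin
  name.encode("Windows-1252")
rescue Encoding::UndefinedConversionError => e
  puts "strict conversion fails: #{e.message}"
end
```

Once the name has been reduced to "???" the original characters cannot be recovered, which is why the result can no longer be used to open the folder.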



-- 
https://bugs.ruby-lang.org/