Issue #9713 has been updated by Usaku NAKAMURA.


The encoding of the results of Dir.glob are the same encoding with it's parameter.
So, there is no bug about the second case.

But, the first case, the encoding of __FILE__ should be Windows-1252 (filesystem encoding)
or UTF-8 (script's encoding), I think.
It may be a bug.

----------------------------------------
Bug #9713: __FILE__ return unexpected encoding - breaks Dir.glob
https://bugs.ruby-lang.org/issues/9713#change-46104

* Author: Thomas Thomassen
* Status: Open
* Priority: Normal
* Assignee: cruby-windows
* Category: platform/windows
* Target version: current: 2.2.0
* ruby -v: ruby 2.2.0dev (2014-04-07 trunk 45528) [i386-mswin32_100]	
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN
----------------------------------------
**C:/てすと/FILE.rb:**

~~~
# encoding: UTF-8
puts "Encoding.find 'filesystem': #{Encoding.find('filesystem').inspect}"
puts "Encoding.find 'locale': #{Encoding.find('locale').inspect}"
puts "Encoding.default internal: #{Encoding.default_internal.inspect}"
puts "Encoding.default external: #{Encoding.default_external.inspect}"
puts "Encoding.locale_charmap: #{Encoding.locale_charmap.inspect}"
puts "__FILE__: #{__FILE__.encoding.inspect}"
puts "'foobar': #{'foobar'.encoding.inspect}"
~~~

**C:/FILE.rb:**
~~~
# encoding: UTF-8
puts "Encoding.find 'filesystem': #{Encoding.find('filesystem').inspect}"
puts "Encoding.find 'locale': #{Encoding.find('locale').inspect}"
puts "Encoding.default internal: #{Encoding.default_internal.inspect}"
puts "Encoding.default external: #{Encoding.default_external.inspect}"
puts "Encoding.locale_charmap: #{Encoding.locale_charmap.inspect}"
puts "__FILE__: #{__FILE__.encoding.inspect}"
puts "'foobar': #{'foobar'.encoding.inspect}"

puts ""
puts "Loading C:/てすと/FILE.rb ..."
require "C:/てすと/FILE.rb"
~~~

**Results:**

![](media-20140407.png)

~~~
c:\ruby-220\usr\bin>ruby "C:\FILE.rb"
Encoding.find 'filesystem': #<Encoding:Windows-1252>
Encoding.find 'locale': #<Encoding:IBM437>
Encoding.default internal: nil
Encoding.default external: #<Encoding:IBM437>
Encoding.locale_charmap: "CP437"
__FILE__: #<Encoding:IBM437>
'foobar': #<Encoding:UTF-8>

Loading C:/???/FILE.rb ...
Encoding.find 'filesystem': #<Encoding:Windows-1252>
Encoding.find 'locale': #<Encoding:IBM437>
Encoding.default internal: nil
Encoding.default external: #<Encoding:IBM437>
Encoding.locale_charmap: "CP437"
__FILE__: #<Encoding:UTF-8>
'foobar': #<Encoding:UTF-8>

c:\ruby-220\usr\bin>
~~~

Now, lets see how this affects Dir.glob:

Test scenario - a folder structure like this:
~~~
C:/test/
C:/test/foo/
C:/test/てすと/
~~~

**C:/FILE.rb**

~~~
# encoding: UTF-8
puts "Encoding.find 'filesystem': #{Encoding.find('filesystem').inspect}"
puts "Encoding.find 'locale': #{Encoding.find('locale').inspect}"
puts "Encoding.default internal: #{Encoding.default_internal.inspect}"
puts "Encoding.default external: #{Encoding.default_external.inspect}"
puts "Encoding.locale_charmap: #{Encoding.locale_charmap.inspect}"
puts "__FILE__: #{__FILE__.encoding.inspect}"
puts "'foobar': #{'foobar'.encoding.inspect}"

puts ""
pattern = File.join(File.dirname(__FILE__), "test", "*")
puts "pattern.encoding: #{pattern.encoding.inspect}"
result = Dir.glob(pattern)
p result
p result.map { |file| file.encoding }

puts ""
puts "force encoding:"
pattern.force_encoding("UTF-8")
result = Dir.glob(pattern)
p result
p result.map { |file| file.encoding }
~~~

**Result:**

~~~
c:\ruby-220\usr\bin>ruby "C:\FILE.rb"
Encoding.find 'filesystem': #<Encoding:Windows-1252>
Encoding.find 'locale': #<Encoding:IBM437>
Encoding.default internal: nil
Encoding.default external: #<Encoding:IBM437>
Encoding.locale_charmap: "CP437"
__FILE__: #<Encoding:IBM437>
'foobar': #<Encoding:UTF-8>

pattern.encoding: #<Encoding:IBM437>
["C:/test/foo", "C:/test/???"]
[#<Encoding:IBM437>, #<Encoding:IBM437>]

force encoding:
["C:/test/foo", "C:/test/\u3066\u3059\u3068"]
[#<Encoding:UTF-8>, #<Encoding:UTF-8>]

c:\ruby-220\usr\bin>
~~~

Observe how when Dir.glob is fed a string based on __FILE__ it will return strings in the same encoding, even though the string should include Unicodecharacters. The Unicode characters are replaced by question marks. (ActualASCII bytes for question mark: 63)
Just by forcing the input string to UTF-8 will make Dir.glob return the expected strings with correct Unicode characters.

I'm unsure of where the bug lies, but in terms of what I expected I would not have expected __FILE__ to return different encoding depending on the executing file containing Unicode characters. All files have been marked as UTF-8 in the file header.

---Files--------------------------------
media-20140407.png (83.1 KB)


-- 
https://bugs.ruby-lang.org/