Issue #9713 has been updated by Usaku NAKAMURA.

Status changed from Closed to Assigned

Thomas Thomassen wrote:
> In my test `__FILE__` is returned in the OEM encoding - not filesystem encoding.

So, reopened.


----------------------------------------
Bug #9713: __FILE__ return unexpected encoding - breaks Dir.glob
https://bugs.ruby-lang.org/issues/9713#change-46122

* Author: Thomas Thomassen
* Status: Assigned
* Priority: Normal
* Assignee: cruby-windows
* Category: platform/windows
* Target version: current: 2.2.0
* ruby -v: ruby 2.2.0dev (2014-04-07 trunk 45528) [i386-mswin32_100]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN
----------------------------------------
**C:/てすと/FILE.rb:**

~~~
# encoding: UTF-8
puts "Encoding.find 'filesystem': #{Encoding.find('filesystem').inspect}"
puts "Encoding.find 'locale': #{Encoding.find('locale').inspect}"
puts "Encoding.default internal: #{Encoding.default_internal.inspect}"
puts "Encoding.default external: #{Encoding.default_external.inspect}"
puts "Encoding.locale_charmap: #{Encoding.locale_charmap.inspect}"
puts "__FILE__: #{__FILE__.encoding.inspect}"
puts "'foobar': #{'foobar'.encoding.inspect}"
~~~

**C:/FILE.rb:**
~~~
# encoding: UTF-8
puts "Encoding.find 'filesystem': #{Encoding.find('filesystem').inspect}"
puts "Encoding.find 'locale': #{Encoding.find('locale').inspect}"
puts "Encoding.default internal: #{Encoding.default_internal.inspect}"
puts "Encoding.default external: #{Encoding.default_external.inspect}"
puts "Encoding.locale_charmap: #{Encoding.locale_charmap.inspect}"
puts "__FILE__: #{__FILE__.encoding.inspect}"
puts "'foobar': #{'foobar'.encoding.inspect}"

puts ""
puts "Loading C:/てすと/FILE.rb ..."
require "C:/てすと/FILE.rb"
~~~

**Results:**

![](media-20140407.png)

~~~
c:\ruby-220\usr\bin>ruby "C:\FILE.rb"
Encoding.find 'filesystem': #<Encoding:Windows-1252>
Encoding.find 'locale': #<Encoding:IBM437>
Encoding.default internal: nil
Encoding.default external: #<Encoding:IBM437>
Encoding.locale_charmap: "CP437"
__FILE__: #<Encoding:IBM437>
'foobar': #<Encoding:UTF-8>

Loading C:/???/FILE.rb ...
Encoding.find 'filesystem': #<Encoding:Windows-1252>
Encoding.find 'locale': #<Encoding:IBM437>
Encoding.default internal: nil
Encoding.default external: #<Encoding:IBM437>
Encoding.locale_charmap: "CP437"
__FILE__: #<Encoding:UTF-8>
'foobar': #<Encoding:UTF-8>

c:\ruby-220\usr\bin>
~~~

Now, let's see how this affects Dir.glob.

Test scenario - a folder structure like this:
~~~
C:/test/
C:/test/foo/
C:/test/てすと/
~~~

**C:/FILE.rb:**

~~~
# encoding: UTF-8
puts "Encoding.find 'filesystem': #{Encoding.find('filesystem').inspect}"
puts "Encoding.find 'locale': #{Encoding.find('locale').inspect}"
puts "Encoding.default internal: #{Encoding.default_internal.inspect}"
puts "Encoding.default external: #{Encoding.default_external.inspect}"
puts "Encoding.locale_charmap: #{Encoding.locale_charmap.inspect}"
puts "__FILE__: #{__FILE__.encoding.inspect}"
puts "'foobar': #{'foobar'.encoding.inspect}"

puts ""
pattern = File.join(File.dirname(__FILE__), "test", "*")
puts "pattern.encoding: #{pattern.encoding.inspect}"
result = Dir.glob(pattern)
p result
p result.map { |file| file.encoding }

puts ""
puts "force encoding:"
pattern.force_encoding("UTF-8")
result = Dir.glob(pattern)
p result
p result.map { |file| file.encoding }
~~~

**Result:**

~~~
c:\ruby-220\usr\bin>ruby "C:\FILE.rb"
Encoding.find 'filesystem': #<Encoding:Windows-1252>
Encoding.find 'locale': #<Encoding:IBM437>
Encoding.default internal: nil
Encoding.default external: #<Encoding:IBM437>
Encoding.locale_charmap: "CP437"
__FILE__: #<Encoding:IBM437>
'foobar': #<Encoding:UTF-8>

pattern.encoding: #<Encoding:IBM437>
["C:/test/foo", "C:/test/???"]
[#<Encoding:IBM437>, #<Encoding:IBM437>]

force encoding:
["C:/test/foo", "C:/test/\u3066\u3059\u3068"]
[#<Encoding:UTF-8>, #<Encoding:UTF-8>]

c:\ruby-220\usr\bin>
~~~

Observe that when Dir.glob is fed a pattern derived from `__FILE__`, it returns strings in the same encoding as the pattern, even though the results should contain Unicode characters. The Unicode characters are replaced by question marks (actual ASCII byte value of the question mark: 63).
Simply forcing the pattern string to UTF-8 makes Dir.glob return the expected strings with the correct Unicode characters.
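
As a rough illustration of the workaround described above, the following sketch wraps the pattern construction in a small helper. `utf8_glob_pattern` is a hypothetical name, not part of Ruby, and the sketch assumes that a non-ASCII pattern carries bytes that are valid in the encoding it is tagged with. The first lines only show, by analogy, how a lossy conversion to IBM437 yields question marks; they are not a claim about what Dir.glob does internally.

```ruby
# Sketch only: utf8_glob_pattern is a hypothetical helper, not a Ruby API.

# A lossy conversion to a code page that lacks the characters substitutes
# "?" (byte 63), which resembles the "C:/test/???" entry in the result above.
lossy = "\u3066\u3059\u3068".encode("IBM437", undef: :replace)
p lossy.bytes # => [63, 63, 63]

# Build a glob pattern from path parts and make sure it is tagged as UTF-8.
def utf8_glob_pattern(*parts)
  pattern = File.join(*parts)
  if pattern.ascii_only?
    # ASCII bytes mean the same thing in IBM437 and UTF-8, so re-tagging
    # the string (as the report does with force_encoding) is safe here.
    pattern.force_encoding(Encoding::UTF_8)
  else
    # Assumption: the bytes are valid in the tagged encoding, so transcode
    # them instead of merely re-tagging.
    pattern.encode(Encoding::UTF_8)
  end
end

pattern = utf8_glob_pattern(File.dirname(__FILE__), "test", "*")
p pattern.encoding
p Dir.glob(pattern).map(&:encoding).uniq
```

The `ascii_only?` check is a design choice of this sketch: `force_encoding` only changes the encoding tag and leaves the bytes untouched, so it is safe only when the bytes mean the same thing in both encodings.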

I'm unsure where the bug lies, but I would not have expected `__FILE__` to return a different encoding depending on whether the path of the executing file contains Unicode characters. All files are marked as UTF-8 in the file header.

---Files--------------------------------
media-20140407.png (83.1 KB)


-- 
https://bugs.ruby-lang.org/