Issue #7267 has been reported by kennygrant (Kenny Grant).

----------------------------------------
Bug #7267: Dir.glob on Mac OS X returns unexpected string encodings for unicode file names
https://bugs.ruby-lang.org/issues/7267

Author: kennygrant (Kenny Grant)
Status: Open
Priority: Normal
Assignee: 
Category: 
Target version: 2.0.0
ruby -v: ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.4.0]


Tested on Ruby 1.9.3-p194 and ruby-2.0.0-preview1 on Mac OS X 10. 7.5

When calling file system methods with Ruby on Mac OS X, it is not possible to manipulate the resulting file name as a normal UTF-8 string, even though it reports the encoding as UTF-8. It seems to be a UTF-8-MAC string, even when the default encoding is set to UTF-8. This leads to confusion as the string can be manipulated normally except for any unicode characters, which seem to be decomposed. So a regexp using utf-8 characters won't work on the string, unless it is first converted from UTF-8-MAC. I'd expect the string encoding to be UTF-8, or at least to report that it is not a normal UTF-8 string if it has to be UTF-8-MAC for some reason. 

Example, run with a file called Testé.txt in the same folder:

def transform_string s
   puts "Testing string #{s}"
   puts s.gsub(/é/,'TEST')
end

Dir.glob("./*.txt").each do |f|  
  puts "Inline string works as expected" 
   s = "./Testé.txt" 
   puts transform_string s

   puts "File name from Dir.glob does not" 
   puts transform_string f
   
   puts "Encoded file name works as expected, though it is reported as UTF-8, not UTF-8-MAC" 
   f.encode!('UTF-8','UTF-8-MAC')
   puts transform_string f
end


-- 
http://bugs.ruby-lang.org/