Issue #7267 has been updated by kennygrant (Kenny Grant).


> see several lines at the end of enc/utf_8.c.

Thanks for the links, I'll read them to get a better understanding of this, I'm sorry I can't read the other two tickets associated with this as that would probably be enlightening too. I suspect the problem comes because of my somewhat parochial focus on using Ruby in English and other roman languages, where the text editors and other tools tend to generate NFC UTF-8, so I expected to get NFC UTF-8 back. When I type 

s = "détente"

the string is composed, which is what I'd expect, it isn't accidental. If there was some way in future to ask Ruby to use only composed forms when interacting with file systems, it would be really useful for use with western languages on Mac OS. 

> writer.rb's two puts output the same result.

Here is the result I get from writer.rb (with a few more tests)
(ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.4.0], Mac OS X 10.7.5 HFS+ disk, I saw this on 2.0 too)- one result is decomposed, one is composed, depending on the string passed in. 
After writing out a file with an NFC name to disk (resulting in an NFD filename on disk presumably). 
# GLOB on "./*.txt" pattern - result NFD
# [".", "/", "t", "e", "s", "t", "e", "??", ".", "t", "x", "t"]
# GLOB on "./testé.txt" - result NFC
# [".", "/", "t", "e", "s", "t", "é", ".", "t", "x", "t"]
# GLOB on "./testé.*" - fails, no match
# GLOB on "./teste*" - match result NFD
[".", "/", "t", "e", "s", "t", "e", "??", ".", "t", "x", "t"]


> No way. And again, Ruby string is not NFC.

OK thanks, I'll just have to accept that converting to composed to match my internal ruby strings (which are NFC) is necessary when dealing with an HFS file system. 
----------------------------------------
Bug #7267: Dir.glob on Mac OS X returns unexpected string encodings for unicode file names
https://bugs.ruby-lang.org/issues/7267#change-32713

Author: kennygrant (Kenny Grant)
Status: Assigned
Priority: Normal
Assignee: duerst (Martin Dürst)
Category: 
Target version: 2.0.0
ruby -v: ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.4.0]


Tested on Ruby 1.9.3-p194 and ruby-2.0.0-preview1 on Mac OS X 10. 7.5

When calling file system methods with Ruby on Mac OS X, it is not possible to manipulate the resulting file name as a normal UTF-8 string, even though it reports the encoding as UTF-8. It seems to be a UTF-8-MAC string, even when the default encoding is set to UTF-8. This leads to confusion as the string can be manipulated normally except for any unicode characters, which seem to be decomposed. So a regexp using utf-8 characters won't work on the string, unless it is first converted from UTF-8-MAC. I'd expect the string encoding to be UTF-8, or at least to report that it is not a normal UTF-8 string if it has to be UTF-8-MAC for some reason. 

Example, run with a file called Testé.txt in the same folder:

def transform_string s
   puts "Testing string #{s}"
   puts s.gsub(/é/,'TEST')
end

Dir.glob("./*.txt").each do |f|  
  puts "Inline string works as expected" 
   s = "./Testé.txt" 
   puts transform_string s

   puts "File name from Dir.glob does not" 
   puts transform_string f
   
   puts "Encoded file name works as expected, though it is reported as UTF-8, not UTF-8-MAC" 
   f.encode!('UTF-8','UTF-8-MAC')
   puts transform_string f
end


-- 
http://bugs.ruby-lang.org/