Hi,
In short, the comment was wrong. O_BINARY only disables newline
conversion; it does not change the encoding of the output. I recommend
the "b" file mode, which is smarter: it also sets the external encoding
to ASCII-8BIT. Whether we should update PStore is controversial. The
discussion should move to ruby-core.
matz.
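A minimal sketch of the difference described above, assuming Ruby 1.9.3 or later and a writable temporary directory (the scratch file name is arbitrary). The "b" in a string mode sets the external encoding to ASCII-8BIT, while File::BINARY only carries the platform's O_BINARY value, which Ruby defines as 0 where the C library has no such flag.

```ruby
require 'tmpdir'
require 'fileutils'

# Same setup as in the report quoted below.
Encoding.default_internal = 'UTF-8'

dir  = Dir.mktmpdir
path = File.join(dir, 'test') # arbitrary scratch file

# String mode with "b": binary mode, external encoding ASCII-8BIT.
string_mode_enc = File.open(path, 'w+b') { |f| f.external_encoding }

# Integer flags: File::BINARY is just O_BINARY (0 where it is undefined),
# so the encoding is derived from the default encodings instead.
flags_enc = File.open(path, File::CREAT | File::RDWR | File::BINARY) { |f| f.external_encoding }

puts "File::BINARY   = #{File::BINARY}"
puts "'w+b' encoding = #{string_mode_enc.inspect}"
puts "flags encoding = #{flags_enc.inspect}"

FileUtils.remove_entry(dir)
```

On platforms where File::BINARY is 0, the last line reflects the default encodings rather than ASCII-8BIT, which is the behavior reported below.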
In message "Re: File::BINARY does not behave as advertised. How do I help to fix this?"
on Tue, 13 Sep 2011 01:47:11 +0900, Cameron Pope <camerooni / gmail.com> writes:
|
|I noticed some anomalous behavior when opening files with the integer
|file mode flags. If a default internal encoding is set, opening a file
|with these flags sets the file's external encoding to something other
|than ASCII-8BIT, which can cause binary file operations (such as
|Marshal.dump) to blow up.
|
|Please forgive the long message, and let me know if it would be more
|appropriate to open some issues, but since I've never posted a Ruby
|issue before, I wanted to make sure I was not being naive and that I
|understand what is really going on.
|
|Here is a simple example of what I mean:
|
| #!/usr/bin/env ruby
| Encoding.default_internal = 'UTF-8'
|
| File.open('test', File::CREAT | File::RDWR | File::BINARY) do |f|
|   # This should be ASCII-8BIT, right? At least according to io.c, line 10792
|   puts "Integer Flags Encoding: #{f.external_encoding.to_s}"
| end
|
| File.open('test2', 'w+b') do |f|
|   # This actually is ASCII-8BIT
|   puts "String Mode Encoding: #{f.external_encoding.to_s}"
| end
|
|And running it:
|
| file-binary-test cpope$ ruby simple_file_test.rb
| Integer Flags Encoding: UTF-8
| String Mode Encoding: ASCII-8BIT
|
|I don't think that is the intended behavior. If I look at io.c in the
|latest Ruby code snapshot:
|
| --- io.c (last night's snapshot)
| 10792 #ifndef O_BINARY
| 10793 # define O_BINARY 0
| 10794 #endif
| 10795 /* disable line code conversion and make ASCII-8BIT */
| 10796 rb_file_const("BINARY", INT2FIX(O_BINARY));
|
|As one can see above, File::BINARY will be zero in every case that I
|can suss out in the Ruby source code: there is nowhere in the 1.9.x
|codebase I can see that defines O_BINARY to be anything but zero. And,
|as demonstrated above, opening a file with this constant does not set
|the encoding to ASCII-8BIT. What is really bad about this is that, when
|a file is opened with the integer flags, there is no good way to check
|whether the developer intended it to be opened as a binary file. There
|is, of course, a way to manually specify the encoding for a file opened
|with the integer flags, which would be the right thing to do in the
|case above.
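One way to specify the encoding manually, sketched under the assumption of Ruby 1.9.3 or later: open with the integer flags as before, then call IO#binmode on the open stream. IO#binmode disables newline and encoding conversion and sets the external encoding to ASCII-8BIT. The scratch file path is arbitrary.

```ruby
require 'tmpdir'
require 'fileutils'

Encoding.default_internal = 'UTF-8'

dir  = Dir.mktmpdir
path = File.join(dir, 'test') # arbitrary scratch file

before_enc = nil
after_enc  = nil
File.open(path, File::CREAT | File::RDWR | File::BINARY) do |f|
  # Whatever encoding the flags and the default encodings produced.
  before_enc = f.external_encoding
  # Switch the already-open stream to binary mode explicitly.
  f.binmode
  after_enc = f.external_encoding
end

puts "before binmode: #{before_enc.inspect}"
puts "after binmode:  #{after_enc.inspect}"

FileUtils.remove_entry(dir)
```

Once a stream is in binary mode it cannot be switched back, so this suits files that are binary for their whole lifetime.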
|
|So my first question is: How do we address this deficiency? I can't
|think of a better way than to document the 'catch' in using the
|integer flags in this case. I've noticed that many of the File
|constants aren't documented, so I'm happy to give it a shot if that's
|the best approach.
|
|But this brings us to another issue. There are some places in the Ruby
|standard library that depend on File::BINARY actually opening a file
|suitable for writing binary data. For example, in PStore:
|
|At the top of lib/pstore.rb:
|
| class PStore
|   binmode = defined?(File::BINARY) ? File::BINARY : 0
|   RDWR_ACCESS = File::RDWR | File::CREAT | binmode
|   RD_ACCESS = File::RDONLY | binmode
|   WR_ACCESS = File::WRONLY | File::CREAT | File::TRUNC | binmode
|
|These flags are passed to the bottlenecks that open the data file for
|reading and writing. Because it is using the integer constants to
|define how the file is opened, it's not hard to make PStore blow up in
|the course of normal operation. To conserve space, I've put some
|sample code in this gist: https://gist.github.com/1211614
|
|So my second thought is that this is an issue with the PStore library,
|and that it would be appropriate to modify the file bottlenecks so
|they explicitly specify ASCII-8BIT as the file encoding. Is there any
|reason that I'm off target and should not file this as an issue with
|a test and a patch?
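A sketch of what such a change could look like, assuming Ruby 1.9.3 or later. BinaryStore, open_data_file, write_object and read_object are hypothetical names, not the actual lib/pstore.rb code. The flag constants mirror the ones quoted from PStore, and the single file-opening bottleneck calls IO#binmode so the data file is ASCII-8BIT regardless of what File::BINARY is.

```ruby
require 'tmpdir'
require 'fileutils'

Encoding.default_internal = 'UTF-8'

# Hypothetical PStore-like store; not the real lib/pstore.rb.
class BinaryStore
  binmode = defined?(File::BINARY) ? File::BINARY : 0
  RDWR_ACCESS = File::RDWR | File::CREAT | binmode
  RD_ACCESS   = File::RDONLY | binmode

  def initialize(path)
    @path = path
  end

  # The bottleneck that opens the data file. Because the binmode flag
  # above may be 0, force binary mode (ASCII-8BIT) on the stream itself.
  def open_data_file(flags)
    File.open(@path, flags) do |f|
      f.binmode
      yield f
    end
  end

  # Replace the file contents with the marshaled object.
  def write_object(obj)
    open_data_file(RDWR_ACCESS) do |f|
      f.truncate(0)
      Marshal.dump(obj, f)
    end
    nil
  end

  # Read the marshaled object back.
  def read_object
    open_data_file(RD_ACCESS) { |f| Marshal.load(f) }
  end
end

dir   = Dir.mktmpdir
store = BinaryStore.new(File.join(dir, 'data'))
store.write_object({ 'name' => "caf\u00e9", 'ids' => [1, 2, 3] })
restored = store.read_object
p restored

FileUtils.remove_entry(dir)
```

Keeping the encoding decision inside one bottleneck means callers never depend on File::BINARY having a nonzero value.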
|
|Apologies in advance if I am using the wrong forum or am totally
|off-base with my questions.
|
|Thank you for your time,
|Cameron