On Mon, Sep 12, 2011 at 16:00, Cameron Pope <camerooni / gmail.com> wrote:
> Here is a simple example of the problem that I am seeing:
>
> #!/usr/bin/env ruby
> Encoding.default_internal = 'UTF-8'
>
> File.open('test',File::CREAT | File::RDWR | File::BINARY) do |f|
>  # This should be ASCII-8BIT, right? At least according to io.c, line 10792
>  puts "Integer Flags Encoding: #{f.external_encoding.to_s}"
> end

File::BINARY is passed to the open() or creat() calls as O_BINARY (as
opposed to O_TEXT). It means that the operating system will not
perform any character translation, such as converting CR LF line
endings to LF. It only applies to Windows, as Unix and Linux don't
perform character translation at OS level and don't distinguish
between binary and text files.

Opening a file in binary mode should make no difference to whatever
Ruby's internal character set choice is, and how Ruby chooses to
translate that (or not) when writing data to the file.

To put it another way: Opening a file in binary mode determines
whether *Windows* performs character translations on data written to,
or read from, the file. It makes no difference to the transformations
*Ruby* performs.

I would expect Ruby to mark the file's encoding according to
default_external, not default_internal. It apparently does so, if you
use integer flags:

Encoding.default_internal = 'UTF-8'
Encoding.default_external = 'ASCII'

File.open('test', File::CREAT | File::RDWR | File::BINARY) do |f|
  puts "Integer Flags Encoding: #{f.external_encoding.to_s}" # => US-ASCII
end

Encoding.default_internal = 'ASCII'
Encoding.default_external = 'UTF-8'

File.open('test2', File::CREAT | File::RDWR | File::BINARY) do |f|
  puts "Integer Flags Encoding: #{f.external_encoding.to_s}" # => UTF-8
end

> File.open('test2','w+b') do |f|
>  # This actually is ASCII-8BIT
>  puts "String Mode Encoding: #{f.external_encoding.to_s}"
> end

...so I think this is the buggy behavior. It looks as if 'w+b' always
results in ASCII-8BIT external encoding, whatever the value of
Encoding.default_external.
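
For anyone who wants to reproduce the comparison without leaving test
files behind, here is a minimal sketch. The temporary directory and
the file name 'probe' are placeholders of mine, not part of the
original report, and the result for integer flags may differ between
Ruby versions and platforms, so the script only prints it.

```ruby
require 'tmpdir'

string_mode_encoding = nil
integer_flags_encoding = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'probe') # hypothetical scratch file

  # String mode with the 'b' modifier.
  File.open(path, 'w+b') do |f|
    string_mode_encoding = f.external_encoding
  end

  # Integer flags including File::BINARY, with no encoding specified.
  File.open(path, File::CREAT | File::RDWR | File::BINARY) do |f|
    integer_flags_encoding = f.external_encoding
  end
end

puts "String Mode Encoding:   #{string_mode_encoding.inspect}"
puts "Integer Flags Encoding: #{integer_flags_encoding.inspect}"
```

Using inspect rather than to_s makes a nil result visible, in case
Ruby reports no external encoding at all for one of the handles.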

> As one can see above, first of all, File::BINARY will be zero in every
> case that I can suss out in the Ruby source code - there is nowhere in
> the 1.9.x codebase I can see that defines O_BINARY to be anything but
> zero

I guess you're not running Ruby on Windows?

O_BINARY is defined by the OS, if the feature exists, in <fcntl.h>.
Hence the ifdefs in the Ruby source code.
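
A quick way to see what your own platform does is to print the
constant. This is only a sketch; I am assuming that Ruby falls back
to 0 where the OS headers don't define O_BINARY, which matches the
values you describe seeing.

```ruby
# File::BINARY mirrors the platform's O_BINARY flag. If it is zero,
# OR-ing it into the open flags changes nothing.
plain_flags  = File::CREAT | File::RDWR
binary_flags = plain_flags | File::BINARY

binary_is_noop = (binary_flags == plain_flags)

puts "File::BINARY = #{File::BINARY}"
if binary_is_noop
  puts "File::BINARY is a no-op in the open flags on this platform"
else
  puts "File::BINARY changes the open flags on this platform"
end
```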

(Ruby has its own fcntl library, which I documented a while back, but
that doesn't include O_BINARY because it's not a POSIX thing, just one
of those unfortunate details Windows users have to worry about.)

> So my second thought is that this is an issue with the PStore library,
> and that it would be appropriate to modify the file bottlenecks so
> they explicitly specify ASCII-8BIT as the file encoding.

If PStore wants binary read/write, it ought to tell Ruby explicitly
to open the files with a binary (ASCII-8BIT) character encoding;
opening them in OS binary mode is not sufficient to do that (and will
be a no-op on Unix).
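
As a rough sketch of what "specifying it to Ruby" could look like
while keeping integer flags: either pass the external_encoding option
to File.open, or call binmode on the handle. This is not PStore's
actual code; the scratch directory and the file name 'store' are
placeholders of mine.

```ruby
require 'tmpdir'

via_option = nil
via_binmode = nil

Dir.mktmpdir do |dir|
  path  = File.join(dir, 'store') # hypothetical data file
  flags = File::CREAT | File::RDWR | File::BINARY

  # Option 1: name the external encoding when opening the file.
  File.open(path, flags, external_encoding: Encoding::BINARY) do |f|
    via_option = f.external_encoding
  end

  # Option 2: switch the open handle to binary mode afterwards.
  File.open(path, flags) do |f|
    f.binmode
    via_binmode = f.external_encoding
  end
end

puts "external_encoding option: #{via_option.inspect}"
puts "IO#binmode:               #{via_binmode.inspect}"
```

Encoding::BINARY is an alias for ASCII-8BIT, so both approaches ask
Ruby for the same thing, independently of whether the OS has a binary
mode.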


mathew
-- 
<URL:http://www.pobox.com/~meta/>