Dreamcat Four wrote:
> 
> It seems that the correct thing to do when reading a file through an
> IO object is set the encoding to Encoding::BINARY and ignore the
> ascii tags. Unless the ASCII tag says it's a text file, then set the
> Encoding to ASCII. That's pretty easy really.

But one doesn't want to ignore the tags when they denote the
structure of the file.
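
As a minimal sketch of the point (the in-memory StringIO and the
made-up length value stand in for a real file opened with "rb"; they
are assumptions for illustration only): the four-byte tags come back
as binary strings, yet they still compare equal to plain ASCII
literals, so nothing stops us from inspecting them.

```ruby
require 'stringio'

# Stand-in for a file opened with "rb": a 12-byte RIFF/WAVE header
# built in memory. The length value 4 is made up for illustration.
header = ["RIFF", 4, "WAVE"].pack("a4Va4")
io = StringIO.new(header)

tag  = io.read(4)                    # container tag
len  = io.read(4).unpack("V").first  # 32-bit little-endian length
form = io.read(4)                    # form type

# The tag is a binary (ASCII-8BIT) string ...
puts tag.encoding == Encoding::BINARY
# ... but because it holds only ASCII bytes, it still compares equal
# to an ordinary string literal.
puts tag == 'RIFF'
puts form == 'WAVE'
puts len
```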

Here's an excerpt from a simple WAV file parser I wrote several
years ago under ruby 1.8.4. It still works unchanged on 1.9.2,
because strings read in binary mode are tagged ASCII-8BIT.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

class WAVParseError < StandardError; end
class NotRIFFFormat < WAVParseError; end
class NotWAVEFormat < WAVParseError; end

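# Read an 8-byte chunk header: a 4-byte ASCII tag followed by a
# 32-bit little-endian length ("V").
def read_chunk_header(file)
  chunk_name = file.read(4)
  len = file.read(4).unpack("V").first
  [chunk_name, len]
end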

def parse_wav(file)
  riff, riff_len = read_chunk_header(file)
  raise NotRIFFFormat unless riff == 'RIFF'
  riff_end = file.tell + riff_len
  wave = file.read(4)
  raise NotWAVEFormat unless wave == 'WAVE'
  while file.tell < riff_end
    chunk_name, len = read_chunk_header(file)
    fpos = file.tell
    yield file, chunk_name, len if block_given?
    file.seek(fpos + len)
  end
end

if $0 == __FILE__
  # by way of example, just print the chunk names and lengths
  ARGV.each do |fname|
    File.open(fname, "rb") do |io_|
      puts fname
      begin
        parse_wav(io_) do |io, chunk_name, len|
          puts "%4s %08x" % [chunk_name, len]
        end
      rescue StandardError => ex
        warn "error: #{ex.message}"
      end
    end
  end
end


~~~~~~~~~~~~~~~~~~~~~~~~~


$ ruby -v parse_wav.rb m:/snd/startrek/trezap.wav
ruby 1.8.4 (2005-12-24) [i386-mswin32]
m:/snd/startrek/trezap.wav
fmt  00000010
data 0000b9f1
LIST 00000058
cue  0000001c
LIST 00000038


$ ruby19 -v parse_wav.rb m:/snd/startrek/trezap.wav
ruby 1.9.2dev (2010-07-06) [i386-mswin32_100]
m:/snd/startrek/trezap.wav
fmt  00000010
data 0000b9f1
LIST 00000058
cue  0000001c
LIST 00000038


The parser above just lists the chunks, but an extended version
decided whether to parse certain chunks in more detail, with
logic like the following:

      case chunk_name
        when 'fmt ' then handle_fmt_chunk(io, len)
        when 'data' then handle_data_chunk(io, len)
      end

So we definitely don't wish to ignore the chunk names.
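
For illustration, here is roughly what one such handler might look
like. This is a hypothetical sketch, not the handler from my original
parser: it assumes a plain PCM 'fmt ' chunk whose first 16 bytes hold
the standard fields, and the field names and sample values are my own.

```ruby
require 'stringio'

# Hypothetical handler: decode the 16 bytes that begin a PCM 'fmt ' chunk.
# "v" = 16-bit little-endian unsigned, "V" = 32-bit little-endian unsigned.
def handle_fmt_chunk(io, len)
  raise ArgumentError, "fmt chunk too short: #{len}" if len < 16
  fields = io.read(16).unpack("vvVVvv")
  keys = [:audio_format, :channels, :sample_rate,
          :byte_rate, :block_align, :bits_per_sample]
  Hash[keys.zip(fields)]
end

# Made-up chunk body: PCM (1), mono, 22050 Hz, 8 bits per sample.
body = [1, 1, 22050, 22050, 1, 8].pack("vvVVvv")
fmt = handle_fmt_chunk(StringIO.new(body), body.size)
puts fmt[:channels]
puts fmt[:sample_rate]
puts fmt[:bits_per_sample]
```

Since parse_wav seeks to the end of each chunk after the block
returns, a handler like this can read as much or as little of the
chunk as it likes.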


> What prompted me to report this:
> 
> Translating data from a Ruby hash object and simple Ruby types into
> a Plist representation. To give users a standard and appropriate
> way to differentiate between their Ruby strings which are either
> textual (ascii or unicode), and their persistent binary data.

Could you use Encoding::ASCII instead of ASCII-8BIT in this case,
to differentiate between ASCII text and binary data?
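
A rough sketch of what I have in mind, under my own assumptions:
plist_kind is a hypothetical helper name, and mapping binary strings
to a plist data element and everything else to a string element is
just one possible policy, not something your library has to do.

```ruby
# Hypothetical helper: pick a plist element kind from a string's encoding.
# Strings tagged ASCII-8BIT (Encoding::BINARY) are treated as raw data;
# anything else is treated as text.
def plist_kind(str)
  str.encoding == Encoding::BINARY ? :data : :string
end

# Encoding::ASCII is an alias for US-ASCII.
text = "hello".encode(Encoding::ASCII)
# Array#pack with "C*" yields an ASCII-8BIT string.
blob = [0xDE, 0xAD, 0xBE, 0xEF].pack("C*")

puts plist_kind(text)
puts plist_kind(blob)
```

The user would then mark a string as binary or textual simply by
choosing its encoding, and no extra wrapper type would be needed.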



Regards,

Bill