On Thu, Sep 15, 2011 at 11:50 AM, Wayne Brissette <waynefb / earthlink.net> wrote:
> First off, I'm very new to Ruby and I'm trying to wrap my head around a few things, so if this sounds simplistic I apologize in advance. I did do some searching but I'm not sure how to fix this error.
>
> I'm using Mac OS X (Lion) and I've started a script that reads in an xml file (ditamap) and parses the data, so I only end up with a listing of files that map uses.
>
> When I run the script using Ruby 1.8.x, the script works as I expected it to work. However, when I run it using Ruby 1.9.x, I get the following error:
>
> `gsub': invalid byte sequence in US-ASCII (ArgumentError)
>
> From what I've determined via the web, this has to do with some mismatch of what the OS is using vs. what Ruby is using. One post I read recommended reading the file as a binary to get around this. However, I'm wondering what the real fix is for this problem, and why is it happening in 1.9 vs. 1.8.
>
>
>
> For the record, here is how I'm opening my files:
>
> ditamap_file= File.read("v5630097.ditamap")

Your issue is likely a late consequence of reading the file with
improper encoding.  I can provoke the same behavior:

irb(main):001:0> s="aß"
=> "aß"
irb(main):003:0> s.bytes.to_a
=> [97, 195, 159]
irb(main):004:0> File.open("x","w:UTF-8"){|io|io.write s}
=> 3
irb(main):005:0> t = File.open("x","r:UTF-8"){|io|io.read}
=> "aß"

Now we are reading with the wrong encoding:

irb(main):008:0> t = File.open("x","r:ASCII"){|io|io.read}
=> "a\xC3\x9F"
irb(main):009:0> t.bytes.to_a
=> [97, 195, 159]
irb(main):010:0> t.gsub(/./){"X"}
ArgumentError: invalid byte sequence in US-ASCII
        from (irb):10:in `gsub'
        from (irb):10
        from /opt/bin/irb19:12:in `<main>'
irb(main):011:0>
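
The read succeeds because Ruby 1.9 only tags the bytes with an encoding
when loading; it does not validate them.  A minimal sketch of how to
check this and how to re-tag a string that is already in memory, assuming
the bytes really are UTF-8 (read_with_mode is a hypothetical helper, not
part of Ruby):

```ruby
require "tempfile"

# Hypothetical helper: write raw bytes to a temp file and read them back
# with the given open mode, e.g. "r:US-ASCII" or "r:UTF-8".
def read_with_mode(bytes, mode)
  result = nil
  Tempfile.open("enc-demo") do |f|
    f.binmode
    f.write(bytes)
    f.flush
    result = File.open(f.path, mode) { |io| io.read }
  end
  result
end

# Same three bytes as in the irb session: "a" followed by UTF-8 "ß".
bytes = [97, 195, 159].pack("C*")

wrong = read_with_mode(bytes, "r:US-ASCII")
wrong_valid = wrong.valid_encoding?   # false: 195 and 159 are not ASCII

# force_encoding changes only the tag, the bytes stay untouched.
fixed = wrong.dup.force_encoding("UTF-8")
fixed_valid = fixed.valid_encoding?   # true
replaced = fixed.gsub(/./) { "X" }    # "XX": two characters, three bytes
```

Re-tagging like this is only a repair for a string you already hold;
opening the file with the right encoding in the first place is cleaner.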

The error does not show up when the file is read but only later, in gsub.
If you also specify an internal encoding, so that Ruby has to transcode
while reading, the error pops up earlier:

irb(main):012:0> t = File.open("x","r:ASCII:UTF-8"){|io|io.read}
Encoding::InvalidByteSequenceError: "\xC3" on US-ASCII
        from (irb):12:in `read'
        from (irb):12:in `block in irb_binding'
        from (irb):12:in `open'
        from (irb):12
        from /opt/bin/irb19:12:in `<main>'
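
As for 1.8 vs. 1.9: 1.8 treats strings as plain bytes, while 1.9 tags
each string with an encoding, and File.read without an explicit encoding
uses Encoding.default_external, which is derived from your environment.
A minimal sketch of reading with an explicit encoding, assuming the
ditamap is UTF-8 (I have not seen the file, so check its XML
declaration); read_utf8 is a hypothetical helper and the temp file only
stands in for the real ditamap:

```ruby
require "tempfile"

# Hypothetical helper: read a file as UTF-8 and fail early if the bytes
# are not valid UTF-8, instead of failing later inside gsub.
def read_utf8(path)
  text = File.read(path, :encoding => "UTF-8")
  raise ArgumentError, "#{path} is not valid UTF-8" unless text.valid_encoding?
  text
end

# Demo data: the three bytes from the irb session, "a" plus UTF-8 "ß".
demo = nil
Tempfile.open("enc-demo") do |f|
  f.binmode
  f.write([97, 195, 159].pack("C*"))
  f.flush
  demo = read_utf8(f.path)
end

replaced = demo.gsub(/./) { "X" }   # works now, gives "XX"

# The encoding File.read falls back to when none is given.
puts Encoding.default_external.name
```

In your script that would amount to
File.read("v5630097.ditamap", :encoding => "UTF-8"), again under the
assumption that the file is UTF-8.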


Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/