On 6/15/10, R.. Kumar <sentinel1879 / gmail.com> wrote:
> I download the page http://www.ruby-forum.com/forum/4 using wget. Then i
> cat the file and pipe to gsub.
>
> I get: -e:1:in `gsub': invalid byte sequence in US-ASCII (ArgumentError)
>
>
> wget -q -k -O index11.html http://www.ruby-forum.com/forum/4
>
> cat index11.html | ruby -pe 'gsub(/href=a\/"/,"href=\"'${base}'")' >
> ofile
>
> (The value of base is http://www.ruby-forum.com/)
>
> So what must i do so this command can run. It runs fine with another
> site.
> If i replace ruby with perl -pe 's|....|g' that works fine.
>
> I actually run this in a loop with various URLS from cron.

Handling this kind of thing properly means tracking encodings properly, which means you'd have to extract the encoding from the HTTP session, mark the input as that encoding in your Ruby script, and then deal with the inevitable incompatible-encoding errors that would crop up.

It sounds to me, though, like what you have in this case is just some hacky little scripts, and it would be acceptable for them to be imperfect. So, in that case, I suggest trying to set the encoding for your source file(s) to BINARY. That's a hack, but it ought to be effective.

Alternatively, you could drop back to the 1.8 interpreter, like Brian suggests, which more or less uses BINARY as the default source encoding.
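
To make the BINARY idea concrete, here is a minimal sketch of the same hack done inside a script rather than through the source encoding. It is only an illustration: the helper name `absolutize_hrefs`, the sample HTML, and the `href="/` pattern are my own assumptions, not the exact regexp from the original command.

```ruby
# Hypothetical helper (not from the original post): rewrite root-relative
# href attributes while treating the page as raw bytes.
def absolutize_hrefs(html, base)
  # force_encoding only relabels the bytes; nothing is transcoded.
  # Every byte sequence is valid in BINARY (ASCII-8BIT), so gsub has
  # no "invalid byte sequence" to complain about.
  bytes = html.dup.force_encoding(Encoding::BINARY)

  # The pattern and the replacement are pure ASCII, so they are
  # compatible with a BINARY string.
  bytes.gsub(/href="\//, "href=\"#{base}")
end

# Assumed sample input: \xE9 is a Latin-1 byte that is not valid US-ASCII.
sample = "<a href=\"/forum/4\">caf\xE9</a>"
result = absolutize_hrefs(sample, "http://www.ruby-forum.com/")
puts result.valid_encoding?
```

The non-ASCII bytes pass through untouched, which is the "imperfect but acceptable" trade-off: the script never learns what the characters actually are. For a one-liner, the 1.9 `-E` switch (for example `ruby -E BINARY -pe '...'`) sets the default external encoding and may get you the same effect, though I have not tried it against this particular page.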