2009/12/27 Brian Candler <b.candler / pobox.com>

> DJ Jazzy Linefeed wrote:
> > compare.rb:64:in `gsub': broken UTF-8 string (ArgumentError)
>
> Yep. Ruby 1.9 will raise exceptions in all sorts of odd places,
> dependent on both the tagged encoding of the string *and* its content at
> that point in time.
>
> I got as far as recording 200 behaviours of String in ruby 1.9 before I
> gave up:
> http://github.com/candlerb/string19/blob/master/string19.rb
>
> The solution I use is simple: stick to ruby 1.8.x. When that branch
> dies, perhaps reia will be ready. If not I'll move to something else.
>
> IMO, both python 3 and erlang have got the right idea when it comes to
> handling UTF8.
>
Hi,

I ran into this kind of problem yesterday too.

While collecting some file names with Dir.[], I got some strange results.

I was searching for "bad" file names, I mean file names with accented
characters or whatever. When I print the String given to the block directly,
there is no problem.

But then I get things like:
/Users/benoitdaloze/Library/GlestGame/data/lang/espan>~<ol.lng

(The ~ is separated from the n, so it is not ñ.) The Regexp acts as if they
were 2 different characters. How can I handle that easily? I tried to change
the script encoding to MacRoman, but that produced a bad-encoding error about
not matching UTF-8.
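
To make the symptom concrete, here is a minimal sketch of what seems to be happening, assuming the file name comes back in decomposed form (a plain "n" followed by U+0303 COMBINING TILDE), which is how HFS+ on Mac OS X stores names as far as I know. ALLOWED and mark_disallowed are hypothetical names used only for this illustration, and ALLOWED is a shortened version of the real list.

```ruby
# -*- coding: utf-8 -*-
# Sketch: a decomposed "ñ" is two code points, so a character class
# matches the base letter and the combining tilde separately.

ALLOWED = "A-Za-z0-9 ._-" # hypothetical, shortened ALLOWED_CHARS

decomposed = "espan\u0303ol.lng" # "n" + U+0303 COMBINING TILDE
composed   = "espa\u00F1ol.lng"  # precomposed U+00F1

# Wrap every run of disallowed characters in >...<, like the script does.
def mark_disallowed(name, allowed)
  name.gsub(/([^#{allowed}]+)/, ">\\1<")
end

marked_decomposed = mark_disallowed(decomposed, ALLOWED)
marked_composed   = mark_disallowed(composed, ALLOWED)

puts decomposed.length # 12 characters: the tilde counts on its own
puts composed.length   # 11 characters
puts marked_decomposed # only the tilde is wrapped, the "n" is allowed
puts marked_composed   # the whole precomposed letter is wrapped
```

In the decomposed case the "n" passes the allowed list and only the combining mark is flagged, which would explain the split seen in the output.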

This is the output of the script below (which is then unable to rename any
bad file, because tr! does not seem to work on name either):

path = ARGV[0] || "/"

ALLOWED_CHARS = "A-Za-z0-9 %#:$@?!=+~&|'()\\[\\]{}.,\r_-"

Dir["#{File.expand_path(path)}/**/*"].each { |f|
    name = File.basename(f)
    unless name =~ /^[#{ALLOWED_CHARS}]+$/
        puts File.dirname(f) + '/' + name.gsub(/([^#{ALLOWED_CHARS}]+)/, ">\\1<")

        # Not complete, just a test, but it doesn't work even for 'filéname'.
        if name.tr!('é', 'e') =~ /^[#{ALLOWED_CHARS}]+$/
            File.rename(f, File.dirname(f) + '/' + name)
            puts "\trenamed to #{name}"
            break
        end
    end
}
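
One possible way around the tr! problem, sketched under two assumptions: that the names really are in decomposed form, and that String#unicode_normalize is available (it exists in Ruby 2.2 and later, so not in the 1.9 discussed here). The sample name is hypothetical. Note also that tr! returns nil when it changes nothing, so the non-bang tr is safer inside a condition.

```ruby
# -*- coding: utf-8 -*-
# Sketch: compose the name to NFC before translating accented letters.

name = "file\u0301name" # hypothetical name: "e" + U+0301 COMBINING ACUTE ACCENT
e_acute = "\u00E9"      # precomposed "é", as typed in a script

# The precomposed letter never occurs in the decomposed name,
# so tr! changes nothing and returns nil.
bang_result = name.dup.tr!(e_acute, "e")

# Composing first turns "e" + U+0301 into U+00E9, which tr can then replace.
composed = name.unicode_normalize(:nfc)
fixed    = composed.tr(e_acute, "e")

puts bang_result.inspect
puts fixed
```

With the name composed first, the allowed-characters Regexp sees a single "é" instead of an "e" followed by a stray accent.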