James Edward Gray II <james / grayproductions.net> wrote:

> > utf8rgx=Regexp.new('m/^(
> >      [\x09\x0A\x0D\x20-\x7E]            # ASCII
> >    | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
> >    |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
> >    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
> >    |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
> >    |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
> >    | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
> >    |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
> > )*$/x')
>
> Try changing this to:
>
> utf8rgx = / ... /x

the above regexp doesn't work as expected with ruby. i've compared the
output for the same files with perl and ruby: ruby always says "yes it is
UTF-8", whereas perl says NO for an ISO-8859-1 encoded file...

(even after wiping out the first line, the first ^ and the last $)

so, for the time being, i'll call the perl script from ruby, command-line
fashion...
-- 
une bévue
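
One possible explanation for the behaviour reported above, not confirmed anywhere in this thread: in Ruby, ^ and $ match at every line boundary rather than only at the start and end of the string, so a single pure-ASCII line (or an empty one) is enough for the pattern to match. The sketch below anchors the same byte ranges with \A and \z instead. It assumes a Ruby version recent enough to have String#b and Regexp#match? (2.4 or later), which is newer than what this thread was written against; the names UTF8_RGX, LINE_ANCHORED and utf8_bytes? are made up for illustration.

```ruby
# Byte-level UTF-8 check, anchored on the whole string with \A and \z.
# The /n flag makes the regexp binary (ASCII-8BIT) so the \xNN escapes
# are treated as raw bytes; /x allows the layout and comments.
UTF8_RGX = /\A(?:
    [\x09\x0A\x0D\x20-\x7E]            # ASCII
  | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
  | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
  | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
  | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
  | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
  | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
  | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
)*\z/nx

# Same idea restricted to the ASCII branch, but with line anchors,
# only to show how ^ and $ behave on multi-line input.
LINE_ANCHORED = /^[\x09\x0A\x0D\x20-\x7E]*$/n

# Hypothetical helper: true if every byte of data fits the pattern above.
def utf8_bytes?(data)
  UTF8_RGX.match?(data.b)   # .b gives a binary copy, matching the /n regexp
end

latin1 = "caf\xE9\nplain ascii line\n"       # ISO-8859-1 bytes
utf8   = "caf\xC3\xA9\nplain ascii line\n"   # same text as UTF-8

puts utf8_bytes?(utf8)                # true
puts utf8_bytes?(latin1)              # false
puts LINE_ANCHORED.match?(latin1.b)   # true: the second line alone matches
```

On Ruby 1.9 and later, `data.dup.force_encoding(Encoding::UTF_8).valid_encoding?` is a shorter alternative, though it is not identical: it also accepts control characters that the ASCII branch of this pattern rejects.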