James Edward Gray II <james / grayproductions.net> wrote:

> > utf8rgx=Regexp.new('m/^(
> >    [\x09\x0A\x0D\x20-\x7E]            # ASCII
> >  | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
> >  |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
> >  | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
> >  |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
> >  |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
> >  | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
> >  |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
> > )*$/x')
> 
> Try changing this to:
> 
> utf8rgx = / ... /x

The above regexp doesn't work as expected with Ruby. I've compared the
output for the same files with Perl and Ruby: Ruby always says "yes, it
is UTF-8", whereas Perl says NO for an ISO-8859-1 encoded file (even
after wiping out the first line, the first ^ and the last $).

So, for the time being, I'll call the Perl script from Ruby on the
command line...
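
A possible explanation for the difference, offered as a sketch and not
verified against the files mentioned above: in Ruby, ^ and $ match at
every line boundary, not only at the start and end of the string, so a
single empty or pure-ASCII line is enough for the pattern to succeed.
Dropping the anchors does not help, because ( ... )* can then match the
empty string anywhere. The version below anchors with \A and \z instead.
The names UTF8_RGX and utf8? are made up for illustration, and the /n
flag and String#b assume a Ruby recent enough to have per-string
encodings.

```ruby
# Byte-level UTF-8 check. \A and \z anchor to the whole string,
# whereas ^ and $ in Ruby anchor to each line.
# The /n flag makes the regexp operate on raw bytes.
UTF8_RGX = /\A(?:
    [\x09\x0A\x0D\x20-\x7E]            # ASCII
  | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
  |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
  | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
  |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
  |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
  | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
  |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
)*\z/nx

# Hypothetical helper: true if every byte of +bytes+ fits the pattern.
# String#b returns a binary (ASCII-8BIT) copy, so the comparison is
# done byte by byte regardless of the string's declared encoding.
def utf8?(bytes)
  !!(bytes.b =~ UTF8_RGX)
end

puts utf8?("caf\xC3\xA9\n")        # "e acute" as two UTF-8 bytes
puts utf8?("hello\ncaf\xE9\n")     # "e acute" as one ISO-8859-1 byte
```

For a whole file, something like utf8?(File.binread(path)) should work,
where path is whatever file is being tested.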
-- 
une bue