Logan Capaldo wrote:
> The problem with that is that cat isn't really doing anything, and as
> soon as someone saves a multi-byte character to that file, all hell
> is going to break loose. cat is doing something along the lines of
> 
> while (line = getline()) {
>     for (i = 0; i < length(line); i++) {
>         if (isprint(line[i])) {
>             print line[i]
>         }
>     }
> }
> 
> which, when the text just happens to contain only single-byte
> characters, will skip the nulls. If the source text contains
> non-English characters, etc., those bytes won't just be nulls any more,
> and if one of them is printable (like the BOM at the beginning of the
> file, for instance) it's going to produce the wrong output.
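The quoted point can be checked with a small Ruby sketch. It assumes the file is UTF-16LE, and the `strip_nuls` helper and sample strings are hypothetical: the helper just deletes every 0x00 byte, i.e. the "skip the nulls" behaviour described in the quote, not a claim about what cat actually does. Plain ASCII comes through intact, while an accented letter, a euro sign, or a leading BOM leave behind bytes that are not valid UTF-8.

```ruby
# Hypothetical stand-in for the "skip the nulls" idea from the quote:
# delete every 0x00 byte from the raw data and keep everything else.
def strip_nuls(raw)
  raw.delete("\0")
end

# Sample text, encoded as UTF-16LE and viewed as raw bytes (String#b).
samples = {
  ascii:    "hi",
  accented: "caf\u00E9",   # U+00E9 is stored as E9 00
  euro:     "5\u20AC",     # U+20AC is stored as AC 20 (no NUL at all)
  bom:      "\uFEFFhi"     # the BOM is stored as FF FE
}

samples.each do |name, text|
  raw = text.encode("UTF-16LE").b
  result = strip_nuls(raw).force_encoding("UTF-8")
  puts "#{name}: #{result.inspect} valid UTF-8? #{result.valid_encoding?}"
end
```

In the euro case the second byte of U+20AC is 0x20, an ordinary space, so it is printable and survives, which is exactly the kind of wrong output the quote warns about.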

Shoot, you're right... this is weird. Running cat straight from the
command line produces text I can read, but searching that same output
with my script doesn't work. How that works, I'll never know.
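One plausible explanation, offered as an assumption rather than something established in the thread: cat passes the file's bytes through unchanged, including the 0x00 byte that follows every ASCII character in UTF-16LE, and most terminals simply display nothing for a NUL byte, so the text looks normal on screen. A script searching those bytes still sees the NULs. The sketch below (the sample line is hypothetical) shows an ASCII search failing on the raw bytes and succeeding once they are decoded as UTF-16LE.

```ruby
# Hypothetical line as it would sit in a UTF-16LE file (BOM omitted).
raw = "error: disk full".encode("UTF-16LE").b

# Every ASCII character is followed by a 0x00 byte, so "error" never
# appears as consecutive bytes in the raw data.
puts raw.include?("error")          # false
puts raw.bytes.first(6).inspect     # [101, 0, 114, 0, 114, 0]

# Decoding the bytes as UTF-16LE first makes the search behave as expected.
text = raw.dup.force_encoding("UTF-16LE").encode("UTF-8")
puts text.include?("error")         # true
```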

UTF-16 certainly isn't the only encoding I expect to see if I'm going to
be flexible about which text editors I use. I don't like giving up, but
meh, it just isn't that big of a deal. Thank you very much for the help,
though! I didn't even know about iconv before!
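For anyone landing here later: Ruby's built-in `String#encode` (available since 1.9) does the same kind of conversion as iconv, and the Iconv extension was dropped from the standard library in Ruby 2.0. A minimal sketch of handling more than one encoding is to sniff the byte-order mark and fall back to a caller-chosen default. The `decode_text` helper and `BOMS` table are hypothetical; the approach only works when the editor actually writes a BOM, and it does not handle UTF-32 (whose little-endian BOM begins with the same FF FE bytes as UTF-16LE).

```ruby
# Hypothetical table: byte-order mark => encoding it identifies.
# Hash order is preserved, so the checks run top to bottom.
BOMS = {
  "\xEF\xBB\xBF".b => "UTF-8",
  "\xFF\xFE".b     => "UTF-16LE",
  "\xFE\xFF".b     => "UTF-16BE"
}.freeze

# Decode raw file bytes to a UTF-8 string. If a known BOM is present it
# picks the encoding and is stripped; otherwise `fallback` is assumed.
# Note: when the source is already UTF-8, encode is a no-op and does
# not validate the bytes.
def decode_text(raw, fallback: "UTF-8")
  raw = raw.b
  BOMS.each do |bom, enc|
    next unless raw.start_with?(bom)
    body = raw.byteslice(bom.bytesize..-1)
    return body.force_encoding(enc).encode("UTF-8")
  end
  raw.force_encoding(fallback).encode("UTF-8")
end
```

Usage would look like `decode_text(File.binread("notes.txt"))`, where the file name is only an example. A file with no BOM in a legacy encoding such as Windows-1252 cannot be detected this way, so the caller has to pass the right `fallback`.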

-- 
Posted via http://www.ruby-forum.com/.