On Jan 6, 2009, at 3:20 AM, Brian Candler wrote:

> Kenneth McDonald wrote:
>> Any advice most appreciated,
>
> Use hexdump -C on the file to see what the actual byte sequences are.
> If these are single-byte characters then it's probably ISO-8859-1. If
> they are two bytes then it's probably UTF-8.

I have some code that detects valid UTF-8 data here:

http://blog.grayproductions.net/articles/the_unicode_character_set_and_encodings#comment_14649

James Edward Gray II
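The byte-level check behind the hexdump advice can be sketched in plain Ruby. This is not the code at the linked comment. It is a minimal sketch, and `valid_utf8?` is a hypothetical name. It walks the raw bytes and accepts only well-formed UTF-8 sequences, so a lone Latin-1 byte such as 0xE9 ("é" in ISO-8859-1) makes it return false:

```ruby
# Hypothetical helper: returns true if the byte string +data+ is well-formed
# UTF-8. Rejects overlong forms, UTF-16 surrogates (U+D800..U+DFFF) and code
# points above U+10FFFF.
def valid_utf8?(data)
  bytes = data.unpack("C*")      # raw bytes as integers, whatever the encoding
  i = 0
  while i < bytes.size
    b = bytes[i]
    if b <= 0x7F                 # single-byte ASCII
      i += 1
      next
    elsif b >= 0xC2 && b <= 0xDF
      len, lo, hi = 2, 0x80, 0xBF
    elsif b == 0xE0
      len, lo, hi = 3, 0xA0, 0xBF  # exclude overlong 3-byte forms
    elsif b == 0xED
      len, lo, hi = 3, 0x80, 0x9F  # exclude surrogates
    elsif b >= 0xE1 && b <= 0xEF
      len, lo, hi = 3, 0x80, 0xBF
    elsif b == 0xF0
      len, lo, hi = 4, 0x90, 0xBF  # exclude overlong 4-byte forms
    elsif b >= 0xF1 && b <= 0xF3
      len, lo, hi = 4, 0x80, 0xBF
    elsif b == 0xF4
      len, lo, hi = 4, 0x80, 0x8F  # cap at U+10FFFF
    else
      return false               # 0x80..0xC1 and 0xF5..0xFF never start a sequence
    end
    return false if i + len > bytes.size  # truncated sequence
    # The second byte has a lead-specific range; later bytes are plain
    # continuation bytes (0x80..0xBF).
    return false unless bytes[i + 1] >= lo && bytes[i + 1] <= hi
    (i + 2...i + len).each do |j|
      return false unless bytes[j] >= 0x80 && bytes[j] <= 0xBF
    end
    i += len
  end
  true
end
```

Usage, with a hypothetical file name: `valid_utf8?(File.open("data.txt", "rb") { |f| f.read })`. On Ruby 1.9 and later, core can do the same check with `str.dup.force_encoding("UTF-8").valid_encoding?`. The hand-rolled version also works on 1.8.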