On 2009-09-19, James Masters <james.d.masters / gmail.com> wrote:
> Fortunately, I'm working with a small team of individuals who will be
> authoring the files so I do have some control on the type of text that
> I'm looking for.  So I might try [:print], \n, \t, and maybe \r (just
> in case) and then fall back on the NULL idea as a Plan B.

How many files are you dealing with?

Hmm.  Some source files (scripts, say) will be executable, so you can't
assume executables are binaries.  But...

You might want to experiment with testing a few likely heuristics and
maybe making a chart.  Say, make a list of:

    TEST:       .jpg    x-bit   NUL     128-255
    FILE:
    foo.jpg     X       -       X       X
    foo.sh      -       X       -       -
    ...

and then look to see whether you can make some simple rules, like
"everything with .jpg or .gif is definitely a binary."

If you can get a couple of simple rules that deal with 90% or so of the
files, then you can look at the remainder as a separate case and work
from there.  Don't feel compelled to make a single perfect test when
three easy tests that handle 70% of the cases might give you a remaining
pool for which it's much easier to write a good test.

-s
-- 
Copyright 2009, all wrongs reversed.  Peter Seebach / usenet-nospam / seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
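
For what it's worth, the heuristics chart suggested in the post could be built with something like the following. This is only a rough sketch in Python, not anything from the thread: the names `probe`, `chart`, `IMAGE_EXTS`, and the 8192-byte sample size are all made up for illustration, and the extension list covers only the two extensions mentioned in the post.

```python
import os
import stat

# Hypothetical extension list; only .jpg and .gif are named in the post.
IMAGE_EXTS = {".jpg", ".gif"}

def probe(path, sample_size=8192):
    """Return the four chart columns for one file as booleans."""
    # Only the first sample_size bytes are inspected (an assumption made
    # here to keep large files cheap; it can miss late NULs).
    with open(path, "rb") as f:
        chunk = f.read(sample_size)
    mode = os.stat(path).st_mode
    return {
        "ext": os.path.splitext(path)[1].lower() in IMAGE_EXTS,
        "x-bit": bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)),
        "NUL": b"\x00" in chunk,
        "128-255": any(b >= 128 for b in chunk),
    }

def chart(paths):
    """Format one row per file, marking hits with X and misses with -."""
    cols = ["ext", "x-bit", "NUL", "128-255"]
    lines = ["%-20s %s" % ("FILE", " ".join("%-7s" % c for c in cols))]
    for p in paths:
        row = probe(p)
        marks = " ".join("%-7s" % ("X" if row[c] else "-") for c in cols)
        lines.append("%-20s %s" % (os.path.basename(p), marks))
    return "\n".join(lines)

if __name__ == "__main__":
    import sys
    print(chart(sys.argv[1:]))
```

Run it over a sample of the files (for example `python chart.py *`, assuming the script is saved under that name) and look for columns that split the set cleanly; whatever is left over is the pool that needs a more careful test.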