On 2009-09-19, James Masters <james.d.masters / gmail.com> wrote:
> Fortunately, I'm working with a small team of individuals who will be
> authoring the files so I do have some control on the type of text that
> I'm looking for.  So I might try [:print], \n, \t, and maybe \r (just
> in case) and then fall back on the NULL idea as a Plan B.

How many files are you dealing with?

Hmm.  Some source files (scripts, say) will be executable, so you can't
assume executables are binaries.  But... You might want to experiment with
testing a few likely heuristics and maybe making a chart.  Say, make a list
of:

TEST:	.jpg	x-bit	NUL	128-255

FILE:
foo.jpg	X	-	X	X
foo.sh	-	X	-	-
...

and then look to see whether you can make some simple rules, like
"everything with .jpg or .gif is definitely a binary."  If you can
get a couple of simple rules that deal with 90% or so of the files,
then you can look at the remainder as a separate case and work from
there.
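
A rough sketch of how a chart like that could be generated, in Python.
Nothing here comes from the thread beyond the four column ideas: the
names probe() and format_row(), the extension list, and the 4096-byte
sample size are all assumptions to adjust for your own files.  It
assumes a POSIX-style filesystem where the mode bits mean something.

```python
import os
import sys

# Assumed list of "obviously binary" extensions; extend as needed.
BINARY_EXTS = {".jpg", ".gif"}
# Assumption: looking at the first 4 KiB is enough for a first pass.
SAMPLE_SIZE = 4096


def probe(path):
    """Return one chart row for path: a dict of test name -> True/False."""
    with open(path, "rb") as f:
        chunk = f.read(SAMPLE_SIZE)
    ext = os.path.splitext(path)[1].lower()
    return {
        "ext": ext in BINARY_EXTS,
        # True if any of the user/group/other execute bits is set.
        "x-bit": bool(os.stat(path).st_mode & 0o111),
        "NUL": b"\x00" in chunk,
        "128-255": any(b >= 0x80 for b in chunk),
    }


def format_row(name, row):
    """Render a row as tab-separated X / - marks, as in the chart above."""
    marks = "\t".join(
        "X" if row[key] else "-" for key in ("ext", "x-bit", "NUL", "128-255")
    )
    return f"{name}\t{marks}"


if __name__ == "__main__":
    print("FILE\text\tx-bit\tNUL\t128-255")
    for path in sys.argv[1:]:
        print(format_row(path, probe(path)))
```

Run it over a representative sample of the tree and eyeball the columns
before deciding which tests are worth turning into rules.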

Don't feel compelled to make a single perfect test when three easy tests
that handle 70% of the cases might give you a remaining pool for which
it's much easier to write a good test.
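
To make that concrete, here is a minimal sketch of three easy tests
applied in order, with everything they can't decide set aside for
later.  The particular rules (image extension, NUL byte, and the
printable-ASCII-plus-\n-\t-\r check mentioned in the quoted text) and
the names easy_verdict() and split_pool() are illustrative assumptions,
not a recommendation.  Note that the NUL rule would misfile UTF-16
text, for instance.

```python
import os

# Printable ASCII plus tab, newline, and carriage return.
TEXT_BYTES = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def easy_verdict(name, data):
    """Return 'binary', 'text', or None if no easy rule applies."""
    ext = os.path.splitext(name)[1].lower()
    if ext in {".jpg", ".gif"}:
        return "binary"  # rule 1: known image extension
    if b"\x00" in data:
        return "binary"  # rule 2: contains a NUL byte
    if all(b in TEXT_BYTES for b in data):
        return "text"    # rule 3: nothing but plain printable ASCII
    return None          # undecided: leave for a better test


def split_pool(samples):
    """Split (name, data) pairs into decided verdicts and a leftover pool."""
    decided, leftover = {}, []
    for name, data in samples:
        verdict = easy_verdict(name, data)
        if verdict is None:
            leftover.append(name)
        else:
            decided[name] = verdict
    return decided, leftover
```

Whatever ends up in the leftover list is the smaller pool that the
harder test actually has to handle.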

-s
-- 
Copyright 2009, all wrongs reversed.  Peter Seebach / usenet-nospam / seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!