On 2009-09-19, James Masters <james.d.masters / gmail.com> wrote:
> How about a file that contains any single byte character (0-255) that
> you cannot find a key for on a standard US keyboard (English)?  The
> [:print:] regular expression character set comprises the range of
> characters 32-126, which is what I believe that I need, but I wanted
> to see if there are better ways to accomplish this.

Well, you probably also want tabs and newlines (bytes 9 and 10, plus 13
if any of the files use CRLF line endings), which fall outside 32-126.  :)

I would think that [:print:] might also, in some locales, get you things
like accented letters.  Whether or not you want this is harder to say.
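
If you want to sidestep the locale question entirely, one option is to
spell out the byte whitelist yourself.  A rough sketch in Python (the
name is_plain_ascii_text and the choice to allow CR are my own
assumptions, not anything from the original question):

```python
# Bytes treated as "text": printable ASCII 32-126 plus tab, LF, and CR.
# Allowing CR is an assumption, made so CRLF files are not rejected.
TEXT_BYTES = bytes(range(32, 127)) + b"\t\n\r"

def is_plain_ascii_text(data: bytes) -> bool:
    """Return True if every byte in data is in the whitelist above."""
    # translate(None, delete) strips the whitelisted bytes; anything
    # left over is a byte outside the whitelist.
    return not data.translate(None, TEXT_BYTES)
```

Because this works on raw bytes, a Latin-1 accented letter such as
0xE9 is rejected no matter what the locale is set to.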

> Basically I'm trying to search for the presence of a header in source
> code files (which may have various extensions or no extensions at
> all).  The source code files are mixed with executable and non-
> executable "binary" files (data files; not something that you can
> read).  I don't want to flag the non-source code files as not having a
> header.  The scope of this problem is small so I don't need to worry
> about any character sets, etc.

I thought that until I found a dozen Makefiles with copyright symbols
embedded in them.  :P

I'd say as a first approximation, just check for NUL bytes.  I'm pretty
sure that the vast majority of binary files will contain at least one,
and the vast majority of text files will contain none.
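
Something along these lines would do it.  This is only a sketch in
Python; looks_binary and the 8192-byte chunk size are made-up choices
for illustration:

```python
def looks_binary(path, chunk_size=8192):
    """Heuristic: treat a file as binary if it contains a NUL byte."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # Reached end of file without seeing a NUL byte.
                return False
            if b"\x00" in chunk:
                return True
```

Files for which this returns False can then go on to the header check.
It is a heuristic, so it will misjudge some files: UTF-16 text, for
example, is full of NUL bytes and would be classed as binary.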

-s
-- 
Copyright 2009, all wrongs reversed.  Peter Seebach / usenet-nospam / seebs.net