Simon Krahnke wrote:
> * Robert Klemme <shortcutter / googlemail.com> (09:04) wrote:
> 
>> If I really needed it I'd probably do a heuristic based on the
>> distribution of byte values across an initial portion of the file.
> 
> That only shows how many non-ASCII characters are used. It won't
> recognise Russian script in UTF-8 as text, or uuencoded data as binary.
> 
> What diff (and thus rcs, cvs, svn ...) cares about is lines. Something
> is text if it's logically organized in short lines, and end-of-line
> characters are used only for ending lines.

[snip]

> I chose 1000 as the maximum line length, to fit whole paragraphs in one
> line. But of course the maximum of the processing tool is relevant here.
> That is the right place to do the check anyway.

That's why ClearCase (on Windows) claimed my pure-ASCII XML file was
non-text (and refused to check it in). One line exceeded 8000 characters.

This is on my personal list of 'bad practices', but it may be
appropriate to others.
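
For illustration, the line-based test described in the quoted text could be
sketched roughly as below. This is only my reading of it, not anybody's
actual code: the name looks_like_text and the max_line_length parameter are
made up, the length is counted in bytes rather than characters, and the
default of 1000 is simply the limit mentioned above.

```python
def looks_like_text(data: bytes, max_line_length: int = 1000) -> bool:
    """Hypothetical line-based heuristic: short lines, CR/LF only at line ends."""
    for line in data.split(b"\n"):
        # Accept one trailing CR so that CRLF line endings count as text.
        if line.endswith(b"\r"):
            line = line[:-1]
        # A CR anywhere else is an end-of-line byte that does not end a line.
        if b"\r" in line:
            return False
        # Assumption: the limit is measured in bytes, not characters.
        if len(line) > max_line_length:
            return False
    return True


# Russian text in UTF-8 passes, because only line structure is inspected.
print(looks_like_text("привет\nмир\n".encode("utf-8")))  # True
# A single line of more than 8000 bytes fails with the default limit.
print(looks_like_text(b"<a>" + b"x" * 8000 + b"</a>\n"))  # False
```

With a check like this, an XML file with one very long line is "binary"
unless max_line_length is raised to whatever the processing tool really
supports. Note that files using a bare CR as line ending would also be
rejected by this sketch.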

My 0.02EUR

Stefan