* Stefan Mahlitz <stefan / mahlitz-net.de> (20:46) wrote:

> My question was directed to the 8000 char-paragraph. I even find small
> xml-files unreadable

Well, there are a lot of XML files that I find readable, including many
that I or my software wrote.

Of course there are perversions like XMI and Microsoft's new formats.

> - so I completely agree with you that 8000 chars of xml-data in a
> single line is far from being readable by a human.

And thus it's binary and not text.

> Anyway - xml is meant to be processed by machines.

It's meant to be read by an XML parser, which a regular diff isn't. So
only special cases are well suited for diff, and other special cases are
human readable.

> But even this case I would classify as text (I'm changing my earlier
> definition slightly) if it does not contain binary data.

I would say it's text if it's human readable when interpreted as
text/plain. Otherwise it's binary. That is, binary = for machines only.

> If I understand the original poster correctly he wants to
> programmatically detect whether a file is "binary or text". My point was
> that he shouldn't restrict his program artificially - but this depends on
> context.

Yes, in the original post he didn't say for what purpose. If it's for
diffing, the line structure is what matters.
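
A rough sketch of such a line-structure check, in Python. The function
name and the 1000-byte limit are my own assumptions, not something from
this thread; pick whatever limit suits your diff tool.

```python
def looks_diffable(data: bytes, max_line_len: int = 1000) -> bool:
    """Heuristic only: True if no line exceeds max_line_len bytes.

    max_line_len is an arbitrary, assumed threshold.
    """
    # bytes.splitlines() splits on \n, \r and \r\n and drops the line ends.
    return all(len(line) <= max_line_len for line in data.splitlines())
```

With this, an 8000-character single-line XML file counts as "binary" for
diffing purposes, while the same data pretty-printed over many short
lines counts as text.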

> Do I summarize correctly that depending on the purpose of the check one
> could use a maximum line length - or any other of the posted approaches?

The other approaches are good for deciding whether a file contains text
in Latin-based scripts. That's only a small subset of text, and they will
happily classify base64 as text.
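
To illustrate, here is a small Python sketch. I am assuming the usual
"fraction of printable ASCII bytes" heuristic as a stand-in for the
posted approaches; printable_ratio is a hypothetical helper, not code
from this thread.

```python
import base64

def printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII, tab, LF or CR."""
    if not data:
        return 1.0
    printable = sum(1 for b in data if 32 <= b < 127 or b in (9, 10, 13))
    return printable / len(data)

raw = bytes(range(256)) * 4          # clearly binary data
encoded = base64.encodebytes(raw)    # the same data as base64 lines
greek = "Ελληνικά".encode("utf-8")   # real text, but not Latin-based

print(printable_ratio(raw))
print(printable_ratio(encoded))
print(printable_ratio(greek))
```

The base64 output consists only of printable ASCII and newlines, so any
threshold on this ratio accepts it as text, while the UTF-8 encoded
Greek word contains no ASCII bytes at all and gets rejected.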

> Aka 'use the right tool for the job' + 'There is no single answer to
> this question'?

Yes. Probably the best approach is to use file(1).
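
If it has to be done from a program, one option is to call file(1) and
look at the MIME type it reports. A minimal sketch in Python, assuming a
file(1) that supports --brief and --mime-type is on the PATH; mime_type
and looks_like_text are hypothetical names, and treating every "text/*"
type as text is my assumption.

```python
import subprocess

def mime_type(path: str) -> str:
    """Ask file(1) for the MIME type of path (assumes file(1) is installed)."""
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True,
        text=True,
        check=True,  # raise if file(1) exits with an error status
    )
    return result.stdout.strip()

def looks_like_text(path: str) -> bool:
    # Assumption: anything file(1) reports as text/* counts as text.
    return mime_type(path).startswith("text/")
```

Whether a given file comes out as text/* depends on the magic database
of the installed file(1), so the result is still a heuristic.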

regards,                simon .... l