* Stefan Mahlitz <stefan / mahlitz-net.de> (20:46) wrote:
> My question was directed to the 8000 char-paragraph. I even find small
> xml-files unreadable

Well, there are a lot of XML files that I find readable, including many that I or my software wrote. Of course there are perversions like XMI and Microsoft's new formats.

> - so I completely agree with you that 8000 chars of xml-data in a
> single line is far from being readable by a human.

And thus it's binary and not text.

> Anyway - xml is meant to be processed by machines.

It's meant to be read by an XML parser, which a regular diff isn't. So only special cases are well suited for diff, and other special cases are human readable.

> But even this case I would classify as text (I'm changing my earlier
> definition slightly) if it does not contain binary data.

I would say it's text if it is human readable when interpreted as text/plain. Otherwise it's binary. That is, binary = for machines only.

> If I understand the original poster correctly he wants to
> programmatically detect whether a file is "binary or text". My point was
> that he shouldn't restrict his program artificially - but this depends on
> context.

Yes, in the original post he didn't say for what purpose. If it's for diffing, the line structure is what matters.

> Do I summarize correctly that depending on the purpose of the check one
> could use a maximum line length - or any other of the posted approaches?

The other approaches are good for deciding whether a file contains text in Latin-based scripts. That's only a small subset of text, and they will happily classify base64 as text.

> Aka 'use the right tool for the job' + 'There is no single answer to
> this question'?

Yes. Probably the best approach was using file(1).

Regards,
simon
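
For illustration only, here is a rough Python sketch of the two ideas discussed above: a maximum-line-length check for the diffing case, and delegating to file(1). Nothing here comes from the thread itself. The function names and the 500-character threshold are assumptions picked for the example, and the wrapper assumes a file(1) that accepts `--brief` and `--mime-type`.

```python
import subprocess


def looks_like_diffable_text(data: bytes, max_line_length: int = 500) -> bool:
    """Heuristic for the diffing case: line structure is what matters.

    The 500-character limit is an arbitrary assumption, not a standard.
    """
    # NUL bytes practically never occur in human-readable text.
    if b"\x00" in data:
        return False
    # Assume UTF-8; other encodings would need their own handling.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # One very long line (e.g. 8000 chars of XML) counts as "binary" here.
    return all(len(line) <= max_line_length for line in text.splitlines())


def mime_type_via_file(path: str) -> str:
    """Ask file(1) for a MIME type, e.g. 'text/plain'.

    Assumes the `file` utility is installed and supports these options.
    """
    result = subprocess.run(
        ["file", "--brief", "--mime-type", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Note that the line-length check has the weakness mentioned above: line-wrapped base64 passes as text, since it is short lines of plain ASCII.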