* Robert Klemme <shortcutter / googlemail.com> (09:04) wrote:
> If I'd really need it I'd probably do a heuristic based on
> distribution of byte values across an initial portion of the file.

That only shows how many non-ASCII characters are used. It won't
recognise Russian script in UTF-8 as text, or uuencoded data as binary.

What diff (and thus rcs, cvs, svn ...) cares about is lines. Something
is text if it's logically organized in short lines, and EOL characters
are used only for ending lines.

    class File
      # Heuristic on the first 1024 bytes: binary if a CR is not followed
      # by LF, or if some line is longer than 1000 bytes.
      def self.binary?(name)
        cr, len, mlen = false, 0, 0
        # IO#read(n) returns nil on an empty file, hence the fallback
        sample = File.open(name, "rb") {|io| io.read(1024)} || ""
        sample.each_byte do |bt|
          # a CR that does not end a line means the data is not text
          return true if cr and bt != 10
          cr = false
          case bt
          when 13
            cr = true
          when 10
            mlen = len if len > mlen
            len = 0
          else
            len += 1
          end
        end
        mlen = len if len > mlen  # count the last, unterminated line
        mlen > 1000
      end
    end

I chose 1000 as the maximum line length, so that whole paragraphs fit on
one line. But of course the limit of the tool that processes the file is
what matters here. That tool is the right place to do the check anyway.

Regards,
simon
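
The difference between the two heuristics can be tried out on in-memory samples. What follows is only a minimal sketch: the helper names `binary_sample?` and `non_ascii_ratio` and the `max_line` parameter are hypothetical and not part of the original post, and the share of non-ASCII bytes is taken as one simple stand-in for a byte-value heuristic.

```ruby
# Hypothetical helper (not from the post): the line-based check applied
# to a String sample instead of a file name.
def binary_sample?(sample, max_line = 1000)
  cr = false
  len = 0
  longest = 0
  sample.each_byte do |bt|
    return true if cr && bt != 10   # CR not followed by LF
    cr = false
    case bt
    when 13
      cr = true
    when 10
      longest = len if len > longest
      len = 0
    else
      len += 1
    end
  end
  longest = len if len > longest    # last line may lack a terminator
  longest > max_line
end

# Hypothetical stand-in for a byte-value heuristic: the share of bytes
# outside the 7-bit ASCII range.
def non_ascii_ratio(sample)
  return 0.0 if sample.empty?
  sample.each_byte.count { |bt| bt > 127 }.fdiv(sample.bytesize)
end

# Six Cyrillic letters (written as \u escapes) plus a newline, repeated.
russian = "\u041f\u0440\u0438\u0432\u0435\u0442\n" * 20
# Data with a CR in the middle and no line structure, all bytes below 128.
blob = "\x00\x01\x02\r\x03" * 300

puts binary_sample?(russian)        # => false
puts non_ascii_ratio(russian) > 0.5 # => true
puts binary_sample?(blob)           # => true
puts non_ascii_ratio(blob)          # => 0.0
```

In this sketch the Russian sample consists mostly of non-ASCII bytes yet is organized in short lines, while the second sample contains no non-ASCII bytes at all but uses a CR for something other than ending a line.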