Trans wrote:
> what's the best way to determine if a file is yaml?

In light of the other responses, which show how hard it is to do this in general, what about a pragmatic approach that might work in most of the cases you are interested in?

Look at the first N lines. If any line has _any_ non-printing characters, it's not correct YAML and wasn't generated by YAML#dump.[1] If any are longer than M chars or other binary file heuristics apply[2], it's probably not a manually written YAML file.

If it passes at least _one_ of these two checks, then check to see if 80% of the (first N) lines match the following:

  /^\s*(-|\?|[\w\s]*:)\s/

Maybe add some logic to skip blocks of text like this (so they don't count against the 80%):

  a: |
    skip me

Also, check for > in place of |. And also skip blanks and comments /^\s*(#|$)/.

And then finally load it and rescue any ArgumentError.

There are probably a lot of corner cases that kill this approach if you cannot tolerate false negatives (i.e., legit YAML that gets rejected by the above).

---

[1] The YAML spec, http://yaml.org/spec/current.html, says nonprinting chars are encoded (see 4.1.1. Character Set), and it seems to be true, at least in the dump output:

  irb(main):023:0> puts({"a"=>"\002"}.to_yaml)
  ---
  a: !binary |
    Ag==

However, YAML can load unescaped binary data, as Devin showed:

  irb(main):025:0> YAML.load "a: \002"
  => {"a"=>"\002"}

[2] For example, http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/52548

-- 
      vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407
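
For anyone who wants to try the heuristic above, here is a rough sketch of how it might look in Ruby. It is only an illustration under several assumptions. The method name looks_like_yaml?, the constant names, and the values chosen for N and M are all made up for this example. "Non-printing" is taken to mean ASCII control characters other than tab, LF and CR. The pattern used to spot a block scalar opener is a guess. Finally, the load step is replaced by YAML.parse, which only checks syntax. With the Psych parser that ships with newer Rubies, syntax errors surface as Psych::SyntaxError rather than ArgumentError, so the sketch simply rescues StandardError.

```ruby
require "yaml"

# All three thresholds are hypothetical tuning knobs; the post leaves N and M open.
MAX_LINES       = 50    # "N": how many leading lines to inspect
MAX_LINE_LENGTH = 200   # "M": longer lines suggest a file not written by hand
MIN_MATCH_RATIO = 0.8   # the 80% figure from the post

YAML_LINE   = /^\s*(-|\?|[\w\s]*:)\s/             # regex from the post
SKIP_LINE   = /^\s*(#|$)/                         # blanks and comments, from the post
BLOCK_OPEN  = /[:\-]\s+[|>][-+]?\s*\z/            # "a: |" or "a: >" (assumed pattern)
NONPRINTING = /[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/  # control chars except tab, LF, CR (assumed)

def looks_like_yaml?(text)
  # Work on a binary copy so arbitrary bytes cannot trip the regexes.
  lines = text.b.lines.first(MAX_LINES).map(&:chomp)

  # The two cheap checks; the input must pass at least one of them.
  no_control_chars = lines.none? { |l| l.match?(NONPRINTING) }
  no_long_lines    = lines.all?  { |l| l.length <= MAX_LINE_LENGTH }
  return false unless no_control_chars || no_long_lines

  counted = matched = 0
  block_indent = nil
  lines.each do |line|
    indent = line[/\A */].length
    if block_indent
      # Inside a block scalar: skip blank lines and deeper-indented lines.
      next if line.strip.empty? || indent > block_indent
      block_indent = nil
    end
    next if line.match?(SKIP_LINE)

    counted += 1
    # Re-append the newline so a bare "key:" still satisfies the trailing \s.
    matched += 1 if "#{line}\n".match?(YAML_LINE)
    block_indent = indent if line.match?(BLOCK_OPEN)
  end
  return false if counted.zero? || matched.fdiv(counted) < MIN_MATCH_RATIO

  # Final step: a syntax-only parse standing in for the load in the post.
  YAML.parse(text)
  true
rescue StandardError
  false
end
```

Something like looks_like_yaml?(File.read(path)) would then give a quick yes/no. As with the description above, this trades accuracy for simplicity, so expect false negatives on unusual but valid YAML (flow-style documents, for instance).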