I've implemented a simple XHTML validation class based on REXML and
YAML, and it works like a charm except on malformed XML: when the input
contains something like a stray unescaped '<' character, REXML just raises
REXML::ParseException with no obvious reference to the guilty character. Is
it possible to get more useful info out of REXML, or should I use some
other XML validator?
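
For reference, here is a minimal sketch of what can be pulled out of the
exception itself. It assumes the line and position accessors that
REXML::ParseException defines; depending on the REXML version and on whether
the input is a String or an IO, they may return nil or only an approximate
location, so the message text is often the more useful part. The helper name
xml_error_details is hypothetical and is not part of the Sanitize class.

```ruby
require 'rexml/document'

# Hypothetical helper: parse a string and return nil when it is
# well-formed, or a hash describing the failure otherwise.
def xml_error_details(xml)
  REXML::Document.new(xml)
  nil
rescue REXML::ParseException => e
  {
    :message  => e.message,   # REXML's own description of the failure
    :line     => e.line,      # may be nil or approximate
    :position => e.position   # may be nil or approximate
  }
end

# A loose '<' inside text content, as described above.
details = xml_error_details("<p>a < b</p>")
puts details[:message] if details
```

If the reported location turns out to be too vague, the same rescue clause
is the place to fall back to another check, for example scanning the input
for a '<' that is not followed by a name character.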

Sanitize class (54 lines total):

http://savannah.nongnu.org/cgi-bin/viewcvs/samizdat/samizdat/samizdat/sanitize.rb?rev=1.99

YAML file with allowed XHTML tags and attributes:

http://savannah.nongnu.org/cgi-bin/viewcvs/samizdat/samizdat/xhtml.yaml?rev=1.99