For best results, you should use training material that is similar to the
text you want to analyze. I don't think it is useful to train on .doc
documents when you want to analyze HTML files.

martinus