For best results, train on material that is similar to the text you want to analyze. I don't think it is useful to train on .doc documents when you want to analyze HTML files. martinus