martinus wrote:
> You should use training material that is similar to the text you want
> to analyze for best results. I don't think it is useful to train .doc
> documents when you want to analyze html files.

Can you clarify this? Do you mean:

1. The text is not extracted cleanly from its format, so it retains some
residue of where it came from (JuliusCaesar.doc will train differently
from JuliusCaesar.html).

2. The material should be of the same general type and come from the
same kind of source, but the actual file format does not affect training.

3. Something else?


Hal