Tom Reilly wrote:

> Using Hash.new, I determined that there are about 22,000 words: some
> abbreviations, some correct spellings, some incorrect. There
> are on average 20 words per message, though many of the words
> are adjectives, prepositions, and verbs, which don't help classification.

Regarding spelling mistakes: given enough overlap between messages that 
use the correct spelling and messages that use the incorrect one, that 
will not be a problem. The Thunderbird spam filter has learned, over 
time, to deal with the deliberate misspellings and character 
substitutions that spam senders use. I think it works like this:

Spam Message A: Deve|oped Commercia|ized Price
Spam Message B: Pr1ce Commercia|ized
Spam Message C: Developed Commercialized Pr1ce
Spam Message D: Developed Commercialized Price

The filter will see that these messages share many tokens. When it 
classifies one of them as spam, it also learns the new tokens in that 
message (such as "Pr1ce"), so given enough data it adapts to the 
misspelled variants as well.
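
To make that concrete, here is a rough sketch in Ruby of a token-counting 
Bayesian filter that learns from its own classifications. It is only an 
illustration of the idea, not Thunderbird's actual code: the TinyBayes 
class, its method names, the whitespace tokenizer, and the add-one 
smoothing are all my own assumptions, and it ignores class priors.

```ruby
# Hypothetical minimal token-based Bayesian filter (illustration only).
class TinyBayes
  def initialize
    # Per-class token counts; Hash.new(0) makes unseen tokens count as 0.
    @counts = { spam: Hash.new(0), ham: Hash.new(0) }
    @totals = { spam: 0, ham: 0 } # total tokens seen per class
  end

  # Naive tokenizer: lowercase, split on whitespace.
  def tokens(text)
    text.downcase.split
  end

  # Record every token of the message under the given label.
  def train(label, text)
    tokens(text).each do |t|
      @counts[label][t] += 1
      @totals[label] += 1
    end
  end

  # Sum of log-probabilities of the tokens under the label,
  # with add-one smoothing so unseen tokens do not zero out the score.
  def score(label, text)
    vocab = (@counts[:spam].keys | @counts[:ham].keys).size
    tokens(text).sum do |t|
      Math.log((@counts[label][t] + 1.0) / (@totals[label] + vocab + 1))
    end
  end

  # Ties go to :ham, to err on the side of not discarding mail.
  def classify(text)
    score(:spam, text) > score(:ham, text) ? :spam : :ham
  end

  # Classify, then feed the message back in under the label it received.
  def classify_and_learn(text)
    label = classify(text)
    train(label, text)
    label
  end
end

filter = TinyBayes.new
filter.train(:spam, "Developed Commercialized Price")   # Message D
filter.train(:ham,  "meeting notes attached")           # made-up ham message

filter.classify("Pr1ce Commercia|ized")                 # no known tokens yet
filter.classify_and_learn("Developed Commercialized Pr1ce") # Message C
filter.classify("Pr1ce Commercia|ized")                 # "pr1ce" is now known
```

With this tiny training set, Message B shares no tokens with anything 
the filter has seen, so the first classify call cannot tell it apart 
from ham. Once Message C has been classified as spam and learned, 
"pr1ce" counts as a spam token and Message B tips over to spam. Note 
that this kind of self-training also reinforces mistakes.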

It is, however, a good idea to review a good sample of its 
classifications and correct them by hand where necessary, so that it 
does not keep learning from its own mistakes.