Markus wrote:

>>At a later stage it will start babbling using reasonable phrases as
>>chunks, and transitioning from phrase to phrase based on some kind of
>>statistical relationship. Still later...well, I don't yet know just how
>>far this can go. I'm hoping it will become interesting. I intend to
>>feed it a bunch of books from Gutenberg as background, but I'm starting
>>with Alice30.txt (Alice in Wonderland).
>>
>>
>If it turns out you're using this to try to get past spam filters I
>think a lot of us will be very disappointed.
>
>-- Markus
>

Well, since I plan to eventually release full sources...that may well
happen if it's successful. Then again, it could probably also be used to
sort spam from ham. I basically think of this as a part of an AI
project, and as such it will have multiple uses.

E.g., one test of ham is that most of what it contains consists of
reasonable phrases. If it doesn't have reasonable phrases, it's probably
something else. Which, unfortunately, includes programs. So you'd need a
separate recognizer to decide whether or not it was a program. And
possibly other recognizers as well.

But the spam/ham problem is an arms race. I suspect that a final answer
is impossible this side of individually tailored filters. Bayesian
filtering is already a start at this, but it's just a start. To be
really effective the filter will need to dip into the semantic level.
(So far I'm pretty much staying at the syntactic level, because it's
more tractable...but semantics will need to be added.)
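
To make the "reasonable phrases" test concrete, here is a minimal sketch in Python. It is not the project's actual code. It assumes that a "reasonable phrase" can be approximated by a pair of adjacent words already seen in the background text, and the names `words`, `bigram_counts` and `phrase_score` are made up for illustration. The one-sentence corpus stands in for a real file such as Alice30.txt.

```python
import re
from collections import Counter

def words(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bigram_counts(corpus):
    """Count adjacent word pairs in the background corpus."""
    toks = words(corpus)
    return Counter(zip(toks, toks[1:]))

def phrase_score(message, known):
    """Fraction of the message's word pairs that also occur in the corpus."""
    toks = words(message)
    pairs = list(zip(toks, toks[1:]))
    if not pairs:
        return 0.0  # nothing to judge, so treat it as "no known phrases"
    return sum(1 for p in pairs if p in known) / len(pairs)

# Toy background text (assumption); a real run would read a whole book.
corpus = ("Alice was beginning to get very tired of sitting "
          "by her sister on the bank")
known = bigram_counts(corpus)

ham_like = "Alice was very tired of sitting"
code_like = "for (i = 0; i < n; i++) { total += x[i]; }"

print(phrase_score(ham_like, known))   # 0.8
print(phrase_score(code_like, known))  # 0.0
```

The program fragment scores zero, which is the problem mentioned above: a low score says "not ordinary prose", not "spam", so something else would still have to recognize programs. Where to put the cutoff between ham and not-ham is left open here; it would have to be tuned against real mail.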