Markus wrote:

>>At a later stage it will start babbling using reasonable phrases as 
>>chunks, and transitioning from phrase to phrase based on some kind of 
>>statistical relationship.  Still later...well, I don't yet know just how 
>>far this can go.  I'm hoping it will become interesting.  I intend to 
>>feed it a bunch of books from Gutenberg as background, but I'm starting 
>>with Alice30.txt (Alice in Wonderland).
>>    
>>
>If it turns out you're using this to try to get past spam filters I
>think a lot of us will be very disappointed.
>
>-- Markus
>
Well, since I plan to eventually release full sources...that may well 
happen if it's successful.  Then again, it could probably also be used 
to sort spam from ham. 

I basically think of this as a part of an AI project, and as such it 
will have multiple uses. 
E.g., one test of ham is that most of what it contains consists of 
reasonable phrases.  If it doesn't have reasonable phrases, it's 
probably something else.  Which, unfortunately, includes programs.  So 
you'd need a separate recognizer to decide whether or not it was a 
program.  And possibly other recognizers as well.
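
A minimal sketch of that "reasonable phrases" test, assuming a phrase is 
approximated by a pair of adjacent words and that "reasonable" just means 
"seen in the background corpus".  The names (train, phrase_score) and the 
one-sentence corpus are made up for illustration; this is not the project's 
actual code.

```python
import re
from collections import Counter

def words(text):
    # Lowercase word tokens; apostrophes kept so "don't" stays one token.
    return re.findall(r"[a-z']+", text.lower())

def bigrams(tokens):
    # Adjacent word pairs, standing in for "phrases" (an assumption).
    return list(zip(tokens, tokens[1:]))

def train(corpus_text):
    # Count every adjacent word pair seen in the background corpus.
    return Counter(bigrams(words(corpus_text)))

def phrase_score(known, message):
    # Fraction of the message's word pairs that also occur in the corpus.
    pairs = bigrams(words(message))
    if not pairs:
        return 0.0
    return sum(1 for p in pairs if p in known) / len(pairs)

# Toy corpus: a single sentence instead of a whole book.
corpus = ("Alice was beginning to get very tired of sitting "
          "by her sister on the bank")
known = train(corpus)

ham_like = phrase_score(known, "Alice was very tired of sitting")
code_like = phrase_score(known, "int main void return zero")
print(ham_like, code_like)
```

A low score only says "not like the corpus", so source code and spam both 
land there, which is why a separate recognizer would still be needed.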

But the spam/ham problem is an arms race.  I suspect that a final answer 
is impossible this side of individually tailored filters.  Bayesian 
filtering is already a start at this, but it's just a start.  To be really effective 
the filter will need to dip into the semantic level.  (So far I'm pretty 
much staying at the syntactic level, because it's more tractable...but 
semantics will need to be added.)
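
For reference, a rough sketch of the kind of per-user Bayes filter meant 
here: a naive Bayes score over whitespace-separated tokens with add-one 
smoothing.  The class name TinyBayes and the training messages are 
hypothetical, and real filters do considerably more (headers, better 
tokenizing, thresholds).

```python
import math
from collections import Counter

class TinyBayes:
    def __init__(self):
        # Per-label token counts and message counts for one user's mail.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, label, text):
        self.counts[label].update(text.lower().split())
        self.docs[label] += 1

    def log_odds(self, text):
        # Positive means more spam-like, negative means more ham-like.
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        totals = {k: sum(c.values()) for k, c in self.counts.items()}
        # Smoothed prior from how many messages of each kind were seen.
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for w in text.lower().split():
            # Add-one smoothing so unseen tokens never give a zero probability.
            p_spam = (self.counts["spam"][w] + 1) / (totals["spam"] + vocab + 1)
            p_ham = (self.counts["ham"][w] + 1) / (totals["ham"] + vocab + 1)
            score += math.log(p_spam / p_ham)
        return score

f = TinyBayes()
f.train("spam", "buy cheap pills now")
f.train("ham", "the meeting is at noon tomorrow")
print(f.log_odds("cheap pills") > 0)
print(f.log_odds("meeting tomorrow") < 0)
```

Because it only counts tokens, this stays below even the syntactic level: 
word order is ignored, and meaning is not modelled at all.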