A long time ago, in a galaxy far, far away, "Hal E. Fulton" <hal9000 / hypermetrics.com> wrote:
> ----- Original Message -----
> From: "michael libby" <x / ichimunki.com>
> To: "ruby-talk ML" <ruby-talk / ruby-lang.org>
> Sent: Wednesday, September 04, 2002 7:30 PM
> Subject: Re: New list: ruby-modules - for module developers...
>
>
>> > You're writing an email client? In Ruby, I assume/hope?
>> > With what GUI?
>>
>> Yes, actually a multi-purpose text editor in Ruby using Tk-- essentially a
>> set of menus and a base Editor class/widget. The base editor is then
>> subclassed into "modes"-- in my first iteration, one mode will be a
>> rudimentary Ruby source code editor, two modes will be email related-- a
>> "directory" browser and an email message editor-- and a help mode (not
>> entirely sure of how this will look yet, but the guts are already there to
>> do something, um, helpful). So it's more like emacs, and less like
>> Outlook.
>
> Interesting, but not the way I think.
>
>> My main goals for the email components: ability to handle PGP signatures,
>> built-in Bayesian spam filtering, and basic stuff like having an inbox,
>> local mail storage, ability to send, receive, read, reply, forward an
>> email.
>
> Good, very good so far. Um, what's a "Bayesian" spam filter?
>
> I've thought of assigning positive and negative weights to keywords in
> a stoplist and an anti-stoplist. Is it anything like that?
>
> Or is it perhaps an AI type of thing where you tell the app "This is spam"
> and it tries to learn what spam is?
>
> But I should quit guessing.

It is usually described as "naive Bayesian filtering": you gather
statistics on the number of occurrences of each word, and assume the
words are independent of one another so that you can estimate joint
probabilities simply as P(A and B) = P(A) P(B).
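
To spell out how that assumption gets used, here is one common way of
writing the calculation. The symbols are mine, not taken from any
particular filter: F is a folder, and w_1 ... w_n are the words of an
incoming message. Strictly speaking, the independence is assumed
within each folder, and Bayes' rule then turns the per-word statistics
into a score for the folder:

```latex
P(w_1, \ldots, w_n \mid F) \approx \prod_{i=1}^{n} P(w_i \mid F)
\qquad\Longrightarrow\qquad
P(F \mid w_1, \ldots, w_n) \propto P(F) \prod_{i=1}^{n} P(w_i \mid F)
```

The folder with the largest right-hand side is the best match.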

See:
<http://www3.sympatico.ca/cbbrowne/images/ifilter.png>

Basically, what it does is to go a step past AI, head over to
"statistical analysis," and come back and pretend to be AI.

You set up folders with 'good mail' and folders with 'bad mail' and
collect statistics on how often each word occurs in each folder.

You take incoming messages, run them through the formula above, and
see which folder's statistics they match best.

In effect, there is no stoplist; every word in every message is taken
into consideration.
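
Since this is ruby-talk, here is a minimal sketch of the procedure in
Ruby. It is not the code behind ifilter; the class name, the method
names, the tokenizer regexp, the "good"/"bad" folder labels, the use
of log probabilities, and the add-one smoothing for unseen words are
all assumptions made for illustration.

```ruby
# Hypothetical sketch of a naive Bayesian folder classifier.
# Names and smoothing choices are illustrative assumptions.
class NaiveBayesFolders
  def initialize
    # folder name => { word => number of occurrences }
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) }
    # folder name => number of messages filed there
    @message_counts = Hash.new(0)
  end

  # Split text into lowercase words. No stoplist: every word counts.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9']+/)
  end

  # Record the words of one message known to belong in `folder`.
  def train(folder, text)
    @message_counts[folder] += 1
    tokenize(text).each { |word| @word_counts[folder][word] += 1 }
  end

  # Number of distinct words seen across all folders.
  def vocabulary_size
    @word_counts.values.flat_map(&:keys).uniq.size
  end

  # log P(folder) + sum of log P(word | folder).
  # Logs avoid underflow; adding 1 keeps unseen words from zeroing
  # out the whole product (an assumed smoothing choice).
  def log_score(folder, text)
    counts      = @word_counts[folder]
    total_words = counts.values.sum
    vocab       = vocabulary_size
    prior       = @message_counts[folder].to_f / @message_counts.values.sum
    tokenize(text).reduce(Math.log(prior)) do |score, word|
      score + Math.log((counts[word] + 1.0) / (total_words + vocab))
    end
  end

  # Pick the folder whose statistics best match the message.
  def classify(text)
    @message_counts.keys.max_by { |folder| log_score(folder, text) }
  end
end

filter = NaiveBayesFolders.new
filter.train("good", "lunch meeting tomorrow at noon")
filter.train("good", "patch for the ruby parser attached")
filter.train("bad",  "buy cheap pills now")
filter.train("bad",  "cheap loans buy now free")

filter.classify("cheap pills free")    # => "bad"
filter.classify("ruby patch meeting")  # => "good"
```

A real filter would train on whole mail folders rather than four
one-line strings, and would probably want a smarter tokenizer, but
the bookkeeping is no more complicated than this.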
-- 
(concatenate 'string "cbbrowne" "@cbbrowne.com")
http://www3.sympatico.ca/cbbrowne/ifilter.html
The  meta-Turing test counts  a thing  as intelligent  if it  seeks to
apply Turing tests to objects of its own creation.