-----BEGIN PGP SIGNED MESSAGE-----

In article <2d81dedb0611021036t5e47d35ex55c294e634873b59 / mail.gmail.com>,
Giles Bowkett <gilesb / gmail.com> wrote:
>> As long as you analyze natural language, both seem suited. Although they
>> differ in complexity under the hood, both have a very simple
>> interface: define a category and train it, then use a guess interface to
>> evaluate candidates.
>
>I'm hoping to develop yet another spam filter. In that sense I can
>only say I'm sort of analyzing natural language. Not all of it is
>natural language; some of it is code. In the Paul Graham thing where
>he came up with this idea, if I remember right, he said that a font
>tag with the color red turned out to be the single most reliable
>indicator of spam. Obviously in HTML e-mail there are going to be
>similar trends. However, if the tokenizer is the only problem, that may
>be something I can change without too much stress.
>

Long ago, I wrote an interface to the ifile program, and I use
it in my spam/email filtering. ifile is abandonware at the
moment. I think I posted it on the ruby mailing list at some
point; you might try searching for it.
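
For illustration, the train/guess interface described in the quoted
text, with a tokenizer that can be swapped out for HTML mail, might
look roughly like the Ruby sketch below. This is an assumption-laden
toy and not the ifile interface: the class name TinyBayes, the WORDS
and HTML tokenizers, and the add-one smoothing are all hypothetical
choices made for the example.

```ruby
# Hypothetical sketch: a tiny naive Bayes classifier with a swappable
# tokenizer. None of these names come from ifile or an existing library.
class TinyBayes
  # Default tokenizer: lowercase words only.
  WORDS = ->(text) { text.downcase.scan(/[a-z0-9']+/) }

  # HTML-aware tokenizer (an assumption for this example): keeps
  # attribute pairs such as color=red as single tokens, so markup
  # itself can act as a signal, then adds the words outside the tags.
  HTML = lambda do |text|
    lowered = text.downcase
    attrs = lowered.scan(/([a-z]+)\s*=\s*["']?([#a-z0-9]+)/)
                   .map { |key, value| "#{key}=#{value}" }
    words = lowered.gsub(/<[^>]*>/, " ").scan(/[a-z0-9']+/)
    attrs + words
  end

  def initialize(tokenizer = WORDS)
    @tokenizer = tokenizer
    @counts = Hash.new { |h, k| h[k] = Hash.new(0) } # category => token => count
    @docs = Hash.new(0)                               # category => messages seen
  end

  # Record one example message for a category.
  def train(category, text)
    @docs[category] += 1
    @tokenizer.call(text).each { |tok| @counts[category][tok] += 1 }
    self
  end

  # Return the category with the highest log-probability, or nil if
  # nothing has been trained yet. Uses add-one smoothing.
  def guess(text)
    return nil if @docs.empty?
    tokens = @tokenizer.call(text)
    vocab = @counts.values.flat_map(&:keys).uniq.size
    total_docs = @docs.values.sum
    @docs.keys.max_by do |cat|
      total = @counts[cat].values.sum
      prior = Math.log(@docs[cat].to_f / total_docs)
      tokens.sum(prior) do |tok|
        Math.log((@counts[cat][tok] + 1.0) / (total + vocab + 1))
      end
    end
  end
end
```

Usage would be along the lines of TinyBayes.new(TinyBayes::HTML),
then train(:spam, message) and train(:ham, message) for each known
message, then guess(candidate). Changing how mail is tokenized only
means passing a different lambda to the constructor.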


_ Booker C. Bense 


-----BEGIN PGP SIGNATURE-----
Version: 2.6.2

iQCVAwUBRUpDBWTWTAjn5N/lAQFx9QP+NqHWWcudTBnJK3u2qofqheu6p0hJ3W2I
L6elwknvioDWRuwWO/rksM2DZXwQ6trTHkpEnh0REEsWGl6n683ckuYBbr/ElVA2
9SfGWM0cXspEVX6Xsx/xFsnpF8mdF6le6SdxSEHr0HGhq+8NY1HFoLSOEKdEIBo6
p2sZwJ6+94Q=
=1IG0
-----END PGP SIGNATURE-----