Wiki Spam Report
----------------

I thought I would take some time and report on the wiki spam situation
on RubyGarden.  As I hope you have noticed, the wiki has been
remarkably spam free.  This email will tell you what measures we have
taken to get to this point.

But first ...

Some Numbers
------------

Over the past 10 days, we have had:

  93 updates to the wiki page, all (AFAICT) spam free.
     (although I might have missed spotting some).

  46 updates to the wiki tarpit.  Of those, we had ...
     3 innocent updates
     2 questionable updates
     1 update by me
    40 spams

The Mechanism
-------------

Spammers are automatically routed to a wiki tarpit.  The tarpit is an
(almost) exact copy of the real RubyGarden wiki.  Making changes to
the tarpit looks as if you are making changes to the real wiki.  And
since spammers get their pages from the wiki, it looks (to them) as
if they have successfully spammed our site.

However, no one else ever gets to see the spam.

Because the spammers are tricked into thinking they are successful,
they don't put any additional effort into bypassing our spam detection
criteria.  This is important!  When we explicitly denied them access
to the wiki, they went to great lengths to figure out how to get
around the restrictions.  I haven't seen any of that kind of probing
with the tarpit.
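
To make the routing idea concrete, here is a minimal sketch in Python.
It is not the actual RubyGarden code: the two in-memory dictionaries
and the names store_for and save_page are assumptions made up for
illustration.  The point is only that a suspected spammer reads and
writes one store while everyone else uses the other.

```python
# Hypothetical sketch: two in-memory page stores stand in for the real
# wiki and the tarpit. None of these names come from the RubyGarden code.

def store_for(stores, suspected_spammer):
    """Pick the page store a request reads from and writes to."""
    return stores["tarpit"] if suspected_spammer else stores["wiki"]

def save_page(stores, suspected_spammer, title, text):
    """Save an edit and return what the editor sees afterwards."""
    store = store_for(stores, suspected_spammer)
    store[title] = text
    # The editor is shown the page from the same store, so the edit
    # always appears to have succeeded.
    return store[title]

stores = {
    "wiki": {"HomePage": "Welcome"},
    "tarpit": {"HomePage": "Welcome"},  # starts as a copy of the wiki
}
save_page(stores, True, "HomePage", "buy pills")    # lands in the tarpit
save_page(stores, False, "RubyTips", "use blocks")  # lands in the wiki
```

After these two calls the real wiki still has its original HomePage,
and the spam exists only in the tarpit copy.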

Detecting Spammers
------------------

The current spammer detection logic is based on two observations:

(1) Spammers almost never use an IP address that has reverse lookup
enabled.  This effectively means that it appears (to the wiki
software) that your host name looks like a numeric IP address.

(2) Spammers almost never set user preferences on the wiki.

So if both of these conditions are true, we treat the access as coming
from a spammer and send it to the tarpit.

Now this isn't perfect, but that's OK.  We also have an explicit ban
list for spammers who get past the checks in (1) or (2) above.  And we
have an explicit allow list that overrides the automatic spammer
detection.
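
The rules above can be sketched roughly as follows.  This is an
illustration in Python, not the wiki's real code: the function names,
the parameters, and the use of IP addresses as list keys are all
assumptions.  In particular, the order in which the ban list and the
allow list are consulted is a guess; all we know for certain is that
the allow list overrides the automatic check.

```python
import ipaddress

def looks_numeric(host):
    """True if the host name is just an IP address (no reverse lookup)."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False

def is_suspected_spammer(ip, host, has_preferences,
                         ban_list=frozenset(), allow_list=frozenset()):
    """Decide whether a request should be sent to the tarpit."""
    if ip in ban_list:        # explicit ban (checked first: an assumption)
        return True
    if ip in allow_list:      # explicit allow overrides automatic detection
        return False
    # Automatic rule: both observations (1) and (2) must hold.
    return looks_numeric(host) and not has_preferences
```

For example, a request from 203.0.113.7 with no reverse lookup and no
preferences set would be routed to the tarpit, while the same request
with preferences set, or with that address on the allow list, would
not.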

Innocent Users
--------------

Can innocent users get trapped by the tarpit?  The short answer is yes.
However, we are monitoring the tarpit and will attempt to rescue such
users.

In the past 10 days, there were at least 3 page updates that were from
innocent users.  One guy (bless his heart) even removed some spam from
the tarpit for us.

When I see innocents trapped in the tarpit, I add their IP address to
the allow list and manually update the wiki with their changes (if
they are significant).

Detecting the Tarpit?
---------------------

The tarpit is deliberately designed to look like the original wiki, so
it is sometimes difficult to tell when you are trapped.  Here are some
suggestions.

You are probably in the tarpit when:

* there are a lot of recent updates made with numeric IP addresses
  rather than host names.

* a lot of the pages have spam.

Neither of these suggestions is foolproof, though.  I refresh the tarpit
from the real wiki occasionally (to keep it looking realistic).
Immediately after a refresh it is /very/ difficult to tell the difference.

If you think you are trapped by the tarpit, send me
(jim / weirichhouse.org) an email with your IP address and I will check
the logs.  If you are trapped, we can add your IP address to the allow
list.

If you are worried about getting caught in the tarpit, just make sure you
have your user preferences set when accessing the wiki (click on the
preferences link from any wiki page).

Summary
-------

I am pretty happy with the current wiki situation.  In fact, the
tarpit has been so successful that I am considering lifting the ban
on lower case http.  The ban currently isn't buying us any benefits
and is rather annoying (I'll make it so both upper and lower case
work).

Thanks for your time.

-- 
-- Jim Weirich     jim / weirichhouse.org    http://onestepback.org
-----------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)