Austin,

On Tuesday, September 28, 2004, at 08:15  PM, Austin Ziegler wrote:

> On Wed, 29 Sep 2004 08:14:42 +0900, Patrick May <patrick / hexane.org> 
> wrote:
>> A tarpit would be easier to implement than a captcha.  In the usemod
>> settings, you use NetAddr::IP to check if the env's Remote Addr is
>> within a known spammer domain.  If it is a spammer, set the pages
>> database to a copy.  Nightly / weekly / whatever, dump the latest 
>> pages
>> directory on top of the tarpit.
>>
>> There goes one of my points for my presentation :-)
>>
>> The main resource in fighting spammers is time.  You want to waste
>> their time, let them think that things are working.
>
> I'm approaching it, again, from a slightly different perspective. My
> goal is to make the page seem as if it were entirely a read-only
> website to robots, and 403 if they are known bad crawlers. I don't yet
> have IP banning, but I have robot exclusion.

Read-only to robots makes sense as a way of preventing accidental 
damage.  I used to have a delete link on the wiki, and all my pages 
kept getting deleted.  I guessed that it was a robot run amok [1].  
I also like the bit about recognizing bad crawlers.  No harvesting 
for old-fashioned spam is a good thing.
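
A check like that doesn't take much.  Here is a minimal, hypothetical 
sketch of the idea -- the agent list is made up, and the actual code 
in Austin's wiki may look nothing like this:

use strict;
use warnings;

# Made-up examples of harvester/crawler agents to refuse.
my @bad_agents = ( qr/EmailSiphon/i, qr/EmailCollector/i, qr/WebZIP/i );

my $ua = $ENV{HTTP_USER_AGENT} || '';
if ( grep { $ua =~ $_ } @bad_agents ) {
    # CGI-style 403 before the wiki engine ever runs.
    print "Status: 403 Forbidden\r\n";
    print "Content-type: text/plain\r\n\r\n";
    print "Forbidden\n";
    exit;
}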

The thing about banning is that it is easy for the vandal to tell 
that they have been detected.  I tried using Apache Deny directives 
to manage abuse, but sometimes that just encourages the vandal to 
switch computers.  Plus, the cost of a false positive is denying 
service to a legitimate user.  After one particularly annoying 
episode, I realized that the vandal was trying to waste my time.  So 
I set up the tarpit system to waste his, and I haven't lost sleep 
since.

I still do a lot of cleanup on my wikis, and I still use Deny 
directives.  Nothing replaces an active administrator.  The tarpit 
just gave me another lever for managing the problem.

Cheers,

Patrick

1. I didn't labor too much over it; I just deleted the Delete link.