Austin,

On Tuesday, September 28, 2004, at 08:15 PM, Austin Ziegler wrote:

> On Wed, 29 Sep 2004 08:14:42 +0900, Patrick May <patrick / hexane.org>
> wrote:
>> A tarpit would be easier to implement than a captcha. In the usemod
>> settings, you use NetAddr::IP to check if the env's Remote Addr is
>> within a known spammer domain. If it is a spammer, set the pages
>> database to a copy. Nightly / weekly / whatever, dump the latest
>> pages directory on top of the tarpit.
>>
>> There goes one of my points for my presentation :-)
>>
>> The main resource in fighting spammers is time. You want to waste
>> their time, let them think that things are working.
>
> I'm approaching it, again, from a slightly different perspective. My
> goal is to make the page seem as if it were entirely a read-only
> website to robots, and 403 if they are known bad crawlers. I don't yet
> have IP banning, but I have robot exclusion.

Read-only to robots makes sense as a way of preventing accidental
problems. I used to have a delete link on the wiki, and all my pages
kept getting deleted. I guessed that it was a robot gone amuck [1].
I also like the bit about recognizing bad crawlers. No harvesting for
old-fashioned spam is a good thing.

The thing about banning is that it is easy for the vandal to tell that
they have been detected. I tried using Apache Deny directives to manage
abuse, but sometimes that just encourages the vandal to switch
computers. Plus, the cost of a false positive is denial of service.

After one particularly annoying episode, I realized that the vandal was
trying to waste my time. So I set up the tarpit system to waste his,
and I haven't lost sleep since.

I still do a lot of cleanup on my wikis, and I still use Deny
directives. Nothing replaces an active administrator. The tarpit just
gave me another lever to help me manage the problem.

Cheers,

Patrick

1. I didn't labor too much over it; I just deleted the Delete link.
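
P.S. For anyone who wants to try this, here is a rough sketch of the
tarpit check, in the spirit of the usemod settings file. The spammer
ranges, the directory paths, and the nightly rsync are illustrative
assumptions, not a drop-in config; adjust them to taste.

    use NetAddr::IP;

    # Known spammer ranges -- examples only, substitute real offenders.
    my @spammer_nets = map { NetAddr::IP->new($_) }
        qw(192.0.2.0/24 198.51.100.0/24);

    my $remote = $ENV{REMOTE_ADDR}
        ? NetAddr::IP->new($ENV{REMOTE_ADDR})
        : undef;

    if ($remote and grep { $_->contains($remote) } @spammer_nets) {
        # Spammers get a private copy of the pages database; their
        # edits appear to work but never touch the real wiki.
        $DataDir = '/var/wiki/tarpit';
    }
    else {
        $DataDir = '/var/wiki/pages';
    }

    # Nightly / weekly, dump the latest real pages on top of the
    # tarpit, e.g. from cron:
    #   rsync -a --delete /var/wiki/pages/ /var/wiki/tarpit/

The only trick is that the tarpit copy has to look live, so the spammer
keeps wasting time on it instead of moving to another machine.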