On Jul 20, 2004, at 10:07 AM, Greg Millam wrote:

> Robert Oschler wrote:
>
>> Because of the Google Page Rank land grab, there are web sites running
>> scripts to deface popular Wikis with links to their site. For a dramatic
>> example look at the Revision history page for the Ruby Garden Wiki:
>>
>> http://www.rubygarden.org/ruby?RecentChanges
>>
>> The problem is, even though we diligently delete the spam as it shows up,
>> most Wikis archive the old revisions in a revision list. Google (you)
>> crawls these revision list pages and finds the deleted spam links. In
>> fact, you find a lot of them because the spammers keep coming back and we
>> keep deleting them, creating lots of revision history pages that you crawl.
>>
>> Here's a VERY SIMPLE way for you to help out the thousands of Wikis out
>> there.
>
> robots.txt ?
>
> Google adheres to that very strongly. And I notice there's no
> http://www.rubygarden.org/robots.txt
>
> http://www.google.com/webmasters/faq.html#norobots

Glancing at the specs, it seems that the benefit a spammer gets from
posting external links could be removed by a combination of wise
robots.txt settings and a redirect page for external links.

Or, one could use the meta tag that does the same thing:

    <META NAME="ROBOTS" CONTENT="NOFOLLOW">

This should keep any compliant search engine (including Google) from
following the links on a page, which should keep those links from
earning any PageRank.

If, however, some external links should be respected, there's the
redirect trick: external links go to a page which redirects to the
link. That way, you can allow certain URLs (links to rubycentral,
ruby-lang, etc.) to be read, while links to unknown sites are filtered
out by placing the meta tags correctly.

cheers,
Mark
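
The redirect trick described above could be sketched roughly as follows. This is only an illustration, not how the Ruby Garden Wiki works: the `/redirect` path, the `url` query parameter, the host names in the allow-list, and the function names are all assumptions made up for the example. Links to trusted hosts are left untouched so crawlers can read them; every other external link is pointed at a local redirect page, which robots.txt then hides from compliant crawlers.

```python
import re
from urllib.parse import quote, urlparse

# Hypothetical values for illustration; none of these come from the thread.
WIKI_HOST = "www.rubygarden.org"
TRUSTED_HOSTS = {"www.rubycentral.com", "www.ruby-lang.org"}
REDIRECT_PATH = "/redirect"

# robots.txt rules are prefix matches, so this also covers /redirect?url=...
ROBOTS_TXT = "User-agent: *\nDisallow: /redirect\n"

# Matches absolute http/https links in double-quoted href attributes.
HREF_RE = re.compile(r'href="(https?://[^"]+)"', re.IGNORECASE)


def rewrite_href(url):
    """Return url unchanged if its host is trusted, else a redirect URL."""
    host = (urlparse(url).hostname or "").lower()
    if host == WIKI_HOST or host in TRUSTED_HOSTS:
        return url
    # Percent-encode the whole target so it survives as one query value.
    return REDIRECT_PATH + "?url=" + quote(url, safe="")


def rewrite_links(html):
    """Rewrite every absolute link in an HTML fragment via rewrite_href."""
    return HREF_RE.sub(lambda m: 'href="%s"' % rewrite_href(m.group(1)), html)
```

The wiki would call something like `rewrite_links` on each page before serving it, and serve `ROBOTS_TXT` at `/robots.txt`. The redirect page itself is not shown; a real one would also want to guard against being abused as an open redirect.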