Mark Hubbart (discord / mac.com) wrote:

>
> On Jul 20, 2004, at 10:07 AM, Greg Millam wrote:
>
> >Robert Oschler wrote:
> >
> >>Because of the Google PageRank land grab, there are web sites
> >>running scripts to deface popular Wikis with links to their sites.
> >>For a dramatic example, look at the revision history page for the
> >>Ruby Garden Wiki:
> >>
> >>http://www.rubygarden.org/ruby?RecentChanges
> >>
> >>The problem is, even though we diligently delete the spam as it
> >>shows up, most Wikis archive the old revisions in a revision list.
> >>Google (you) crawls these revision list pages and finds the deleted
> >>spam links. In fact, you find a lot of them, because the spammers
> >>keep coming back and we keep deleting them, creating lots of
> >>revision history pages that you crawl.
> >>
> >>Here's a VERY SIMPLE way for you to help out the thousands of Wikis
> >>out there.
> >
> >robots.txt?
> >
> >Google adheres to that very strongly, and I notice there's no
> >http://www.rubygarden.org/robots.txt
> >
> >http://www.google.com/webmasters/faq.html#norobots
>
> Glancing at the specs, it seems that the benefit of someone posting
> external links could be removed by a combination of wise robots.txt
> settings and a redirect page for external links. Or, one could use the
> meta tags that do the same thing:
>
> <META NAME="ROBOTS" CONTENT="NOFOLLOW">

<META NAME="ROBOTS" CONTENT="NOFOLLOW, NOINDEX">

This tag should be inserted into every page served with query arguments
beyond the bare page name.  Those pages do nothing but give the search
engine extra work for zero benefit, and cost rubygarden.org money to be
crawled.
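As a minimal sketch of that rule in Ruby (the helper name is my own
invention, not anything RubyGarden actually runs): with UseMod-style
URLs like /ruby?RecentChanges, the bare page name is the whole query
string, so any extra &-separated argument (action=history, rev=3, and
so on) marks a page that should carry the tag.

```ruby
# Hypothetical helper: emit the robots meta tag whenever the request
# carries query arguments beyond the bare page name.
def robots_meta(query_string)
  args = query_string.to_s.split(/[&;]/)
  if args.size <= 1
    ""   # plain page view: let search engines index and follow
  else
    %{<META NAME="ROBOTS" CONTENT="NOFOLLOW, NOINDEX">}
  end
end
```

The wiki's page template would call this once in the <HEAD> section,
so revision lists, diffs, and edit forms all get excluded
automatically.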

> This should keep any compliant search engine (including Google) from
> analyzing a page for links, which should prevent the PageRank boost.
>
> If, however, some external links should be respected, there's the
> redirect trick: external links go to a page which redirects to the
> link. That way, you can allow certain URLs (links to rubycentral,
> ruby-lang, etc.) to be read, while links to unknown sites are
> filtered out, by placing the meta tags correctly.
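That redirect trick might look like this in Ruby; the allowlist and
the behavior on a rejected host are illustrative assumptions, not
anything rubygarden.org actually runs:

```ruby
# Hypothetical redirector: external links point at something like
# /redirect?url=..., and only allowlisted hosts get forwarded.
require 'uri'

ALLOWED_HOSTS = %w[www.rubycentral.com www.ruby-lang.org]

def redirect_target(url)
  host = URI.parse(url).host rescue nil
  # nil means "refuse to redirect"; the CGI would show an error page
  ALLOWED_HOSTS.include?(host) ? url : nil
end
```

External links in the rendered wiki page would then point at the
redirector rather than at the target directly, so spam links never
appear as plain hrefs for a crawler to follow.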

-- 
Eric Hodel - drbrain / segment7.net - http://segment7.net
All messages signed with fingerprint:
FEC2 57F1 D465 EB15 5D6E  7C11 332A 551C 796C 9F04

