Canonical Tag – Mostly Harmless

Several months ago the canonical tag was announced and seen as a solution to the issue of duplicate content.

Now, you can simply add this tag to specify your preferred version:

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />

inside the <head> section of the duplicate content URLs:

http://www.example.com/product.php?item=swedish-fish&category=gummy-candy
http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678

and Google will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. Additional URL properties, like PageRank and related signals, are transferred as well.
Google Webmaster Central
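
For a site that builds its pages dynamically, the tag can be generated rather than typed by hand. The following is only a minimal sketch of the idea, using Google's example URLs from above. It assumes that "item" is the only query parameter that identifies the page; the function names and the CANONICAL_PARAMS setting are made up for illustration and would need adapting to a real site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only these query parameters identify the page.
# Everything else (category, tracking IDs, session IDs) is treated as noise.
CANONICAL_PARAMS = ("item",)


def canonical_url(url, keep=CANONICAL_PARAMS):
    """Return url with every query parameter outside `keep` removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in keep
    ]
    # Rebuild the URL without the dropped parameters and without a fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Build the <link rel="canonical"> element to place inside <head>."""
    return '<link rel="canonical" href="%s" />' % escape(canonical_url(url), quote=True)


if __name__ == "__main__":
    duplicate = (
        "http://www.example.com/product.php"
        "?item=swedish-fish&trackingid=1234&sessionid=5678"
    )
    print(canonical_link_tag(duplicate))
```

Both duplicate URLs in the example above reduce to the same canonical address, which is the point of the tag.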

One of my clients had their shopping basket page showing up in Google’s index 293,000 times. Each link to the basket carried option parameters, and the cart itself contained “remove” links. All of these URL variations were crawled and added to the index.

The canonical tag seemed the natural solution, so on the 12th of November it was added to the cart page. As of this morning, though, there are still 265,000 instances of the page indexed.

I wrote a bash script to count how many visits Googlebot had paid to the cart page: since the 12th of November it has visited 20,765 times. 293,000 less 20,765 is 272,235, which is very broadly in line with the 265,000 still indexed, I suppose.
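
The bash script itself isn't reproduced here, but the same count can be sketched in Python. This is not the original script. It assumes an Apache-style "combined" access log and that the cart URLs share a common path prefix; the function name and the regular expression are illustrative only.

```python
import re
from datetime import date

# Assumption: Apache/Nginx "combined" log format, e.g.
# 1.2.3.4 - - [12/Nov/2009:06:25:11 +0000] "GET /cart?x=1 HTTP/1.1" 200 512 "-" "UA string"
LOG_LINE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" .*"(?P<ua>[^"]*)"$'
)
MONTHS = {
    name: number
    for number, name in enumerate(
        "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), 1
    )
}


def count_googlebot_hits(lines, path_prefix, since):
    """Count requests from a Googlebot user agent to path_prefix on or after `since`."""
    hits = 0
    for line in lines:
        match = LOG_LINE.search(line.rstrip("\n"))
        if match is None:
            continue  # not a line in the assumed format
        if "Googlebot" not in match.group("ua"):
            continue
        if not match.group("path").startswith(path_prefix):
            continue
        # Timestamp looks like 12/Nov/2009:06:25:11 +0000; only the date is needed.
        day, month, rest = match.group("ts").split("/", 2)
        if date(int(rest[:4]), MONTHS[month], int(day)) >= since:
            hits += 1
    return hits
```

Opening the log file and passing it in along with the cart's path prefix and the date the tag went live gives the visit count. Matching on the user agent string alone will also count anything merely pretending to be Googlebot, so treat the result as an upper bound.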

Google seems to be removing around 2,300 pages a day, but at that rate it will still take about four months to rid the site of the duplicate content issue using the canonical tag alone.
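
The four-month figure is simple division, shown here as a quick sketch. It uses the 265,000 pages still indexed and the roughly 2,300 a day removal rate quoted above, and it assumes the rate stays constant, which it may well not. The function name is made up for illustration.

```python
import math


def days_to_clear(pages_remaining, pages_removed_per_day):
    """Whole days needed to drop every remaining page at a constant daily rate."""
    return math.ceil(pages_remaining / pages_removed_per_day)


days = days_to_clear(265_000, 2_300)
print(days, "days, or about", round(days / 30, 1), "months")
# prints: 116 days, or about 3.9 months
```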

3 thoughts on “Canonical Tag – Mostly Harmless”

  1. I would noindex the shopping cart. That would remove all the pages, and I can’t see it hurting SEO, as no one will really want to land on the shopping cart page anyway.

  2. Many thanks for your comment Adam.

    I had considered adding “noindex, follow”, but the issue is that it would still take months for each individual variation of the URL to get noindexed.

    Furthermore, I think it would spend the site’s juice on hundreds of thousands of “pages”: although Google would not index them, it would still think they existed.

    Each page, even if it doesn’t display in the index, would still eat part of the whole site’s link equity.

    The linking page would spend its PageRank to no gain. I’m thinking that using canonical will allow the cart page to pool PageRank, which is then redistributed via navigational links to important areas of the website.

    At the time of writing there are 231,000 “pages” left in the index. Google has dropped 62,000 of them and at the current rate should have dropped them all by mid March.
