Canonical Tag – Mostly Harmless

Several months ago the canonical tag was announced and seen as a solution to the issue of duplicate content.

Now, you can simply add this tag to specify your preferred version:

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />

inside the <head> section of the duplicate content URLs:

http://www.example.com/product.php?item=swedish-fish&category=gummy-candy
http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678

and Google will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. Additional URL properties, like PageRank and related signals, are transferred as well.
Google Webmaster Central
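
To make the quoted example concrete, here is a minimal Python sketch of how a site might work out the canonical URL for any of those duplicates. It assumes that `item` is the only query parameter that changes what the page shows. The names `KEEP_PARAMS`, `canonical_url` and `canonical_link_tag` are made up for illustration and are not part of anything Google describes.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only these query parameters change what the page shows.
KEEP_PARAMS = {"item"}


def canonical_url(url: str) -> str:
    """Drop every query parameter that is not in KEEP_PARAMS."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in KEEP_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link> element to place inside <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}" />'


duplicate = (
    "http://www.example.com/product.php"
    "?item=swedish-fish&trackingid=1234&sessionid=5678"
)
print(canonical_link_tag(duplicate))
```

Both duplicate URLs from the quote reduce to the same `href`, which is the point of the tag: every variant names one preferred address.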

I have a client whose shopping basket page was being reported in Google 293,000 times. Each link to the basket carried its own options, and within the cart there were remove links. All of these were crawled and added to the index.

The canonical tag seemed the natural solution, so on the 12th of November it was added to the cart page. This morning, though, there are still 265,000 instances of the page indexed.

I wrote a bash script to see how many visits Googlebot had paid to the cart page, and since the 12th of November it has visited 20,765 times. 293,000 less 20,765 is 272,235, which is very broadly 265,000, I suppose.
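
The same kind of count can be sketched in Python. This is not the original bash script: it assumes an Apache-style combined access log and a cart path of `/cart`, and the sample log lines (including their year, IP addresses and sizes) are invented for the example.

```python
import re
from datetime import datetime, timezone

# Assumed combined log format:
# ip - - [timestamp] "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" .* "(?P<agent>[^"]*)"$'
)


def count_googlebot_hits(lines, path_prefix, since):
    """Count requests under path_prefix, on or after `since`, whose user agent mentions Googlebot."""
    hits = 0
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed format
        when = datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z")
        if (
            when >= since
            and match["path"].startswith(path_prefix)
            and "Googlebot" in match["agent"]
        ):
            hits += 1
    return hits


GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'66.249.65.1 - - [11/Nov/2009:23:59:00 +0000] "GET /cart?remove=7 HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.65.1 - - [12/Nov/2009:08:15:02 +0000] "GET /cart?remove=42 HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    '10.0.0.5 - - [12/Nov/2009:09:00:00 +0000] "GET /cart HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    f'66.249.65.1 - - [13/Nov/2009:10:30:00 +0000] "GET /product.php?item=swedish-fish HTTP/1.1" 200 900 "-" "{GOOGLEBOT}"',
]
since = datetime(2009, 11, 12, tzinfo=timezone.utc)
print(count_googlebot_hits(sample, "/cart", since))
```

Matching on the user-agent string alone also counts anything that merely claims to be Googlebot, so a figure produced this way is an upper bound rather than an exact tally.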

Google seem to be removing 2,300-odd pages a day, but at the current rate it will still take about four months to rid the site of the duplicate content issue using the canonical tag alone.
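
As a quick check of that estimate, assuming the removal rate stays constant and taking a month as 30 days:

```python
remaining_pages = 265_000  # instances still indexed this morning
removed_per_day = 2_300    # approximate daily removal rate

days_left = remaining_pages / removed_per_day
months_left = days_left / 30  # assumption: 30-day months

print(f"{days_left:.0f} days, about {months_left:.1f} months")
```

That works out at roughly 115 days, a little under four months.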
