redirects

Practical uses for the new Google cross-domain canonical link element

The cross-domain canonical link element, albeit only currently supported by Google, is a welcome addition to the webmaster’s toolkit. Read on for practical examples of how you can use it in your SEO campaigns.

Google is one step ahead of Bing and Yahoo! in allowing the canonical link element to be applied across domains, and we expect the other search engines to follow suit in due course. This matters for websites that previously used the canonical link element within a single site and now point it at another site: not only will Yahoo! and Bing not canonicalise across domains, but those sites will also lose their existing same-site canonical reference in those engines, which makes things even worse for them.

For now, this is the closest thing to a permanent redirect in Google for cases where a 301 redirect can’t be implemented for whatever reason, and it will come as welcome news to some. However, we need to remember that the outcome is not guaranteed, as Google explained in its post:

“While the rel="canonical" link element is seen as a hint and not an absolute directive, we do try to follow it where possible.”

All of the previous uses of the canonical link element are still valid – however, this opens up a number of new potential uses:

  • You can now move your site to a new domain even when you don’t have control of server headers (such as on free hosts like Google-owned Blogger).
  • As a temporary measure before 301 redirects can be properly implemented.
  • Landing pages on domains registered for tracking offline campaigns can pass the benefit of any links back to the main domain.
  • It will be possible to allow affiliates to create affiliate web sites which not only won’t compete against your website in the search results, but will even help the rankings of your own site (although this can’t be guaranteed). Obviously, this is something that the affiliates will have to agree to, and won’t be suitable for all programmes.
  • Similar to the above, it will allow for syndicating content out to third parties in a way which won’t threaten to compete against your site for rankings, and might also help your site to rank better. It’s quite possible that this will lead to changes in the market for syndicated content, with prices potentially dropping (or even falling to zero) for syndicated content which uses this element. Again, this is something that partners will have to agree to beforehand.

Google even touched on the above possibility, but it seems that (for the time being at least) it has decided to make this optional. In its blog post announcing this new feature, Google says:

“We leave this up to you and your publishers. If the content is similar enough, it might make sense to use rel="canonical", if both parties agree.”

Legacy systems, lack of technical know-how or internal policy all too often prohibit the changes required to improve a site’s rankings. Given the benefits of this new feature, I expect to see lots of creative uses dreamt up.

Let’s just hope that they are all designed with good intentions and that this does not become a target for misuse.


So what’s wrong with duplicate content?

There are very few occasions when duplicate content is a good thing. The search engines are not fond of it; it takes up unnecessary space in their indices and, in some cases, stops them from showing the right page. It should be made clear, however, that there is no such thing as a "duplicate content penalty", at least where Google is concerned.

Nonetheless, duplicate content is something that really should be avoided. It is possible that link authority could be split if people link to different duplicate pages. It can also skew any visitor tracking as people click on different copies that have made it into the search results. It can be irritating for a user to click on a link and find content identical to something they have previously looked at. Having lots of duplicate pages on a site makes spidering less efficient, as search engine spiders will spend time downloading what is essentially the same page over and over again, rather than spidering other new or changed pages.

The problem is that there are many ways in which duplicate content can inadvertently be created. I’ll discuss just a few.

Perhaps the most common cause is the way in which a site’s URLs are set up in the first place. A web server often has a root page and a page with a default document type. http://www.example.com/ would be the root, for example, and the page with the default document type would be http://www.example.com/index.html. The added default document type doesn’t have to be index.html. It could be any one of a number of things, including index.htm, index.php, index.asp, index.aspx, default.aspx, etc.

There are also situations in which a site can be reached at its "non-www" version (for example, http://example.com/) and also has a duplicate "https:" version (for example, https://www.example.com/).

If a site had both the "non-www" and "https:" duplicate content issues there would be four copies of every page on the site, and if the issue affected a large site, the total number of duplicates would increase rapidly. Add printer-friendly pages and dynamic URLs to the possible causes and you can see that duplicate content can easily get out of hand.

Just as there are many ways in which duplicate content can be created, there are many ways in which the problem can be alleviated. The most obvious solution is not to create it in the first place – avoid session IDs in URLs, for example.

Another tried and trusted method of dealing with duplicate content, especially where the actual pages are identical, is the 301 redirect. This will prevent the site from displaying duplicate pages on different URLs.
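As a rough illustration of the decision a 301 rule has to make, the following Python sketch works out where a duplicate URL should be redirected. The preferred scheme and host, the list of default documents and the function name are assumptions made for this example; in practice the redirect itself would be issued by the web server with a 301 status code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: the preferred version of the site is
# http://www.example.com/ and these file names are default documents.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.example.com"
DEFAULT_DOCUMENTS = ("index.html", "index.htm", "index.php",
                     "index.asp", "index.aspx", "default.aspx")


def redirect_target(url):
    """Return the URL a duplicate should 301 to, or None if no redirect is needed."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Collapse a trailing default document, so /index.html becomes /
    head, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"
    # Simplification: every host name served by this site maps to the preferred host.
    target = urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))
    return None if target == url else target
```

With these assumptions, http://example.com/ and https://www.example.com/index.html would both be sent to http://www.example.com/, while the preferred URL itself is left alone.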

Where the pages are not exactly identical, the rel=canonical tag can be used to indicate to the search engines that one particular page is the definitive version. As an aside, Google has recently announced that it will be adding cross-domain support for the canonical tag in the near future, and both Bing and Yahoo have said that they will be adding support for canonicalization across the same domain by the end of the year.

As mentioned in a recent blog post, Google now offers a parameter handling tool which can be used to help with duplicate content from dynamic URLs. It is probably best used if the solutions given above are not feasible and possibly only in situations where session IDs and similar issues are causing the problem. In fact, Google indicated in its official blog that there are situations in which using a rel="canonical" tag is wiser, especially as it is supported by many other search engines as well as by Google.


Google Parameter Handling tool

The usefulness of Google Webmaster Tools has just gone up another notch. Google has introduced a feature that allows a webmaster to suggest which URL parameters it should ignore. So far, there has been no official announcement of the tool’s inclusion from Google, so detailed information is scarce.

Dynamic URLs can cause many duplicate content problems for a website, but with the Parameter Handling tool, a webmaster can indicate up to 15 parameters that Google should ignore.

The tool also displays a list of parameters that Googlebot has found, with a suggested action alongside (either "Ignore" or "Don’t ignore") which can be edited as needed.

The point of the tool (which is, as yet, untested) is that by excluding parameters such as session IDs and tracking codes, it will in theory make the crawling of a site more efficient. In other words, Google’s spiders will not spend time following URLs that are essentially duplicates, which should hopefully mean more time spent spidering your more valuable pages.

Another effect of this is that (again, in theory) "link juice" will not be split across multiple duplicate URLs but will be consolidated onto the correct URL, much like the canonical link element. The number of duplicate pages should be reduced as well.
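To make the idea of ignoring parameters concrete, here is a minimal Python sketch of how URLs that differ only in ignored parameters collapse onto one URL. It is not how Google implements the tool, which has not been documented; the parameter names and the function name are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names a webmaster might mark as "Ignore".
IGNORED_PARAMS = {"sessionid", "utm_source", "referer"}


def strip_ignored_params(url, ignored=IGNORED_PARAMS):
    """Return the URL with every ignored query parameter removed."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in ignored]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))
```

Under these assumptions, http://www.example.com/product?id=42&sessionid=abc123 and http://www.example.com/product?id=42&sessionid=xyz789 both reduce to http://www.example.com/product?id=42, so a crawler applying the same rule would only need to fetch the page once.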

Yahoo! offers similar functionality in its Site Explorer service, but obviously each such tool will only work with each specific search engine. What would be nice here is some form of standard that all search engines would honour (in this case, perhaps an extension to the robots.txt protocol).

It should also be noted that Google has included an interesting caveat on the tool’s page – just like the canonical link element, Google says that it will treat requests to ignore certain URL parameters as suggestions only.


Defining the Canonical

Checking a dictionary will tell you that the adjective canonical comes from the noun canon, meaning a rule (as in canon law, especially that of the Christian Church), and that it carries related senses such as authoritative and accurate. This goes some way to explaining its use in the SEO industry.

Canonical in terms of SEO

Most commonly the term is used to describe the best URL choice; in other words, the URL that you want users and search engines to visit. Best practice is to inform the search engines which URL is the preferred one for a site, thus avoiding the search engines making the decision themselves or considering different URLs as separate pages.

Canonical URLs in practice

Many websites find that they have a situation where multiple URLs all lead to the same page. This can be due to a number of factors, but most commonly it happens where a non-www version (for example http://example.co.uk/index.html) exists alongside the www version (for example http://www.example.co.uk/index.html). Both these URLs will typically point to the same page, which can lead to duplicate content issues and split link equity.

On top of this, many websites also have duplicate pages because both the root URL (for example http://www.example.co.uk/) and the default document (which can be index.html, index.htm, index.asp, default.htm, default.html etc) are accessible, and sometimes both an HTTP and an HTTPS version of the site exist too. Follow that up with a .com domain carrying the same content and you could end up with all of the following URLs pointing to the same page:

  • http://www.example.co.uk/
  • http://www.example.co.uk/index.html
  • https://www.example.co.uk/
  • https://www.example.co.uk/index.html
  • https://example.co.uk/
  • https://example.co.uk/index.html
  • http://example.co.uk
  • http://example.co.uk/index.html
  • http://www.example.com/
  • http://www.example.com/index.html
  • https://www.example.com/
  • https://www.example.com/index.html
  • https://example.com/
  • https://example.com/index.html
  • http://example.com
  • http://example.com/index.html
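The list above is simply every combination of four independent choices, which a short Python sketch can reproduce. The variable names are illustrative only, and the root is written with a trailing slash throughout for consistency.

```python
from itertools import product

schemes = ("http", "https")
hosts = ("www.example", "example")   # www and non-www
tlds = ("co.uk", "com")              # the two domains carrying the same content
paths = ("/", "/index.html")         # root and default document

# Every combination of scheme, host, domain and path is a separate URL
# that can serve the same page.
variants = [f"{scheme}://{host}.{tld}{path}"
            for scheme, host, tld, path in product(schemes, hosts, tlds, paths)]

print(len(variants))  # 2 x 2 x 2 x 2 = 16
```

Each further source of duplication, such as another default document name, multiplies the total again.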

The good news is that there are many ways to remedy this problem, from the optimal 301 redirecting through robots meta tags to the new canonical link element, but with so many people using the term ‘canonical’ it is important to make sure everyone is singing from the same song sheet.


The canonical tag – A fantastic new tool to combat duplicate content

The top three search engines have jointly announced a new HTML tag to help combat the issue of duplicate content. We look through the potential uses of this new tag, show the places where it shouldn’t be used and illustrate where it’s a fantastic new addition to the SEO toolbox.

A common problem for search engines when indexing websites is that of duplicate content. Having multiple pages with identical or very similar content can create numerous problems for the search engines, such as wasting resources on unnecessary spidering and attempting to determine which version of a page is most relevant. Search engines are interested in finding and indexing unique content, not hundreds of identical pages!

If your site contains multiple pages of duplicate content with little or no variations, this can lead to a number of potential issues. When confronted with multiple pages of duplicate content a search engine will attempt to identify a canonical page and then display this within the search results. This can lead to an undesired page being identified and chosen as preferential, rather than the page you might prefer.

An additional issue is that if links are made to these different pages, the benefits of these links might potentially be split between different page variants, as search engines are not always able to identify duplicate pages. The inherent value that comes from anchor texts and link weight, through both internal and external links to the page, will be diluted by having multiple pages with duplicate content.

Many websites, especially e-commerce sites, have multiple ways to navigate to individual items of content. This in turn leads to multiplication of pages with little or no variation. For example, here are some common variations of a site’s homepage:

  • http://example.com
  • http://www.example.com
  • https://www.example.com
  • http://www.example.com/index.html
  • http://www.example.com/?referer=page.html

Each of these URLs would show a visitor an identical copy of the site’s homepage.

Generally these sorts of duplicate URLs are dealt with by using 301 redirects to send visitors to the correct version of a page. However, there are some instances where you might actually want to have multiple pages which are very similar, and therefore a 301 redirect is not suitable. For example, on many e-commerce sites, you can often sort lists of products in numerous ways. In these instances, the Robots Exclusion Protocol is usually called upon to block these duplicate pages from the search engines.

Yesterday Google, Yahoo! and Microsoft jointly announced a new HTML tag in an effort to help site designers and search engines more accurately define a website’s canonical pages. The tag provides search engines with a suggestion from the site’s owner that a specific page should be considered as the canonical version and therefore more authoritative than its duplicate brothers. This new tag is as follows:

<link rel="canonical" href="http://example.com" />

This is inserted within the <head> element of duplicate pages and enables search engine spiders to accurately identify which URL is the site owner’s preferred canonical page. This new tag transfers search ‘signals’, such as Google PageRank, to the preferred canonical URL rather than dispersing them across multiple URLs.
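To check which canonical URL a page actually declares, something like the following Python sketch can be used. It relies only on the standard library's HTML parser; the class and function names are made up for this example, and it makes no claim to mirror how any search engine reads the tag.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated values, so split before checking.
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html_text):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical
```

Running find_canonical over each duplicate page is a quick way to confirm that they all point at the same preferred URL.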

This new tag should be considered as a new tool in the SEO toolbox, and does not necessarily mean that other methods of reducing duplicate content are no longer useful. In general, it is still going to be better to use 301 redirects in the majority of cases. Here are several reasons why:

  • This tag only provides a hint to search engines – they will consider it as part of their algorithm, but it is still by no means certain that a search engine will pick your choice. This will vary between the search engines, leading to differing behaviour from different search engines.
  • It requires pages to be identical or almost-identical. Exactly how “identical” will be down to each search engine to decide, which again leads to different results in different search engines, and it won’t work at all if the pages are significantly different.
  • It isn’t supported by any other search engines (at least not yet). This is more important internationally, where different search engines may have different market shares, and local players may have a significant presence.
  • Search engines have to crawl the duplicate URLs, leading to increased server load and potentially reduced coverage of the rest of your site.
  • It doesn’t work across different domains (although it does work across different subdomains on the same domain).

There are also instances of duplicate content where it may not be the most appropriate tool to use, for example:

  • Anywhere you would usually use a 301 redirect (see above for why), such as dealing with the HTTPS version of a site, the non-www site version or index.html pages. A 301 redirect is definitely the best solution in these situations.
  • Accessible text-only versions of a site – contrary to popular belief these are actually recommended against by the W3C, and should not be used at all. It is quite possible to make your site accessible while retaining all the bells and whistles.
  • Printer-friendly pages – a better solution is to create CSS style sheets for print media, which will make any page on your site printable.
  • Duplicate pages caused by use of session IDs or referrer tracking in URLs – These cause spidering problems and should not be used. Additionally, as URLs may be shared, these are not reliable.

However, this tool definitely introduces a number of new ways of dealing with certain difficult types of duplicate content issues. Here are some great uses for this new tag:

  • As mentioned earlier, many e-commerce sites allow you to sort lists of products in numerous ways. In these instances, the Robots Exclusion Protocol is usually used to block these pages. However, it is now possible to use this tag to point to the canonical version of a page, thus effectively merging these URLs in the eyes of the search engines, whilst leaving the site experience unaltered.
  • In the same way as sorting, this is potentially a suitable tool to use with pagination, although if the pages are not similar enough the search engines may potentially fail to follow this directive. We would generally recommend allowing the search engines to spider at least one paginated form, however.
  • One common method of split testing involves using URL parameters to differentiate pages (although this isn’t the only way of doing it). Using this new tag allows you to perform split testing in this way without causing duplicate content issues. A side note – if you’re not doing split testing, you should be!
  • Another issue common to e-commerce sites is multiple methods of navigating to a particular product. For example, a product may be included in a general category and also be accessible in a category for the product brand. In some instances a 301 redirect may be possible, but sometimes the branding and navigation may need to be kept intact, and this tag is a suitable tool in this instance.
  • Where you want to handle people using invalid URL parameters. In fact, you could argue that every single page on a site should use this new link tag for this reason alone. With this tag in place, you no longer have to worry about links to your site being made with weird URL parameters – they’ll all be consolidated for you!
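Several of the uses above (sorting, split testing, stray URL parameters) come down to the same step: work out the canonical URL for whatever URL was requested and print it in the tag. The Python sketch below shows one way that might look; the list of content-defining parameters and the function name are assumptions for this example, and each site would need its own list.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical: the only parameters that genuinely change the page content.
CONTENT_PARAMS = {"id", "category"}


def canonical_link(request_url, content_params=CONTENT_PARAMS):
    """Build the canonical link element for the URL that was requested."""
    parts = urlsplit(request_url)
    # Keep only content-defining parameters, in a fixed order, so that
    # sort orders, test variants and stray parameters all collapse together.
    kept = sorted((key, value)
                  for key, value in parse_qsl(parts.query)
                  if key in content_params)
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path,
                            urlencode(kept), ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'
```

With these assumptions, a request for http://www.example.com/products?category=shoes&sort=price would emit a tag pointing at http://www.example.com/products?category=shoes, however the list happens to be sorted.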

A final, if somewhat niche, way of using this tag is where you are showing multiple revisions of a document. A good example of this is a wiki, where documents can be edited and each revision of a page is stored for posterity. In fact, Wikia was the partner for the search engines to help them test this new tag out.

The exact impact of this new HTML tag is yet to be accurately measured and, in general, it will still be much more beneficial to use a traditional 301 redirect. However, for those instances where a 301 is not the appropriate solution, the rel=canonical link tag provides an invaluable new addition to the SEO toolbox.


Update: Ask.com are also going to support the canonical tag.

Additional research by John Trivett
