canonical

Capitalisation in Search

Continuing our series of frequently asked questions, this article looks at capitalisation with regard to SEO, common problems, and how the search engines handle capitalised keywords.

The specific question we were asked was "What impact does the use of capitals have on search engine results pages (SERPs), if any?"

This particular question is often asked in relation to town and location names, such as the English town of Reading in Berkshire, which can be confused with the word ‘reading’. I will address this specific question to begin with before taking a broader look at capitalisation in Search.

All of the major search engines are case insensitive. That is to say, whether you type [BOX], [Box] or [box] as a search query doesn’t matter, as you are more than likely to get the same results. So from an SEO point of view the best practice is to optimise your page so that it is grammatically correct, as you would any other typed document. As we always recommend, you should write for the user and not for search engine spiders.

One place where letter case can be an issue is within URLs, parts of which are case sensitive according to the URL and HTTP specifications. The domain is case insensitive, so whether you have http://www.example-url.com/ or http://www.Example-Url.com/ doesn’t really matter, as this part is only used by DNS to find the web server address. What does matter is everything after the domain, as different cases can indicate requests for different files. For example, http://www.example-url.com/Folder-Name/, http://www.example-url.com/FOLDER-NAME/ and http://www.example-url.com/folder-name/ are all different URLs and are treated as such by the search engines.
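As a rough illustration of the distinction above, the following Python sketch lower-cases only the scheme and domain of a URL and leaves the path alone. The function name normalise_host_case is hypothetical, and the sketch assumes URLs without user names or passwords in them.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_host_case(url: str) -> str:
    """Lower-case the scheme and domain only; the path is left untouched."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, parts.fragment)
    )


urls = [
    "http://www.Example-Url.com/Folder-Name/",
    "http://www.example-url.com/FOLDER-NAME/",
    "http://www.example-url.com/folder-name/",
]

# The domain's case is folded away, but the three paths remain distinct.
normalised = {normalise_host_case(u) for u in urls}
print(len(normalised))  # 3
```

The domain variants collapse to one form, while the three folder names still count as three separate URLs.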

If all three versions of the above URL existed, it could lead to them being identified as duplicate content and there is a good chance that this will dilute the page’s link equity. For this reason, as well as to promote uniformity in order to make the process of creating URLs more straightforward, the recommended best practice here is to stick with lower case for all URLs. As an aside, lower case URLs are considered more aesthetically pleasing and are easier to read.
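One way to spot this kind of duplication in a list of crawled URLs is to group them by their lower-cased form. This is only a sketch: find_case_variants is a hypothetical helper, and lower-casing the whole URL assumes that no query string on the site carries case-significant values.

```python
from collections import defaultdict


def find_case_variants(urls):
    """Group URLs that differ only by letter case.

    Returns a dict mapping the lower-cased URL to the list of variants,
    keeping only groups that contain more than one variant.
    """
    groups = defaultdict(list)
    for url in urls:
        groups[url.lower()].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


crawled = [
    "http://www.example-url.com/Folder-Name/",
    "http://www.example-url.com/FOLDER-NAME/",
    "http://www.example-url.com/folder-name/",
    "http://www.example-url.com/other/",
]

variants = find_case_variants(crawled)
for lowercase_url, members in variants.items():
    print(lowercase_url, "has", len(members), "case variants")
```

Each group found this way is a candidate for consolidating onto the lower-case version of the URL.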

Case sensitivity issues tend to arise if you use a server which is case insensitive, such as Microsoft IIS. With a Microsoft IIS server, the three URLs above would all return the same page, yet the search engines would still treat them as three separate URLs. Again, the best practice here is to stick to using lower case in your URLs.

However, there are occasions when Google does return different results depending on the case used. This seems to be mainly where the letters could be either a word or an acronym. Compare [BAR], [bar] and [Bar], for example. The results produced are split into three sections, and it is in the third section where we found differences.

Figure: Comparing search results for [BAR], [bar] and [Bar]

Differences were also seen when comparing results for [AND], [and] and [And].

Another oddity came to light when searching for [MAD] and [mad]. For [MAD], Google returns a currency exchange rate OneBox, but it does not for [mad].

Therefore a best practice for including acronyms on a page is to include the full form with the acronym in brackets, at least in the first mention, as Google often highlights this in the search snippet.

Practical uses for the new Google cross-domain canonical link element

The cross-domain canonical link element, albeit only currently supported by Google, is a welcome addition to the webmaster’s toolkit. Read on for practical examples of how you can use it in your SEO campaigns.

Google is one step ahead of Bing and Yahoo! in allowing the canonical link element to be applied across domains, and we expect the other search engines to follow suit in due course. This matters for websites that previously used the canonical link element to point to a page within the same site and now point it at another site instead: not only will Yahoo! and Bing not canonicalise across domains, but the page will also lose its existing same-site canonical reference in those engines, which makes things even worse.

For now, this is the closest thing to a permanent redirect in Google for users who can’t implement a 301 redirect for whatever reason, and it will come as welcome news to some. However, we need to remember that the outcome is not guaranteed, as Google explained in its post:

“While the rel="canonical" link element is seen as a hint and not an absolute directive, we do try to follow it where possible.”

All of the previous uses of the canonical link element are still valid – however, this opens up a number of new potential uses:

  • You can now move your site to a new domain even when you don’t have control of server headers (such as on free hosts like Google-owned Blogger).
  • As a temporary measure before 301 redirects can be properly implemented.
  • Landing pages on domains registered for tracking offline campaigns can pass the benefit of any links back to the main domain.
  • It will be possible to allow affiliates to create affiliate web sites which not only won’t compete against your website in the search results, but will even help the rankings of your own site (although this can’t be guaranteed). Obviously, this is something that the affiliates will have to agree to, and won’t be suitable for all programmes.
  • Similar to the above, it will allow for syndicating content out to third parties in a way which won’t threaten to compete against your site for rankings, and might also help your site to rank better. It’s quite possible that this will lead to changes in the market for syndicated content, with prices potentially dropping (or content even being offered free) for syndicated content which uses this element. Again, this is something that partners will have to agree to beforehand.

Google even touched on the above possibility, but it seems that (for the time being at least) it has decided to make this optional. In its blog post announcing this new feature, Google says:

“We leave this up to you and your publishers. If the content is similar enough, it might make sense to use rel="canonical", if both parties agree.”
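For reference, the element itself is a single line placed in the head of the duplicate page, pointing at the preferred URL on the other domain. The Python sketch below builds that line; canonical_link is a hypothetical helper and the article URL is a made-up example, not one taken from Google’s post.

```python
from html import escape


def canonical_link(preferred_url: str) -> str:
    """Build a canonical link element for the <head> of a duplicate page."""
    # Escape characters such as & and " so the attribute stays valid HTML.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}" />'


# A syndication partner's copy would carry this element, pointing back
# at the original article on the publisher's own domain.
print(canonical_link("http://www.example.com/original-article.html"))
# <link rel="canonical" href="http://www.example.com/original-article.html" />
```

As with the same-site version, Google treats this as a hint, so the sketch shows only what to publish, not what Google will do with it.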

Legacy systems, a lack of technical know-how or internal policy all too often prevent the changes required to improve a site’s rankings. Given the benefits of this new feature, I expect lots of creative uses to be dreamt up.

Let’s just hope that they are all designed with good intentions and that this does not become a target for misuse.

So what’s wrong with duplicate content?

There are very few occasions when duplicate content is a good thing. The search engines are not fond of it; it takes up unnecessary space in their indices and, in some cases, stops them from showing the right page. It should be made clear, however, that there is no such thing as a "duplicate content penalty", at least where Google is concerned.

Nonetheless, duplicate content is something that really should be avoided. It is possible that link authority could be split if people link to different duplicate pages. It can also skew any visitor tracking as people click on different copies that have made it into the search results. It can be irritating for a user to click on a link and find content identical to something they have previously looked at. Having lots of duplicate pages on a site makes spidering less efficient, as search engine spiders will spend time downloading what is essentially the same page over and over again, rather than spidering other new or changed pages.

The problem is that there are many ways in which duplicate content can inadvertently be created. I’ll discuss just a few.

Perhaps the most common cause is the way in which a site’s URLs are set up in the first place. A web server often has a root page and a page with a default document type. http://www.example.com/ would be the root, for example, and the page with the default document type would be http://www.example.com/index.html. The added default document type doesn’t have to be index.html. It could be any one of a number of things, including index.htm, index.php, index.asp, index.aspx, default.aspx, etc.

There are also situations in which a site can be reached at the "non-www" version (for example, http://example.com/) or has a duplicate "https:" version (for example, https://www.example.com/).

If a site had both the "non-www" and "https:" duplicate content issues there would be four copies of every page on the site, and if the issue affected a large site, the total number of duplicates would increase rapidly. Add printer-friendly pages and dynamic URLs to the possible causes and you can see that duplicate content can easily get out of hand.

Just as there are many ways in which duplicate content can be created, there are many ways in which the problem can be alleviated. The most obvious solution is not to create it in the first place: avoid session IDs in URLs, for example.

Another tried and trusted method of dealing with duplicate content, especially where the actual pages are identical, is the 301 redirect. This will prevent the site from displaying duplicate pages on different URLs.
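The decision behind such a redirect can be sketched in Python as below. This is an illustration rather than a server configuration: redirect_target, the preferred scheme and host, and the list of default documents are all assumptions made for the example, and a real site would choose whichever scheme and host it wants to keep.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed preferences for this example only.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.example.com"
DEFAULT_DOCUMENTS = (
    "index.html", "index.htm", "index.php",
    "index.asp", "index.aspx", "default.aspx",
)


def redirect_target(url):
    """Return the URL to 301 redirect to, or None if url is already preferred."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Drop a trailing default document so /index.html collapses onto /.
    head, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"
    preferred = urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, parts.fragment)
    )
    return None if preferred == url else preferred


for candidate in (
    "http://www.example.com/",
    "http://example.com/index.html",
    "https://www.example.com/about/index.php",
):
    print(candidate, "->", redirect_target(candidate))
```

Every duplicate variant maps onto one preferred URL, and the preferred URL itself needs no redirect.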

Where the pages are not exactly identical, the rel=canonical tag can be used to indicate to the search engines that one particular page is the definitive version. As an aside, Google has recently announced that it will be adding cross-domain support for the canonical tag in the near future, and both Bing and Yahoo have said that they will be adding support for canonicalization across the same domain by the end of the year.

As mentioned in a recent blog post, Google now offers a parameter handling tool which can be used to help with duplicate content from dynamic URLs. It is probably best used if the solutions given above are not feasible and possibly only in situations where session IDs and similar issues are causing the problem. In fact, Google indicated in its official blog that there are situations in which using a rel="canonical" tag is wiser, especially as it is supported by many other search engines as well as by Google.

Google Parameter Handling tool

The usefulness of Google Webmaster Tools has just gone up another notch. Google has introduced a feature that allows a webmaster to suggest which URL parameters it should ignore. So far, there has been no official announcement of the tool’s inclusion from Google, so detailed information is scarce.

Dynamic URLs can cause many duplicate content problems for a website, but with the Parameter Handling tool, a webmaster can indicate up to 15 parameters that Google should ignore.

The tool also displays a list of parameters that Googlebot has found, with a suggested action alongside (either "Ignore" or "Don’t ignore") which can be edited as needed.

The point of the tool (which is, as yet, untested) is that by excluding parameters such as session IDs and tracking codes, it will in theory make the crawling of a site more efficient. In other words, Google’s spiders will not spend time following URLs that are essentially duplicates, which should hopefully mean more time spent spidering your more valuable pages.

Another effect of this is that (again, in theory) "link juice" will not be split across multiple duplicate URLs but will be consolidated onto the correct URL, much like the canonical link element. The number of duplicate pages should be reduced as well.
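The effect of ignoring parameters can be sketched in Python as follows. This only illustrates the idea of consolidation and is not a description of how Google implements the tool; strip_ignored_parameters and the parameter names (sessionid, utm_source and so on) are assumptions chosen for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change the page content.
IGNORED_PARAMETERS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}


def strip_ignored_parameters(url, ignored=IGNORED_PARAMETERS):
    """Return url with the ignored query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in ignored
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


duplicates = [
    "http://www.example.com/shoes?colour=red&sessionid=abc123",
    "http://www.example.com/shoes?colour=red&utm_source=newsletter",
    "http://www.example.com/shoes?colour=red",
]

# All three URLs collapse onto a single URL once the parameters are ignored.
consolidated = {strip_ignored_parameters(u) for u in duplicates}
print(consolidated)  # {'http://www.example.com/shoes?colour=red'}
```

Parameters that do change the content, such as colour in this example, are kept, which is why the choice of what to ignore needs care.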

Yahoo! offers similar functionality in its Site Explorer service, but obviously each such tool will only work with each specific search engine. What would be nice here is some form of standard that all search engines would honour (in this case, perhaps an extension to the robots.txt protocol).

It should also be noted that Google has included an interesting caveat on the tool’s page – just like the canonical link element, Google says that it will treat requests to ignore certain URL parameters as suggestions only.

Defining the Canonical

Checking a dictionary will tell you that the adjective canonical comes from the noun canon, meaning a rule (as in canon law, especially pertaining to the Christian Church), and that it carries senses such as authoritative and accurate. This goes some way to explaining its use in the SEO industry.

Canonical in terms of SEO

Most commonly the term is used to describe the best URL choice, in other words the URL that you want users and search engines to visit. Best practice is to inform the search engines which URL is the preferred one for a page, so that they neither make the decision themselves nor treat different URLs as separate pages.

Canonical URLs in practice

Many websites find that they have a situation where multiple URLs all lead to the same page. This can be due to a number of factors, but most commonly it happens where a non-www version (for example http://example.co.uk/index.html) exists alongside the www version (for example http://www.example.co.uk/index.html). Both these URLs will typically point to the same page, which can lead to duplicate content issues and split link equity.

On top of the above, many websites also have duplicate pages because both the root URL (for example http://www.example.co.uk/) and the default document (which can be index.html, index.htm, index.asp, default.htm, default.html etc.) are accessible, and sometimes both an HTTP and an HTTPS version of the site exist as well. Follow that up with a .com domain carrying the same content and you could end up with all of the following URLs pointing to the same page:

  • http://www.example.co.uk/
  • http://www.example.co.uk/index.html
  • https://www.example.co.uk/
  • https://www.example.co.uk/index.html
  • https://example.co.uk/
  • https://example.co.uk/index.html
  • http://example.co.uk
  • http://example.co.uk/index.html
  • http://www.example.com/
  • http://www.example.com/index.html
  • https://www.example.com/
  • https://www.example.com/index.html
  • https://example.com/
  • https://example.com/index.html
  • http://example.com
  • http://example.com/index.html
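The size of the list above follows from simple multiplication: two schemes, two host forms, two domains and two paths give 2 × 2 × 2 × 2 = 16 URLs. A short Python sketch makes this explicit; the variable names are illustrative, and the sketch writes the bare root with a trailing slash throughout.

```python
from itertools import product

schemes = ("http", "https")
hosts = ("www.example", "example")
tlds = ("co.uk", "com")
paths = ("/", "/index.html")

# Every combination of scheme, host form, domain and path is a separate URL.
variants = [
    f"{scheme}://{host}.{tld}{path}"
    for scheme, host, tld, path in product(schemes, hosts, tlds, paths)
]

print(len(variants))  # 16
```

Each further source of duplication, such as one more default document name, multiplies the total again.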

The good news is that there are many ways to remedy this problem, from the optimal 301 redirect through robots meta tags to the new canonical link element, but with so many people using the term ‘canonical’ it is important to make sure everyone is singing from the same song sheet.
