Capitalisation in Search

Continuing our series of frequently asked questions, this article looks at capitalisation with regard to SEO, common problems, and how the search engines handle capitalised keywords.

The specific question we were asked was "What impact do the use of capitals have on search engine results pages (SERPs), if any?"

This particular question is often asked in relation to town and location names, such as the English town of Reading in Berkshire, which can be confused with the word ‘reading’. I will address this specific question first before taking a broader look at capitalisation in Search.

All of the major search engines are case insensitive. That is to say, whether you type [BOX], [Box] or [box] as a search query, you are more than likely to get the same results. So from an SEO point of view, the best practice is to write your page so that it is grammatically correct, as you would any other document. As we always recommend, you should write for the user and not for search engine spiders.

One place where letter case can be an issue is within URLs, which are in fact case sensitive according to the HTTP specifications. The exception is the domain, which is case insensitive: whether you have http://www.example-url.com/ or http://www.Example-Url.com/ doesn’t really matter, as this part is only used by DNS to find the web server address. What does matter is what comes after the domain, as different cases indicate requests for different files. For example, http://www.example-url.com/Folder-Name/, http://www.example-url.com/FOLDER-NAME/ and http://www.example-url.com/folder-name/ are all different URLs and are treated as such by the search engines.
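The distinction can be illustrated with a short Python sketch. It is only an approximation of the comparison rule described above, and the helper name same_url is our own invention for this example: the scheme and host are compared without regard to case, while the path and query are compared exactly.

```python
from urllib.parse import urlsplit


def same_url(a: str, b: str) -> bool:
    """Compare two URLs: scheme and host ignore case, path and query do not."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (
        pa.scheme.lower() == pb.scheme.lower()
        # .hostname is already lower-cased by urlsplit
        and (pa.hostname or "") == (pb.hostname or "")
        and pa.port == pb.port
        and pa.path == pb.path      # case-sensitive comparison
        and pa.query == pb.query    # case-sensitive comparison
    )


# Domain case does not matter...
print(same_url("http://www.example-url.com/", "http://www.Example-Url.com/"))
# ...but the case of the path does.
print(same_url("http://www.example-url.com/Folder-Name/",
               "http://www.example-url.com/folder-name/"))
```

The first comparison prints True and the second prints False.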

If all three versions of the above URL existed, it could lead to them being identified as duplicate content and there is a good chance that this will dilute the page’s link equity. For this reason, as well as to promote uniformity in order to make the process of creating URLs more straightforward, the recommended best practice here is to stick with lower case for all URLs. As an aside, lower case URLs are considered more aesthetically pleasing and are easier to read.
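As a rough way to check a site for this problem, the sketch below lower-cases a list of URLs and groups any that differ only by case. The function names are hypothetical, and it assumes you already have a list of URLs from a crawl or from your server logs. The query string is left untouched, since parameter values may legitimately be case sensitive.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_url(url: str) -> str:
    """Lower-case the scheme, host and path; leave query and fragment as they are."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.lower(),
        parts.query,
        parts.fragment,
    ))


def group_case_variants(urls):
    """Map each lower-cased URL to the original URLs that differ from it only by case."""
    groups = {}
    for url in urls:
        groups.setdefault(lowercase_url(url), []).append(url)
    # Keep only the groups that really contain more than one variant.
    return {key: found for key, found in groups.items() if len(found) > 1}


crawled = [
    "http://www.example-url.com/Folder-Name/",
    "http://www.example-url.com/FOLDER-NAME/",
    "http://www.example-url.com/folder-name/",
]
for canonical, variants in group_case_variants(crawled).items():
    print(canonical, "has", len(variants), "case variants")
```

Any group reported here is a candidate for consolidating onto the lower case version.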

Case-sensitivity issues tend to arise if you use a server that is case insensitive, such as Microsoft IIS. A Microsoft IIS server would treat the three URLs above as the same URL and return the same page for each, even though the search engines still see three different URLs. Again, the best practice here is to stick to lower case in your URLs.

However, there are occasions when Google does return different results depending on the case used. This seems to happen mainly where the letters could be either a word or an acronym. Compare [BAR], [bar] and [Bar], for example. The results produced are split into three sections, and it is in the third section that we found differences.

Comparing search results of BAR, bar and Bar

Differences were also seen when comparing results for [AND], [and] and [And].

Another oddity came to light when searching for [MAD] and [mad]. For [MAD], Google returns a currency exchange rate OneBox (MAD is the currency code for the Moroccan dirham), but it does not do so for [mad].

Therefore a best practice for including acronyms on a page is to include the full form with the acronym in brackets, at least in the first mention, as Google often highlights this in the search snippet.

So what’s wrong with duplicate content?

There are very few occasions when duplicate content is a good thing. The search engines are not fond of it; it takes up unnecessary space in their indices and, in some cases, stops them from showing the right page. It should be made clear, however, that there is no such thing as a "duplicate content penalty", at least where Google is concerned.

Nonetheless, duplicate content is something that really should be avoided. It is possible that link authority could be split if people link to different duplicate pages. It can also skew visitor tracking, as people click on different copies that have made it into the search results. It can be irritating for a user to click on a link and find content identical to something they have previously looked at. Having lots of duplicate pages on a site also makes spidering less efficient, as search engine spiders will spend time downloading what is essentially the same page over and over again, rather than spidering other new or changed pages.

The problem is that there are many ways in which duplicate content can inadvertently be created. I’ll discuss just a few.

Perhaps the most common causes stem from the way in which a site’s URLs are set up in the first place. A web server often serves the same page at the root and at a URL that includes the default document. http://www.example.com/ would be the root, for example, and the page with the default document would be http://www.example.com/index.html. The default document doesn’t have to be index.html. It could be any one of a number of things, including index.htm, index.php, index.asp, index.aspx, default.aspx, etc.
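To show how the root and default-document URLs can be folded into one, here is a minimal Python sketch. The function name and the set of file names are assumptions for illustration, taken from the examples above. Your own server may use a different default document, so check its configuration before relying on a list like this.

```python
from urllib.parse import urlsplit, urlunsplit

# Default document names mentioned above; an assumption, not an exhaustive list.
DEFAULT_DOCUMENTS = {
    "index.html", "index.htm", "index.php",
    "index.asp", "index.aspx", "default.aspx",
}


def strip_default_document(url: str) -> str:
    """Return the URL with a trailing default document removed from the path."""
    parts = urlsplit(url)
    directory, _, filename = parts.path.rpartition("/")
    if filename.lower() in DEFAULT_DOCUMENTS:
        path = directory + "/"   # keep the folder, drop the file name
    else:
        path = parts.path        # nothing to strip
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(strip_default_document("http://www.example.com/index.html"))
print(strip_default_document("http://www.example.com/blog/index.php"))
```

The two calls print http://www.example.com/ and http://www.example.com/blog/ respectively.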

There are also situations in which a site can be reached at the "non-www" version of its domain (for example, http://example.com/) or has a duplicate https: version (for example, https://www.example.com/).

If a site had both the "non-www" and "https:" duplicate content issues there would be four copies of every page on the site, and if the issue affected a large site, the total number of duplicates would increase rapidly. Add printer-friendly pages and dynamic URLs to the possible causes and you can see that duplicate content can easily get out of hand.
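The arithmetic is easy to see in a few lines of Python. This is only a sketch: the function name is hypothetical, and it assumes that every combination of protocol and host serves the same page.

```python
from itertools import product


def duplicate_variants(bare_host: str, path: str) -> list[str]:
    """List every protocol and host combination at which one page can be reached."""
    schemes = ("http", "https")
    hosts = (f"www.{bare_host}", bare_host)
    return [f"{scheme}://{host}{path}" for scheme, host in product(schemes, hosts)]


variants = duplicate_variants("example.com", "/about.html")
for url in variants:
    print(url)

print(len(variants))          # 4 copies of a single page
print(len(variants) * 1000)   # 4000 URLs for a site with 1,000 pages
```

Each further cause, such as a printer-friendly version, multiplies the total again.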

Just as there are many ways in which duplicate content can be created, there are many ways in which the problem can be alleviated. The most obvious solution is not to create it in the first place – avoid session IDs in URLs, for example.

Another tried and trusted method of dealing with duplicate content, especially where the actual pages are identical, is the 301 redirect. This will prevent the site from displaying duplicate pages on different URLs.
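The decision behind such a redirect can be sketched in Python as follows. The preferred protocol and host are assumptions chosen purely for illustration, and the function name is our own. In practice the redirect is normally configured on the web server itself rather than written by hand.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed preferred version of the site; change these to suit your own setup.
CANONICAL_SCHEME = "http"
CANONICAL_HOST = "www.example.com"


def permanent_redirect(requested_url: str):
    """Return (301, target) if the request should be redirected, otherwise None."""
    parts = urlsplit(requested_url)
    path = parts.path or "/"
    canonical = urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, path, parts.query, ""))
    current = urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))
    if current == canonical:
        return None              # already on the preferred URL, serve the page
    return 301, canonical        # send a permanent redirect to the preferred URL


print(permanent_redirect("http://example.com/page.html"))
print(permanent_redirect("https://www.example.com/page.html"))
print(permanent_redirect("http://www.example.com/page.html"))
```

The first two requests are redirected to http://www.example.com/page.html, while the third returns None because it is already the preferred version.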

Where the pages are not exactly identical, the rel=canonical tag can be used to indicate to the search engines that one particular page is the definitive version. As an aside, Google has recently announced that it will be adding cross-domain support for the canonical tag in the near future, and both Bing and Yahoo have said that they will be adding support for canonicalization across the same domain by the end of the year.
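The tag itself is a single link element placed in the head of each near-duplicate page. The Python sketch below builds one. The function name and the example URL are hypothetical, and the ampersand in the URL is escaped so that the resulting HTML is valid.

```python
from html import escape


def canonical_link(definitive_url: str) -> str:
    """Build the link element that points at the definitive version of a page."""
    return f'<link rel="canonical" href="{escape(definitive_url, quote=True)}" />'


# A hypothetical product page with parameters; every variant of the page
# would carry the same tag, pointing at the one definitive URL.
print(canonical_link("http://www.example.com/product?id=1&colour=red"))
```

This prints the element with the ampersand written as &amp;amp; inside the href attribute.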

As mentioned in a recent blog post, Google now offers a parameter handling tool which can be used to help with duplicate content from dynamic URLs. It is probably best used if the solutions given above are not feasible and possibly only in situations where session IDs and similar issues are causing the problem. In fact, Google indicated in its official blog that there are situations in which using a rel="canonical" tag is wiser, especially as it is supported by many other search engines as well as by Google.
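The idea behind parameter handling is that some parameters do not change the content of a page and can therefore be ignored. The Python sketch below is not Google's tool, merely an illustration of that idea. The parameter names sessionid and sid are hypothetical, so substitute whatever your own site uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change what the page shows.
IGNORED_PARAMETERS = {"sessionid", "sid"}


def strip_ignored_parameters(url: str) -> str:
    """Remove parameters that do not affect page content from a dynamic URL."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in IGNORED_PARAMETERS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


print(strip_ignored_parameters("http://www.example.com/product?id=7&sessionid=abc123"))
```

Here the session ID is dropped and http://www.example.com/product?id=7 is printed, so two visitors with different session IDs end up with the same URL.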
