keywords

Capitalisation in Search

Continuing our series of frequently asked questions, this article looks at capitalisation with regard to SEO, common problems, and how the search engines handle capitalised keywords.

The specific question we were asked was "What impact do the use of capitals have on search engine results pages (SERPs), if any?"

This particular question is often asked in relation to town and location names, such as the English town of Reading in Berkshire, which is easily confused with the word ‘reading’. I will address this specific question to begin with before taking a broader look at capitalisation in Search.

All of the major search engines are case insensitive. That is to say, whether you type [BOX], [Box] or [box] as a search query, you will almost always get the same results. So from an SEO point of view, the best practice is to write your page so that it is grammatically correct, as you would any other document. As we always recommend, you should write for the user and not for search engine spiders.

One place where case can be an issue is within URLs, parts of which are case sensitive according to the HTTP specifications. The domain itself is case insensitive: whether you have http://www.example-url.com/ or http://www.Example-Url.com/ doesn’t matter, as this part is only used by DNS to find the web server address. What does matter is everything after the domain, as different cases indicate requests for different files. For example, http://www.example-url.com/Folder-Name/, http://www.example-url.com/FOLDER-NAME/ and http://www.example-url.com/folder-name/ are all different URLs and are treated as such by the search engines.
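As a rough illustration of the point above, here is a minimal Python sketch that lower-cases only the scheme and host of a URL and leaves the path alone. The function name `normalise_host_case` is made up for this example; it is an assumption about how you might model the behaviour, not how any search engine actually implements it.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_host_case(url: str) -> str:
    """Lower-case the scheme and host only; the path, query and fragment are kept as typed.

    Hypothetical helper for illustration. Note that lower-casing the whole
    netloc would also lower-case any user:password@ prefix, which this
    simple sketch does not try to handle.
    """
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, parts.fragment)
    )


# The two domain-only URLs from the article collapse to the same address.
domain_urls = ["http://www.example-url.com/", "http://www.Example-Url.com/"]
print({normalise_host_case(u) for u in domain_urls})

# The three folder URLs stay distinct, because only the path differs in case.
folder_urls = [
    "http://www.example-url.com/Folder-Name/",
    "http://www.example-url.com/FOLDER-NAME/",
    "http://www.example-url.com/folder-name/",
]
print(len({normalise_host_case(u) for u in folder_urls}))
```

The set of domain-only URLs ends up with a single entry, while the folder URLs remain three separate addresses.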

If all three versions of the above URL existed, it could lead to them being identified as duplicate content and there is a good chance that this will dilute the page’s link equity. For this reason, as well as to promote uniformity in order to make the process of creating URLs more straightforward, the recommended best practice here is to stick with lower case for all URLs. As an aside, lower case URLs are considered more aesthetically pleasing and are easier to read.

Case sensitivity issues tend to arise if you use a server which is case insensitive, such as Microsoft IIS. A Microsoft IIS server would treat the three URLs above as the same resource and serve the same page for each, even though the search engines still see three distinct URLs. Again, the best practice here is to stick to lower case in your URLs.
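One common way to enforce lower case URLs is to redirect any mixed-case request to its lower-case equivalent. The sketch below only works out what the redirect target would be; the function name `lowercase_redirect_target` is hypothetical, and it assumes that your paths never rely on case and that query strings should be left untouched (parameter values can be case sensitive). Wiring it into a real server's 301 handling is left out.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def lowercase_redirect_target(url: str) -> Optional[str]:
    """Return the lower-case URL to redirect to, or None if no redirect is needed.

    Hypothetical helper: lower-cases the host and path only, and leaves the
    query string and fragment exactly as they were requested.
    """
    parts = urlsplit(url)
    canonical = urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path.lower(), parts.query, parts.fragment)
    )
    return None if canonical == url else canonical


for requested in (
    "http://www.example-url.com/Folder-Name/",
    "http://www.example-url.com/FOLDER-NAME/",
    "http://www.example-url.com/folder-name/",
):
    target = lowercase_redirect_target(requested)
    if target is None:
        print(requested, "is already lower case")
    else:
        print(requested, "would redirect to", target)
```

With a rule like this in place, only the lower-case version of each page should end up being linked to and indexed.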

However, there are occasions when Google does return different results depending on the case used. This seems to be mainly where the letters could be either a word or an acronym. Compare [BAR], [bar] and [Bar], for example. The results produced are split into three sections, and it is in the third section where we found differences.

Comparing search results of BAR, bar and Bar

Differences were also seen when comparing results for [AND], [and] and [And].

Another oddity that came to light was seen when searching for [MAD] and [mad]. For [MAD], Google returns a currency exchange rate onebox, but it does not for [mad].

Therefore a best practice for including acronyms on a page is to include the full form with the acronym in brackets, at least in the first mention, as Google often highlights this in the search snippet.


Will purchasing a branded TLD improve your SEO?

As ICANN continues to move forward with its plans to permit the creation of unlimited new generic Top Level Domains (gTLDs), what does this mean for SEO? What are the benefits and downsides of branded gTLDs, and are they commercially viable?

ICANN, the Internet Corporation for Assigned Names and Numbers, which is a not-for-profit organisation tasked with maintaining the Internet registry of domain names and IP addresses, is permitting the registration of new generic Top Level Domains (gTLDs) in addition to existing gTLDs such as ‘.com’, ‘.org’ and ‘.mobi’. In addition to this, ICANN is also currently in the process of expanding the domain name system to include Internationalised Domain Names for non-Latin languages.

ICANN has stated that any entity meeting the following basic registration requirements can apply for the creation of a new gTLD:

  1. String reviews (concerning the applied-for gTLD string). String reviews include a determination that the applied-for gTLD string is not likely to cause security or stability problems in the DNS, including problems caused by similarity to existing TLDs or reserved names.
  2. Applicant reviews (concerning the entity applying for the gTLD and its proposed registry services). Applicant reviews include a determination of whether or not the applicant has the requisite technical, operational, and financial capability to operate a registry.

Applicants will also need to pass checks to ensure that there are no objections to registration, whether on grounds of public decency (as in the case of .xxx) or on legal or commercial grounds, such as trademarks. Applicants will also be required to pay ICANN a setup fee of $185,000 (£120,000), a move which has caused some parties to express concerns that gTLDs are simply a cynical money-grabbing exercise.

So what are the benefits of these new gTLDs?

TLDs are all about differentiation. In the same way that a ‘.com’ domain can be thought of as a global site, and a ‘.co.uk’ domain can be considered to be a UK site, these new gTLDs demarcate specific portions of the Internet, and can be used for various purposes. For example:

  • Sites relating to a particular geographical area, such as ‘.nyc’ for New York City, or ‘.SW1’ for the applicable South-West London postal area.
  • Special interest groups without geographical boundaries can have their own space on the Internet, making them easier to identify and associate with.
  • Companies can cement their brands on the Internet, creating second-tier domains for sections of their organisations on a branded gTLD.
  • Innovative service offerings involving profiles on domains such as ‘.facebook’.
  • Inexpensive, price differentiated domain hosting, on domains such as ‘.mysite’.
  • Sub-sections of the Internet dedicated to specific content, such as ‘.music’, where commercially available tracks can be advertised, or ‘.appstore’ where you can host your iPhone apps.

Recently, Canon announced its application to register ‘.canon’ as a gTLD. What stood out on Canon’s Press Release was its reason for registering the domain:

With the adoption of the new gTLD system, which enables the direct utilization of the Canon brand, Canon hopes to globally integrate open communication policies that are intuitive and easier to remember compared with existing domain names such as "canon.com."

What strikes me about this is that ‘canon.com’ is already pretty easy to remember. I imagine that rather than ‘canon.canon’, domains such as ‘corporate.canon’, ‘sales.canon’ and ‘printers.canon’ would be the objective here, although there would at first appear to be little benefit to this.

As Canon is a multi-billion pound company, I suspect that this may have been a knee-jerk, defensive purchase, made regardless of whether anyone had come up with a good reason for it.

As it stands, there are likely to be few organisations that can justify the expenditure associated with registering a gTLD, especially given the superficial benefits of the domain name (aside from the potential uses identified in this article). Many companies are already worried about the cost of registering their domains on different TLDs, but do so defensively in case someone else registers them. With a potentially infinite number of new gTLDs coming on the market, these costs are only going to increase.

What do new gTLDs mean for search engines?

As is the case today (with some exceptions), relocating a site from one gTLD to another is unlikely to have many benefits SEO-wise, and often has a negative impact on rankings in the short term. It is also impossible at this stage to tell how the search engines will treat these new gTLDs.

For commercial product sites, moving from an existing domain which is performing well to a new domain is inadvisable due to the link profile and reputation gained over time, not all of which may be transferred to the new domain, at least not right away.

From a monetisation perspective, having the keyword in the TLD will provide far less benefit than having a well optimised site with a strong natural link structure, something which cannot simply be bought off the shelf. Reselling domains under a gTLD that is relevant to a particular group of commercial organisations, offering a unique differentiator along the lines described earlier in this article, is of course one method of monetisation, but it would certainly be a brave business model.

In summary, registering a gTLD is only really feasible in a situation where you can justify the expense through clear strategic planning and/or your business cannot afford to have its brand diluted. This will be more important where a company does not own a registered trademark, as an objection is less likely to be upheld should anyone try to create a gTLD of their brand name.

It will be very interesting to see how Canon will use its new asset, assuming that its application is successful. Any other organisation which decides to register a new gTLD will also make an interesting case study for more widespread uptake of this novel branding opportunity.


Can search engines handle Internationalized Domain Names (IDNs)?

Internationalized Domain Names (IDNs) have been approved by ICANN and are set to become a reality. Are the search engines prepared for them?


Introduction

Note: If you are unable to view the Chinese and Arabic letters in this page you may need to install the required fonts.

In October 2009, ICANN voted to allow the use of non-ASCII characters in domain names. Non-ASCII characters have existed within domain names for a while – for example, many Hong Kong sites feature Chinese characters (example: http://香港儒釋道院.組織.hk). However, before now, these characters were not allowed within TLDs and, as such, URLs still required ASCII characters (in the example above, the ccTLD “.hk”).

ICANN launched the IDN ccTLD Fast Track Process in November, and last month announced that four top-level IDNs had successfully passed the initial stage of approval (three Arabic-language IDN ccTLDs for Egypt, Saudi Arabia and the United Arab Emirates, and one Cyrillic-language IDN ccTLD for the Russian Federation). At the time of writing, there are another 13 IDN ccTLDs on their way through this process, representing 10 different languages in total.

In order to give the Internet community time to prepare for the rollout of new IDN domains, ICANN has set up a number of IDN domains for testing purposes. Each of these test domains is written as “example.test” in its respective language, and content has been made available to view on each site.

Seeing as most of the initial IDN ccTLDs are likely to be in Arabic, I have used ICANN’s test Arabic domain (مثال.إختبار) for my research.

Before I start, I need to quickly explain what Punycode is, as it is used to support the addition of IDN domains to the existing Internet infrastructure. The problem with the current system is that the Domain Name System (DNS) only allows certain ASCII characters, which means that it is not possible to simply add Unicode characters to it. Punycode was invented to get around this issue. Essentially, it is a method by which Unicode characters can be translated to (and from) the ASCII characters allowed within the DNS. When your browser requests a domain name containing Unicode characters, it converts it to the ASCII-formatted Punycode before sending the request.
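To make the translation concrete, here is a minimal Python sketch that converts a hostname between its Punycode and Unicode forms, using the Punycode hostname of ICANN's Arabic test domain. It relies on Python's built-in "idna" codec, which implements the older IDNA 2003 rules, so treat it as an approximation of what a browser does rather than an exact reproduction. The helper names `to_unicode` and `to_punycode` are made up for this example.

```python
def to_unicode(hostname: str) -> str:
    """Decode an ASCII (Punycode) hostname into its Unicode form, label by label."""
    return hostname.encode("ascii").decode("idna")


def to_punycode(hostname: str) -> str:
    """Encode a Unicode hostname into the ASCII form that is sent to the DNS."""
    return hostname.encode("idna").decode("ascii")


# Punycode form of ICANN's Arabic "example.test" domain.
ascii_form = "xn--mgbh0fb.xn--kgbechtv"

# Each "xn--" label is decoded back into Arabic characters.
unicode_form = to_unicode(ascii_form)
print(unicode_form)

# Encoding the Unicode form again gives back the ASCII hostname.
print(to_punycode(unicode_form))

# Plain ASCII hostnames pass through unchanged.
print(to_punycode("example.com"))
```

The "xn--" prefix on each label is what marks it as Punycode; labels without it are left as ordinary ASCII.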

For this experiment, I have looked at the way in which the search engines handle both the Unicode form of the Arabic domain (http://مثال.إختبار/) as well as the corresponding Punycode format (which, in this case, is http://xn--mgbh0fb.xn--kgbechtv/). Note that, because Arabic is an RTL (right-to-left) language, pages on this site will have the URL path to the left of the hostname, rather than to the right.

One last note before we look at the results – the test page does not feature a meta description tag, so any snippet text is likely to come from text within the page itself.

Here are the results.

Google

Searching Google for the Unicode variant of the URL returns the homepage of the domain as the first result, with an additional nested result for a second, internal page on the domain:

  Google Unicode

Initially, everything seems to be in place here. The title tags, snippets and URLs are correctly displayed in Arabic, and Google has highlighted the search text in bold as usual. Additionally, the “Similar” pages link works, and the “jump to” successfully takes you to an anchor within the page. Lastly, the URL path is written in the correct RTL form for the second result.

However, not all is well. The first URL that Google is listing, the homepage, is actually a 301 redirect to an internal page. Google should be indexing the destination page, not the redirecting homepage.

There are several other issues too. Firstly, the cached copy link did not work:

Google cached copy

I tried a number of pages on the site and Google’s cached copy did not work for any of them, so Google may have an issue with this feature at present.

Additionally, the “Translate this page” links for both results do not correctly function, and an error message is shown:

Google Translate error

Side note – the “See original page” link does correctly point to the Arabic domain name.

Next I tried searching Google for the Punycode form of the URL:

Google Punycode

Google has returned the same two URLs, which is a good sign of consistency. The title tags are the same, and the URL is still written in Arabic and not displayed in the Punycode form.

This time around, Google has picked out some text on the page which matches the Punycode search term. Although this particular snippet is rather less attractive than the ones from the previous query, matching the exact text on a page is probably the best approach. However, it would also make sense for Google to at least highlight the Unicode version (for example, in the URL), which it currently does not do.

Again, while the “Similar” pages link works, the “Cached” and “Translate this page” links are broken. This seems to be an issue that Google needs to fix.

Yahoo!

Searching Yahoo! for the Unicode or Punycode version of the URL does not return any results from the domain:

Yahoo! Arabic domain fail

Similarly, entering the URLs within Yahoo! Site Explorer simply redirects back to the main Yahoo! search results. Performing “site:” searches (for either variant) also fails (looking at the HTTP headers, you can see that Yahoo! actually redirects the query to Site Explorer, which then redirects you back to the standard web search results).

I tried a few additional ICANN test IDN domains in other languages and none of them worked. Yahoo! seems to fail completely at handling IDNs.

Given that Yahoo! is likely to use Bing’s search in the future, let’s see how Bing performs next.

Bing

Searching Bing for the Unicode version of the URL does return a page from the site, although it’s at position 8, which is not ideal (when searching for a URL you would usually want the URL to appear at or near the top of the search results). The snippet appears as follows:

Bing Unicode

Only one URL is shown, which isn’t quite as useful as Google’s result, but is still adequate. The title tag, snippet and URL are all correctly shown in Arabic, which is good. The “Translate this page” and “Cached page” links both work, whilst they didn’t on Google.

Bing does have some issues, however. Although Bing has indexed the destination URL (the link goes directly to the destination URL), for some reason Bing only displays the URL of the homepage in the snippet. Additionally, although Bing has highlighted the domain in bold in the snippet, it has not highlighted it within the URL.

Bing does have a number of problems with its handling of this domain. However, they are fairly minor and definitely less important than the issues that Google has with this site.

Searching Bing for the Punycode version of the URL, Bing returns the URL at position 2 instead, which is a bit better:

Bing Punycode

Again, like Google, Bing has picked out the text from the page which matches the query for the snippet but has not highlighted the Arabic equivalent in the snippet. Otherwise, this result is much the same as the Unicode search variant.

Ask Jeeves

I have also looked at Ask Jeeves (known as just “Ask” in the US).

Searching Ask Jeeves for the Unicode version of the URL returns the site at position one. Like Google, it includes a second indented URL at position two. Interestingly, these are the same two URLs that Google returned for this search (it is worth remembering that Ask Jeeves might be using Google’s results at times).

Ask Jeeves Unicode

Ask Jeeves is correctly displaying both the title and the snippet in Arabic, but the URL is written in the Punycode form instead, which is clearly far from ideal.

There is another major issue with Ask Jeeves’ implementation – the second URL goes through a redirect, but the hostname given by the redirect has been encoded in a way which makes Firefox and Internet Explorer fail to load the page (Google Chrome and Opera did successfully load the page from the redirect). Note: This does not always happen – reloading the page sometimes returns the URL without the redirect, and in this case it works correctly.

Searching Ask Jeeves for the Punycode version of the URL results in much the same as we have seen earlier. Again, the snippet includes the text from the page which matches the query. Ask Jeeves includes a small screenshot of the page too:

Ask Jeeves Punycode

Ask Jeeves’ binoculars feature, which displays a small thumbnail screenshot of the site, does appear to work correctly. However, it is possible that there are issues here as well.

Ask Jeeves Binoculars

Although it’s difficult to make out due to the small size of the thumbnail, it appears that the English text renders correctly but the Arabic text (although correctly displayed in an RTL fashion) looks like it might be showing a nonsense placeholder character, in the same way that web browsers which do not render Unicode characters do. That said, it is difficult to determine for sure from the small thumbnail that Ask Jeeves provides.

Conclusion

In conclusion, Google, Bing and Ask Jeeves do support IDNs to varying degrees. If I had to proclaim a winner at the moment, I would say that Bing had a slight lead, but all of these search engines had some issues. Hopefully these issues will be ironed out by the time that IDNs eventually roll out en masse.

Yahoo! appears to completely fail to support IDNs at present. Once it switches to Bing’s search engine, however, we assume that it will inherit all of Bing’s IDN support as well.


Mirror launches 3am Girls site.

As print newspapers change their attitude towards online content delivery, we take a look at one of the classier of the latest batch of web offerings.

Following on from the launch of the relatively impressive, if not entirely unique, Mirror Football website earlier this month, the digital version of the “famous” 3am Girls has now launched. It is Trinity Mirror’s latest attempt at a vertical for which it possibly hopes to charge in the foreseeable future, in order to help stave off the plummeting share price of the UK’s largest newspaper publisher and avoid laying off more journalists and closing down more newspapers.

http://www.3am.co.uk/

What can we say about the SEO of this site by looking at it for 2 minutes? The URL structure looks ok; they seem to have a hierarchical system that uses hyphens to separate words. But I can’t say the actual words they want Google to spider are too impressive. I am not sure what Google will make of “Ooh”, “Gasp!” and “Phwaor!” as the links on the main navigation. All the page titles are the same as well and there is no RSS feed, but I don’t want to be too picky.
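Identical page titles are easy to check for yourself. The sketch below is a minimal example using only Python's standard library: it pulls the title out of each page and groups URLs that share one. The page URLs and HTML here are invented placeholders, not content from the site under discussion, and the names `TitleParser`, `extract_title` and `duplicate_titles` are made up for this example.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html: str) -> str:
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def duplicate_titles(pages: dict) -> dict:
    """Map each title that is used more than once to the URLs that share it."""
    by_title = defaultdict(list)
    for url, html in pages.items():
        by_title[extract_title(html)].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


# Hypothetical pages, already downloaded, keyed by URL.
pages = {
    "http://www.example.com/": "<html><head><title>Example Site</title></head></html>",
    "http://www.example.com/news/": "<html><head><title>Example Site</title></head></html>",
    "http://www.example.com/about/": "<html><head><title>About us | Example Site</title></head></html>",
}
print(duplicate_titles(pages))
```

Any title that shows up against more than one URL is a candidate for rewriting into something unique and descriptive.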

Does it have any meta data then? What are those CTRs going to be like?

Let’s Google [3am] … here they are down at number 6.

Google result for [3am]

Well, I don’t know about you but to me the snippet’s not exactly an incentive to learn more. But we all know newspaper companies hate Google so maybe they’re not interested in traffic from search engines, which might start 80% of internet journeys but let’s not let facts get in the way of the truth.

Oh but hold on. Trinity are paying for PPC rankings for both [3am] and [celebrity gossip] so they are at least acknowledging that search exists in some form. Oh dear.

To be fair, it is early days for this site. With a decent amount of marketing more people will come and visit what is an established brand in the celebrity world and as a result the site will attract some high quality links that will push it up the rankings to a point, despite Trinity making it as hard as possible for Google to understand what the site is about.

But if they want to rank for [celebrity] (450,000 exact match searches on average per month) or [celebrity gossip] (368,000 exact match searches on average per month), which I am pretty sure they do as they are bidding on PPC for both, and compete with Heat, Perez Hilton, Spike and *whisper it* The Sun then they had better smarten up their act. Because currently they are, sensibly, not charging for content so all cash will come from ad revenue which is reliant on traffic and impressions and as far as Google, the biggest traffic driver of them all, is concerned they are merely a blip on the horizon.


How has Google Suggest affected search queries?

Google Suggest and search refinements were introduced at the end of March this year. How have they impacted the shape of search?

Note: This article focuses on the UK search market.

The folks over at Latitude have written up an interesting piece looking at whether or not the introduction of Google Suggest in the UK has resulted in any changes to the volumes of searches for the search terms which are suggested.

The expected effect would be an increase in suggested search terms as they get more exposure, along with a corresponding decrease in more generic short tail searches and possibly a drop in the long tail as well. Additionally, this should also result in fewer searches for mis-spelled queries, as Google Suggest can correct spelling mistakes on the fly.

The changes expected from the introduction of Google Suggest are very much in line with the type of changes introduced by Google Search Refinements – in fact, Google launched these two new features only a week apart. Therefore, it is likely that the changes in overall search patterns reflect a combination of these two changes.

Let’s have a look at some of the findings. Some examples are really compelling, such as the huge increases in searches for [pet insurance comparison]:

Google Insights - pet/travel insurance (latitude)
[credit - image from referenced post]

Many queries recommended by Google Suggest have experienced a rise since April, although these growth rates have generally been much lower than the rate for the search term [pet insurance comparison] shown above.

However, the picture isn’t completely consistent across the board. The example of [car insurance compare] isn’t entirely accurate, given the way that Google Insights for Search matches queries. Most people are searching for variants in the phrased form “compare car insurance” (the queries for which correlate well to the set of searches containing any of these words). However, queries including the phrase “car insurance compare” are much lower than this, and volumes are relatively flat.

Google Insights - car insurance compare

Note: A side effect of this quirk with Google Insights for Search is that it’s not really possible to determine accurate relative search volumes for more generic terms (which we would expect to have declined slightly) as any generic query entered will also match long-tail variants, including those provided by Google Suggest. You can get some mileage with negative keywords in some instances (here is an example, although this isn’t related to the introduction of Google Suggest) but in general this isn’t suitable.

One area in which Google Suggest is expected to reduce a particular type of query is mis-spellings. Google’s other technologies for providing search refinements and spelling corrections only offer the opportunity to correct a query after the mis-spelled query has been made (potentially resulting in a second, correctly-spelled query), whereas Google Suggest can prevent mis-spellings from ever happening at all. Here’s an example of a common mis-spelling of the UK’s most popular search term in 2008:

Google Insights - face book

It’s very difficult to pin down a particular trend to one specific change made by Google. There are constantly many changes being made to Google’s search results – 359 last year alone. Some of the experimental features which Google runs never see use beyond a small test group, whilst others are seen by so many people that they’re common knowledge in the SEO industry before they are officially announced. The dates that Google provides for product launches are only rough indications of the dates by which the majority of users will have had access to a feature.

It is also important to note that different features can have related effects – for example, Google’s search refinements launched just a week before Google Suggest came to the UK and both would be likely to have a very similar impact (although I would expect that Google Suggest has a greater impact here than search refinements, due to its relative prominence as a feature).

Side note: The spelling suggestions example above is another good example of the need for care when attributing a particular impact on search results to a specific change by Google – around the same time that Google made the changes described above, they also made some changes to spelling correction.

Nevertheless, the data does seem to indicate that the introduction of Google Suggest and search refinements has had the expected effect on search patterns. To round up, I’ll quote the summary of my previous article on search refinements:

As with any change in Google there are winners and losers. Searchers will be more likely to use a wider variety of search queries, meaning that the number of potential visitors will be spread out more evenly across multiple queries. As different websites will rank for different terms, this may result in a “spreading out” of visitors across a greater number of different web sites.

Sites that ranked well for high traffic terms might potentially see a drop in traffic, but the increase in search precision from these more targeted phrases should hopefully mean that searchers are directed to the pages on your site that are most relevant to what they are looking for. That, I think we can all agree, is good for everyone.
