
Can search engines handle Internationalized Domain Names (IDNs)?

Internationalized Domain Names (IDNs) have been approved by ICANN and are set to become a reality. Are the search engines prepared for them?



Introduction

Note: If you are unable to view the Chinese and Arabic letters in this page you may need to install the required fonts.

In October 2009, ICANN voted to allow the use of non-ASCII characters in domain names. Non-ASCII characters have existed within domain names for a while – for example, many Hong Kong sites feature Chinese characters (example: http://香港儒釋道院.組織.hk). However, before now, these characters were not allowed within TLDs and, as such, URLs still required ASCII characters (in the example above, the ccTLD “.hk”).

ICANN launched the IDN ccTLD Fast Track Process in November, and last month announced that four top-level IDNs had successfully passed the initial stage of approval (three Arabic-language IDN ccTLDs for Egypt, Saudi Arabia and the United Arab Emirates, and one Cyrillic-language IDN ccTLD for the Russian Federation). At the time of writing, there are another 13 IDN ccTLDs on their way through this process, representing 10 different languages in total.

In order to provide the Internet community time to prepare for the rollout of new IDN domains, ICANN has set up a number of IDN domains for testing purposes. Each of these test domains is written as “example.test” in its respective language, and content has been made available to view on each site.

Seeing as most of the initial IDN ccTLDs are likely to be in Arabic, I have used ICANN’s test Arabic domain (مثال.إختبار) for my research.

Before I start, I need to quickly explain what Punycode is, as it is used to support the addition of IDN domains to the existing Internet infrastructure. The problem with the current system is that the Domain Name System (DNS) only allows certain ASCII characters, which means that it is not possible to simply add Unicode characters to it. Punycode was invented to get around this issue. Essentially, it is a method by which Unicode characters can be translated to (and from) the ASCII characters allowed within the DNS. When your browser requests a domain name containing Unicode characters, it converts it to the ASCII-formatted Punycode before sending the request.
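As a rough illustration of this translation, the sketch below uses the "idna" codec from Python's standard library. That codec implements the older IDNA 2003 rules, and browsers or registries may apply newer ones, so treat it as an approximation of what a browser does rather than an exact reproduction. The helper names to_ascii and to_unicode are hypothetical; the hostname is the Punycode form of the ICANN Arabic test domain used in this article.

```python
def to_ascii(hostname: str) -> str:
    """Convert a Unicode hostname to its ASCII (Punycode) form, label by label."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Convert an ASCII (Punycode) hostname back to its Unicode form."""
    return hostname.encode("ascii").decode("idna")


# Punycode form of ICANN's Arabic test domain, as quoted in this article.
ascii_host = "xn--mgbh0fb.xn--kgbechtv"

# Decode to the Arabic form, then encode again as a browser would before a DNS lookup.
unicode_host = to_unicode(ascii_host)
print(to_ascii(unicode_host))

# Plain ASCII hostnames pass through unchanged.
print(to_ascii("example.test"))
```

Each label (the parts between the dots) is converted separately, and the "xn--" prefix marks a label as Punycode-encoded, which is why both halves of the test domain start with it.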

For this experiment, I have looked at the way in which the search engines handle both the Unicode form of the Arabic domain (http://مثال.إختبار/) as well as the corresponding Punycode format (which, in this case, is http://xn--mgbh0fb.xn--kgbechtv/). Note that, because Arabic is an RTL (right-to-left) language, pages on this site will have the URL path to the left of the hostname, rather than to the right.

One last note before we look at the results – the test page does not feature a meta description tag, so any snippet text is likely to come from text within the page itself.

Here are the results.

Google

Searching Google for the Unicode variant of the URL returns the homepage of the domain as the first result, with an additional nested result for a second, internal page on the domain:

  Google Unicode

Initially, everything seems to be in place here. The title tags, snippets and URLs are correctly displayed in Arabic, and Google has highlighted the search text in bold as usual. Additionally, the “Similar” pages link works, and the “jump to” successfully takes you to an anchor within the page. Lastly, the URL path is written in the correct RTL form for the second result.

However, not all is well. The first URL that Google is listing, the homepage, is actually a 301 redirect to an internal page. Google should be indexing the destination page, not the redirecting homepage.
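If you want to confirm a redirect like this yourself, one approach is to request the URL without following redirects and look at the status code and the Location header. The sketch below is a minimal illustration using Python's standard library; the function name first_response is hypothetical, and this is not necessarily how the redirect was checked for this article.

```python
import http.client
from urllib.parse import quote, urlsplit


def first_response(url: str, timeout: float = 10.0):
    """Return (status, Location header) for url without following any redirect."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_cls = http.client.HTTPSConnection
    else:
        conn_cls = http.client.HTTPConnection

    # The DNS only understands ASCII, so send the Punycode form of the hostname.
    host = parts.hostname.encode("idna").decode("ascii")

    # Percent-encode any non-ASCII characters in the path (UTF-8 is assumed here).
    path = quote(parts.path or "/", safe="/%")
    if parts.query:
        path += "?" + quote(parts.query, safe="=&%+")

    conn = conn_cls(host, parts.port, timeout=timeout)
    try:
        # HEAD asks for the headers only, which is all that is needed here.
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

A status of 301 together with a Location value would indicate a permanent redirect of the kind described above. Some servers answer HEAD requests differently from GET requests, so a surprising result is worth re-checking with a full request.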

There are several other issues too. Firstly, the cached copy link did not work:

Google cached copy

I tried a number of pages on the site and Google’s cached copy did not work for any of them, so Google may have an issue with this feature at present.

Additionally, the “Translate this page” links for both results do not correctly function, and an error message is shown:

Google Translate error

Side note – the “See original page” link does correctly point to the Arabic domain name.

Next I tried searching Google for the Punycode form of the URL:

Google Punycode

Google has returned the same two URLs, which is a good sign of consistency. The title tags are the same, and the URL is still written in Arabic and not displayed in the Punycode form.

This time around, Google has picked out some text on the page which matches the Punycode search term. Although this particular snippet is rather less attractive than the ones from the previous query, matching the exact text on a page is probably the best approach. However, it would also make sense for Google to at least highlight the Unicode version (for example, in the URL), which it currently does not do.

Again, while the “Similar” pages link works, the “Cached” and “Translate this page” links are broken. This seems to be an issue that Google needs to fix.

Yahoo!

Searching Yahoo! for the Unicode or Punycode version of the URL does not return any results from the domain:

Yahoo! Arabic domain fail

Similarly, entering the URLs within Yahoo! Site Explorer simply redirects back to the main Yahoo! search results. Performing “site:” searches (for either variant) also fails (looking at the HTTP headers, you can see that Yahoo! actually redirects the query to Site Explorer, which then redirects you back to the standard web search results).

I tried a few additional ICANN test IDN domains in other languages and none of them worked. Yahoo! seems to fail completely at handling IDNs.

Given that Yahoo! is likely to use Bing’s search in the future, let’s see how Bing performs next.

Bing

Searching Bing for the Unicode version of the URL does return a page from the site, although it’s at position 8, which is not ideal (when searching for a URL you would usually want the URL to appear at or near the top of the search results). The snippet appears as follows:

Bing Unicode

Only one URL is shown, which isn’t quite as useful as Google’s result, but is still adequate. The title tag, snippet and URL are all correctly shown in Arabic, which is good. The “Translate this page” and “Cached page” links both work, whilst they didn’t on Google.

Bing does have some issues, however. Although Bing has indexed the destination URL (the link goes directly to the destination URL), for some reason Bing only displays the URL of the homepage in the snippet. Additionally, although Bing has highlighted the domain in bold in the snippet, it has not highlighted it within the URL.

Bing does have a number of problems with its handling of this domain. However, they are fairly minor and definitely less important than the issues that Google has with this site.

Searching Bing for the Punycode version of the URL, Bing returns the URL at position 2 instead, which is a bit better:

Bing Punycode

Again, like Google, Bing has picked out the text from the page which matches the query for the snippet but has not highlighted the Arabic equivalent in the snippet. Otherwise, this result is much the same as the Unicode search variant.

Ask Jeeves

I have also looked at Ask Jeeves (known as just “Ask” in the US).

Searching Ask Jeeves for the Unicode version of the URL returns the site at position one. Like Google, it includes a second indented URL at position two. Interestingly, these are the same two URLs that Google returned for this search (it is worth remembering that Ask Jeeves might be using Google’s results at times).

Ask Jeeves Unicode

Ask Jeeves is correctly displaying both the title and the snippet in Arabic, but the URL is written in the Punycode form instead, which is clearly far from ideal.

There is another major issue with Ask Jeeves’ implementation – the second URL goes through a redirect, but the hostname given by the redirect has been encoded in a way which makes Firefox and Internet Explorer fail to load the page (Google Chrome and Opera did successfully load the page from the redirect). Note: This does not always happen – reloading the page sometimes returns the URL without the redirect, and in this case it works correctly.

Searching Ask Jeeves for the Punycode version of the URL results in much the same as we have seen earlier. Again, the snippet includes the text from the page which matches the query. Ask Jeeves includes a small screenshot of the page too:

Ask Jeeves Punycode

Ask Jeeves’ binoculars feature, which displays a small thumbnail screenshot of the site, does appear to work correctly. However, it is possible that there are issues here as well.

Ask Jeeves Binoculars

Although it’s difficult to make out due to the small size of the thumbnail, it appears that the English text renders correctly but the Arabic text (although correctly displayed in an RTL fashion) looks like it might be showing a nonsense placeholder character, in the same way that web browsers which do not render Unicode characters do. That said, it is difficult to determine for sure from the small thumbnail that Ask Jeeves provides.

Conclusion

In conclusion, Google, Bing and Ask Jeeves do support IDNs to varying degrees. If I had to proclaim a winner at the moment, I would say that Bing has a slight lead, but all of these search engines have some issues. Hopefully these issues will be ironed out by the time that IDNs eventually roll out en masse.

Yahoo! appears to completely fail to support IDNs at present. Once it switches to Bing’s search engine, however, we assume that it will inherit all of Bing’s IDN support as well.


How has Google Suggest affected search queries?

Google Suggest and search refinements were introduced at the end of March this year. How have they impacted the shape of search?

Note: This article focuses on the UK search market.

The folks over at Latitude have written up an interesting piece looking at whether or not the introduction of Google Suggest in the UK has resulted in any changes to the volumes of searches for the search terms which are suggested.

The expected effect would be an increase in suggested search terms as they get more exposure, along with a corresponding decrease in more generic short tail searches and possibly a drop in the long tail as well. Additionally, this should also result in fewer searches for mis-spelled queries, as Google Suggest can correct spelling mistakes on the fly.

The changes expected from the introduction of Google Suggest are very much in line with the type of changes introduced by Google Search Refinements – in fact, Google launched these two new features only a week apart. Therefore, it is likely that the changes in overall search patterns reflect a combination of these two changes.

Let’s have a look at some of the findings. Some examples are really compelling, such as the huge increases in searches for [pet insurance comparison]:

Google Insights - pet/travel insurance (latitude)
[credit - image from referenced post]

Many queries recommended by Google Suggest have experienced a rise since April, although these growth rates have generally been much lower than the rate for the search term [pet insurance comparison] shown above.

However, the picture isn’t completely consistent across the board. The example of [car insurance compare] isn’t entirely accurate, given the way that Google Insights for Search matches queries. Most people are searching for variants in the phrased form “compare car insurance” (the queries for which correlate well to the set of searches containing any of these words). However, queries including the phrase “car insurance compare” are much lower than this, and volumes are relatively flat.

Google Insights - car insurance compare

Note: A side effect of this quirk with Google Insights for Search is that it’s not really possible to determine accurate relative search volumes for more generic terms (which we would expect to have declined slightly) as any generic query entered will also match long-tail variants, including those provided by Google Suggest. You can get some mileage with negative keywords in some instances (here is an example, although this isn’t related to the introduction of Google Suggest) but in general this isn’t suitable.
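To make the difference between these kinds of matching concrete, here is a small sketch. The exact matching rules of Google Insights for Search are not spelled out here, so the code simply assumes that an unquoted term matches any query containing all of its words in any order, that a quoted phrase matches only that exact word sequence, and that a negative keyword excludes any query containing it. The function names and the sample queries are hypothetical and are not real search data.

```python
def broad_match(term: str, query: str, negatives: tuple = ()) -> bool:
    """True if query contains every word of term (any order) and no negative word."""
    words = query.lower().split()
    if any(negative.lower() in words for negative in negatives):
        return False
    return all(word in words for word in term.lower().split())


def phrase_match(phrase: str, query: str) -> bool:
    """True if the words of phrase appear in query consecutively and in order."""
    p, q = phrase.lower().split(), query.lower().split()
    return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))


# Hypothetical queries, for illustration only.
queries = [
    "compare car insurance",
    "car insurance compare",
    "compare cheap car insurance quotes",
    "car insurance",
]

broad = [q for q in queries if broad_match("car insurance compare", q)]
phrase = [q for q in queries if phrase_match("car insurance compare", q)]
generic_only = [
    q for q in queries if broad_match("car insurance", q, negatives=("compare",))
]

print(len(broad), len(phrase), len(generic_only))
```

Under these assumptions the unquoted term picks up three of the four sample queries, the phrase picks up only one, and the negative keyword is what isolates the generic query. That is the same reason a generic term on its own also counts its long-tail variants.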

One area in which the introduction of Google Suggest is expected to reduce a particular type of query is mis-spellings. Google’s other technologies for providing search refinements and spelling corrections only offer the opportunity to correct a mis-spelled query after it has been made (potentially resulting in a second, correctly-spelled query), whereas Google Suggest can prevent mis-spellings from ever happening at all. Here’s an example of a common mis-spelling of the UK’s most popular search term in 2008:

Google Insights - face book

It’s very difficult to pin down a particular trend to one specific change made by Google. There are constantly many changes being made to Google’s search results – 359 last year alone. Some of the experimental features which Google runs never see use beyond a small test group, whilst others are seen by so many people that they’re common knowledge in the SEO industry before they are officially announced. The dates that Google provides for product launches are only rough indications of the dates by which the majority of users will have had access to a feature.

It is also important to note that different features can have related effects – for example, Google’s search refinements launched just a week before Google Suggest came to the UK and both would be likely to have a very similar impact (although I would expect that Google Suggest has a greater impact here than search refinements, due to its relative prominence as a feature).

Side note: The spelling suggestions example above is another good example of the need for care when attributing a particular impact on search results to a specific change by Google – around the same time that Google made the changes described above, they also made some changes to spelling correction.

Nevertheless, the data does seem to indicate that the introduction of Google Suggest and search refinements has had the expected effect on search patterns. To round up, I’ll quote the summary of my previous article on search refinements:

As with any change in Google there are winners and losers. Searchers will be more likely to use a wider variety of search queries, meaning that the number of potential visitors will be spread out more evenly across multiple queries. As different websites will rank for different terms, this may result in a “spreading out” of visitors across a greater number of different web sites.

Sites that ranked well for high traffic terms might potentially see a drop in traffic, but the increase in search precision from these more targeted phrases should hopefully mean that searchers are directed to the pages on your site that are most relevant to what they are looking for. That, I think we can all agree, is good for everyone.


Google’s new search refinements – how will it affect you?

Google has announced two changes to how it displays its search results pages. These changes have been rolled out across 37 languages worldwide. In this post, we explore what the changes to their “search refinements” mean to webmasters.

Google has recently made changes to their “search refinements” feature. These are the links that Google includes at the bottom (and sometimes at the top) of their search results pages which provide a number of suggestions for a searcher to narrow their search down.

Google has introduced new technology that can “better understand associations and concepts related to your search”. Aside from leading to changes in the search refinements which they display, they have also increased the number of these that they may show.

For example, when searching for [

Image courtesy of Google.

How might this affect you?

Firstly, as the algorithm powering these search refinements has changed, this means that many of the suggested searches will also have changed. If your site used to get traffic from people clicking on these, it may no longer do so as this particular suggestion may no longer be listed. Conversely, your site may suddenly start to receive traffic from new search refinements.

Secondly, as the number of search refinements has increased, this means that the chance of clicking on any particular one will have decreased.

Thirdly, a combination of an increased number of search refinements and (if we believe Google) an increase in the relevance of these suggestions will likely lead to an increase in the number of searchers who use them. This is obviously good for sites which rank well for the suggested searches of high-traffic queries, particularly when these refinements are listed at the top of the search page rather than the bottom. It will also obviously lead to an increase in longer-tail searches.

However, this is not all good news – for every site gaining visitors, another site has to lose them. It is likely that the number of searchers clicking through to page 2 and beyond will decrease as searchers use these links instead, so sites which rank on page 2 for high volume queries (which can still drive a fairly significant amount of traffic for top terms) will likely see a decrease in their traffic for these terms.

The location of the search refinements on the search results page for a particular query will also affect their impact on traffic. Where the search refinements are included at the bottom of a results page they may distract visitors away from the sites ranking just above them, but where Google places them at the top of the results page the impact could potentially be much greater – searchers may click directly onto one of these search suggestions rather than looking through any of the top 10 sites they searched for.

The action that webmasters need to take from this is simple – in addition to the keywords that site owners should already be targeting, they also need to look at the most important search refinements. Look at the most popular searches in your industry niche and look at the search refinements that Google provides. These are keywords that you might want to target next.

After this simple keyword research step all of the usual keyword suggestions apply as per normal (include the most important words in your title tag, try to include the words in the same order, etc). Where Google places the search refinements at the top of the page rather than the bottom, these should be given higher priority as they will likely drive a far larger percentage of traffic than where they are included at the bottom of the page.

How will this change the shape of search?

  • The short tail – Searchers will be encouraged away from the more common shorter-tail queries, so the short-tail will likely shrink to an extent.
  • The middle tail – The number of refinements that will be suggested is limited – therefore, we foresee a fattening of what might be termed the “middle-tail”, that is, queries which are not huge traffic drivers but which are still searched for on a regular basis.

As with any change in Google there are winners and losers. Searchers will be more likely to use a wider variety of search queries, meaning that the number of potential visitors will be spread out more evenly across multiple queries. As different websites will rank for different terms, this may result in a “spreading out” of visitors across a greater number of different web sites.

Sites that ranked well for high traffic terms might potentially see a drop in traffic, but the increase in search precision from these more targeted phrases should hopefully mean that searchers are directed to the pages on your site that are most relevant to what they are looking for. That, I think we can all agree, is good for everyone.


Thinking outside la boîte

One of the most common misconceptions that I deal with concerns keywords, particularly where travel and related services are involved. If you seek international success it is absolutely vital to think outside the confines of this country.

You have probably done extensive research into the most important destinations relevant to your business and you may even have extended this research beyond the UK to your target countries. So all done, yes?

No! What you need to do now is think yourself into each of your target countries and find out what destinations are most popular there. I know that a sizeable proportion of UK travel related services involves business travellers moving around within this country. It is reasonable, therefore, to assume that in other countries the same applies. While your top destination on the UK site may be London it is surely not reasonable to assume that Monsieur le web searcher in Paris is going to London. He may well be a business traveller on a trip to Toulouse or somewhere else within France. Similarly one of your top French destinations from the UK may be Disneyland Paris, but will the same hold true for native French search?

If you really want international success then you need to think internationally. Part of this thinking is that every country has a unique domestic market as well as a unique way of searching. Acquiring this knowledge is painstaking and sometimes quite difficult but it really is worth the effort. That extra bit of understanding can be the edge that you need in getting your product or service that critical position in the search results. The benefits of good keyword research for PPC campaigns are, I trust, self explanatory.


Don’t think "Foreign", think "International"

Recently I was presented with a claim that 70% of web searches are not in English.

It would probably be more accurate to say that some 70% of web users worldwide do not speak English as their mother tongue but this is still a powerful argument for the benefits of International SEO.

What is International SEO?

Let’s start with what International SEO is not. It is not just grabbing your UK website, translating the whole thing into your target language or languages and then chucking the result onto a country-appropriate TLD. If all you’re trying to do is to address potential non-English speakers who may be interested in your UK site, this approach may have some merit and you don’t really need to bother about a country-appropriate domain registration. However, that is multi-lingual targeting, not International SEO.

Get a country appropriate domain – nothing says “Germany” to a search engine more authoritatively than domain.de. Currently, certain countries will only permit registration of local domains if the registrant has a physical presence within that country. This is not generally a problem for multi-national corporations but can present a challenge for smaller businesses. Recent statistics suggest that some 14% of non-English searchers outside of the UK & US use the “Pages From [country]” option provided by the major search engines.

Know the market place – Certain products and services are completely inappropriate in some countries for religious, social or indeed legal reasons. This can be quite a granular issue and may not affect your whole offering, which is just one reason why you should avoid any form of machine translation or delivery system that simply spits out a "translated" version of your website depending on which country the content is being served to. Make sure that you know what, if any, age restrictions may apply to any part of your offering. Don’t even consider a straight translation of any legal pages. There is a very good chance that an enforceable UK contract would be tossed out by an overseas court and the flip-side of that is that you may find conditions being imposed on you that you hadn’t bargained for.

Don’t translate, create – This is the big one. Translating your English copy, however well it’s done, confers a “little brother” status on your international venture. Whilst, in strictly accounting terms, your international offering may be less important than your “home-grown” pages, giving your international sites a reduced status will not help them to change the balance. Know the local market place and create copy that’s right for it and for the localised search space. Bear in mind that surfing and searching habits vary markedly between countries, and your copy and its presentation need to reflect this. Don’t forget to include your generic content in this process (About, Contact, Legals etc). Remember that your page titles and meta descriptions need to be part of this process, as these extremely important SEO factors are often overlooked.

English as well? – Quite possibly yes, and for a variety of reasons. Many people consider English to be the language of international business and would expect to be able to view an English language version of say, a share dealing site, although they may be much happier if they can conduct research in their native language prior to engaging with any transaction. Then there is the greatly increased mobility of workforce which means that you need to accommodate English speakers who may be living and working in your target country. Additionally, you should bear in mind that many countries, particularly in Western Europe, teach English as a compulsory subject in school and there is an increasing population of at least bilingual surfers out there. The English content is, self-evidently, something of a no-brainer (ensure that legal stuff is accurate for the target country) but make sure that you’re not missing any opportunities to maximise your audience by offering any alternative language versions of your pages that are appropriate.

Don’t just spell check, proof read – Spell checkers are great, aren’t they? Well, up to a point. Yes, they will help you to trap the more obvious and embarrassing errors but they don’t help much if a misspelling actually makes another valid word. How about a grammar/semantics checker then? Do not put your faith in these things. They are limited and often not very accurate. Remember that this is your brand that we’re talking about here, so give your online presence the same care and consideration that you’d give to hard copy brochures and other material. In other words, proof read everything. Spelling and grammatical errors not only create a negative brand message but can adversely affect search engine rankings and performance.

HTML Entities & you – If you’re not familiar with these things, find out about them. The purpose of HTML Entities is to ensure a fairly universal interpretation of the special characters used in many languages. For example, &Uuml; should be correctly rendered by the majority of browsers as Ü. Random question marks or other characters printed because the browser doesn’t know what to render look awful and hardly present the image that you’re striving for. If you operate on a Content Management System (CMS), it’s worth seeing if this can be enforced at system level. By the same token, if the CMS says that it works in a certain language, make sure that it’s doing this properly.
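As a small illustration of what such a conversion might look like, the sketch below uses Python's standard html module to turn special characters into entities and back again. The function name to_entities is hypothetical, and the sketch assumes the input is plain text rather than markup (it does not escape characters such as & or <). A real CMS would more likely handle this through its templating layer or by serving pages with a declared character encoding.

```python
import html
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace each non-ASCII character with a named HTML entity where one exists,
    falling back to a numeric character reference otherwise."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)                          # plain ASCII is left alone
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")  # e.g. U+00DC becomes &Uuml;
        else:
            out.append(f"&#{code};")                # no named entity available
    return "".join(out)


encoded = to_entities("Über año")
print(encoded)                                # &Uuml;ber a&ntilde;o
print(html.unescape(encoded) == "Über año")   # True
```

The round trip through html.unescape shows that nothing is lost: the entity form is just an ASCII-safe spelling of the same characters.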

And to prove they matter – Be certain that special characters are present when required. A long standing joke in the translation industry concerns a town in the USA that allegedly used the word ‘ano’ rather than ‘año’ in a Spanish translation for a millennial celebration. While this may not look too bad to a casual observer, ‘año’ means ‘year’ while unfortunately ‘ano’ means ‘anus’. “Fiesta del ano 2000” might have been seen as a little too specialised for many people!

Getting the money – If your site, or a part of it, is intended for eCommerce, bear in mind that not all countries have bought into the age of plastic as thoroughly as the UK & the US have. You may need to research alternative payment methods like PayPal, or possibly completely change your approach in the target country and investigate “Buy From” style links with bricks & mortar retail outlets. If this is an issue, it will obviously modify your ambitions for the site but it shouldn’t halt your push for local online brand recognition.

Directories still count – Many people dismiss web directories as irrelevant and there is a certain validity to this view: by and large, the Internet has moved on. However, all of the major search engines continue to crawl directories and if you have a good entry and link in one or more of them it can’t hurt you and could speed up the early recognition of your site. The real trick is to sort the wheat from the chaff and not to throw your site at any directory just because it’s there. Yahoo! & DMOZ are both multinational, so have a look in the appropriate country section to see if you can get your site in there. Even if your UK site is already in there, don’t worry – you’re not spamming the directory by adding your site. Assuming that you’ve checked at least most of the areas that I’ve covered, this is a unique site that you’re adding and it lives on a unique domain. Research local language directories, particularly niche directories that fit your product or service, and see if any of them are worth getting into. Local knowledge is a big plus here, so make use of any contacts that you may have on the ground in your target country.

Web 2.0 is important – Like it or not, this is the age of social media and virtually everybody is able to voice their opinion, pleasure and displeasure online. The great thing is that you can tap into this market and let it help you. If, for instance, you want to set up a South Korean site, are you aware that not only is this the most connected country on the planet but that they also really like social media? Blogs, social media sites and social bookmarking can have a tremendous effect on your back linking and thus your rankings. If you have the personnel to operate it, consider setting up a company blog and make sure that it’s kept fresh & up to date with content that’s appropriate for your key demographic. Blogs are also a great way to get multimedia material that you may not want on your core site out to visitors. This can create a lot of positive noise around your brand and there is always the possibility that one of these items could go viral. Of course, people can also create negative noise, so monitor your social media presence and be carefully reactive if bad noise starts.
