
Search engines still struggling with Internationalized Domain Names (IDNs)

Internationalized Domain Names (IDNs) are now a reality and already in use by live websites. Unfortunately, it seems that the search engines are still playing catch-up.


Introduction

Note: If you are unable to view the Arabic and Cyrillic letters in this page you may need to install the required fonts.

Now that the first Internationalized Domain Names (IDNs) have gone live and have had some time to get established, it seems like a good time to revisit the findings of my previous article on IDNs, “Can search engines handle Internationalized Domain Names (IDNs)?”

IDNs went live initially for three countries, all using the Arabic alphabet: Egypt (مصر); Saudi Arabia (السعودية); and the United Arab Emirates (امارات). Russia’s new IDN (рф) went live a little later, adding the Cyrillic alphabet to the mix, and additional IDNs have been created for other countries and alphabets. For this article I’ll take a look at how search engines handle these four IDNs.

To get an idea of how extensively the search engines have indexed sites on these new IDNs, I’m going to use the “site:” operator. Although this operator is primarily used for finding pages on a particular website, e.g. [site:lbi.co.uk], it can also work all the way up to the TLD level, e.g. [site:uk].
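As a rough illustration of how the “site:” operator can be widened from a single host up to the TLD, here is a minimal Python sketch. The function name site_queries is made up for this example, and the sketch only builds query strings; whether a given search engine honours every level is a separate question.

```python
from typing import List


def site_queries(hostname: str) -> List[str]:
    """Build "site:" queries from the full hostname up to the TLD.

    Hypothetical helper for illustration only: it drops one leading
    label at a time, so the last query targets the TLD on its own.
    """
    labels = hostname.strip(".").split(".")
    return ["site:" + ".".join(labels[i:]) for i in range(len(labels))]


if __name__ == "__main__":
    for query in site_queries("lbi.co.uk"):
        print(query)
    # An IDN TLD is handled the same way as an ASCII one.
    for query in site_queries("рф"):
        print(query)
```

The same strings can be pasted into any search box, which makes it easy to compare how different engines respond at each level.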

Searching Google for [site:مصر], [site:السعودية], [site:امارات] and [site:рф] returns results from the IDNs for Egypt, Saudi Arabia, the UAE and Russia as expected. Whilst the new IDN for Saudi Arabia only had 14 pages indexed when checked, the other IDNs all feature thousands of results.

Screenshot of a search for [site:рф] in Google:

Google search for site:рф

Trying the same searches in Bing, however, does not return any results:

Bing search for site:рф

It appears that the site: operator does not work with these new IDNs in Bing (searching for other domains, e.g. [site:com], works as expected).

IDNs in search results?

The next area tested is whether the search engines will return these domains in their search results. To test this I picked out some random web pages on the new Egyptian IDN and tried searching for their title tags in both Google and Bing.

Searching both Google and Bing for the title of one web page, [مراكز التميز في البحث والتطوير - وزارة الإتصالات], brought up a number of web pages. The results from Google and Bing both contained a result from an IDN:

Google snippet featuring an IDN:

Google snippet featuring an Arabic IDN

Bing snippet featuring an IDN:

Bing snippet featuring an Arabic IDN

More IDN bugs

Earlier I described how Bing’s site: operator does not yet work with IDNs. However, Google also has a number of IDN woes. Searching for [site:مصر] (the new IDN for Egypt) brings up the site سجل.مصر – however, clicking on the “Show more results from سجل.مصر” link in Google appears to list sites on domains other than سجل.مصر. Additionally, the “Show all results” link text is percent-encoded rather than showing the site name in Arabic script.

Screenshot of Google IDN bug
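To show what a percent-encoded Arabic hostname looks like, here is a small Python sketch using the standard library’s urllib.parse. It assumes the link percent-encoded the UTF-8 bytes of the name, which is the usual convention; the exact text of Google’s link is not reproduced here, and the helper names are invented for this example.

```python
from urllib.parse import quote, unquote


def percent_encode_host(hostname: str) -> str:
    """Percent-encode the UTF-8 bytes of a hostname, keeping the dots."""
    return quote(hostname, safe=".")


def percent_decode_host(encoded: str) -> str:
    """Turn a percent-encoded hostname back into readable Unicode."""
    return unquote(encoded)


if __name__ == "__main__":
    encoded = percent_encode_host("سجل.مصر")
    print(encoded)
    print(percent_decode_host(encoded))
```

Decoding is a single call, so displaying the readable Arabic form in a link label should not require much extra work.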

In my previous look at how search engines handled IDNs I had found that Google’s links to “Translate this page” and “Cached” were broken for IDNs. Today it appears that Google has fixed the translation links – however, the cache links still do not appear to function.

Conclusion

The situation is much the same as it was back in February. The search engines can index websites which use IDNs – however, all of the major search engines still have bugs with their IDN support.

Given that the number of IDNs is set to grow and the number of websites using IDNs is likely to vastly increase in the near future, it’s vital that the search engines iron out the bugs in their IDN support. After all, if a search engine can’t handle websites from a particular domain properly, people might decide to switch to a search engine that can.


Can search engines handle Internationalized Domain Names (IDNs)?

Internationalized Domain Names (IDNs) have been approved by ICANN and are set to become a reality. Are the search engines prepared for them?


Introduction

Note: If you are unable to view the Chinese and Arabic letters in this page you may need to install the required fonts.

In October 2009, ICANN voted to allow the use of non-ASCII characters in domain names. Non-ASCII characters have existed within domain names for a while – for example, many Hong Kong sites feature Chinese characters (example: http://香港儒釋道院.組織.hk). However, before now, these characters were not allowed within TLDs and, as such, URLs still required ASCII characters (in the example above, the ccTLD “.hk”).

ICANN launched the IDN ccTLD Fast Track Process in November, and last month announced that four top-level IDNs had successfully passed the initial stage of approval (three Arabic-language IDN ccTLDs for Egypt, Saudi Arabia and the United Arab Emirates, and one Cyrillic-language IDN ccTLD for the Russian Federation). At the time of writing, there are another 13 IDN ccTLDs on their way through this process, representing 10 different languages in total.

In order to give the Internet community time to prepare for the rollout of new IDN domains, ICANN has set up a number of IDN domains for testing purposes. Each of these test domains is written as “example.test” in its respective language, and content has been made available to view on each site.

Seeing as most of the initial IDN ccTLDs are likely to be in Arabic, I have used ICANN’s test Arabic domain (مثال.إختبار) for my research.

Before I start, I need to quickly explain what Punycode is, as it is used to support the addition of IDN domains to the existing Internet infrastructure. The problem with the current system is that the Domain Name System (DNS) only allows certain ASCII characters, which means that it is not possible to simply add Unicode characters to it. Punycode was invented to get around this issue. Essentially, it is a method by which Unicode characters can be translated to (and from) the ASCII characters allowed within the DNS. When your browser requests a domain name containing Unicode characters, it converts the name to its ASCII Punycode form before sending the request.

For this experiment, I have looked at the way in which the search engines handle both the Unicode form of the Arabic domain (http://مثال.إختبار/) as well as the corresponding Punycode format (which, in this case, is http://xn--mgbh0fb.xn--kgbechtv/). Note that, because Arabic is an RTL (right-to-left) language, pages on this site will have the URL path to the left of the hostname, rather than to the right.
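The conversion between the two forms can be tried out with Python’s built-in “idna” codec. This is a minimal sketch: the helper names to_ascii and to_unicode are my own, and Python’s codec implements the older IDNA 2003 rules, so a browser may treat some edge cases differently.

```python
def to_ascii(hostname: str) -> str:
    """Convert a Unicode hostname to its ASCII (Punycode) form.

    Each dot-separated label is converted on its own, and labels
    containing non-ASCII characters gain the "xn--" prefix.
    """
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Convert an ASCII (Punycode) hostname back to its Unicode form."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    # The Punycode hostname quoted in the article.
    punycode_host = "xn--mgbh0fb.xn--kgbechtv"
    unicode_host = to_unicode(punycode_host)
    print(unicode_host)
    # Converting back should give the original ASCII form again.
    print(to_ascii(unicode_host))
    # Plain ASCII hostnames pass through unchanged.
    print(to_ascii("example.com"))
```

Because the mapping works in both directions, a search engine that has indexed one form of a hostname has everything it needs to recognise and display the other.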

One last note before we look at the results – the test page does not feature a meta description tag, so any snippet text is likely to come from text within the page itself.

Here are the results.

Google

Searching Google for the Unicode variant of the URL returns the homepage of the domain as the first result, with an additional nested result for a second, internal page on the domain:

  Google Unicode

Initially, everything seems to be in place here. The title tags, snippets and URLs are correctly displayed in Arabic, and Google has highlighted the search text in bold as usual. Additionally, the “Similar” pages link works, and the “jump to” successfully takes you to an anchor within the page. Lastly, the URL path is written in the correct RTL form for the second result.

However, not all is well. The first URL that Google is listing, the homepage, is actually a 301 redirect to an internal page. Google should be indexing the destination page, not the redirecting homepage.

There are several other issues too. Firstly, the cached copy link did not work:

Google cached copy

I tried a number of pages on the site and Google’s cached copy did not work for any of them, so Google may have an issue with this feature at present.

Additionally, the “Translate this page” links for both results do not function correctly, and an error message is shown:

Google Translate error

Side note – the “See original page” link does correctly point to the Arabic domain name.

Next I tried searching Google for the Punycode form of the URL:

Google Punycode

Google has returned the same two URLs, which is a good sign of consistency. The title tags are the same, and the URL is still written in Arabic and not displayed in the Punycode form.

This time around, Google has picked out some text on the page which matches the Punycode search term. Although this particular snippet is rather less attractive than the ones from the previous query, matching the exact text on a page is probably the best approach. However, it would also make sense for Google to at least highlight the Unicode version (for example, in the URL), which it currently does not do.

Again, while the “Similar” pages link works, the “Cached” and “Translate this page” links are broken. This seems to be an issue that Google needs to fix.

Yahoo!

Searching Yahoo! for the Unicode or Punycode version of the URL does not return any results from the domain:

Yahoo! Arabic domain fail

Similarly, entering the URLs within Yahoo! Site Explorer simply redirects back to the main Yahoo! search results. Performing “site:” searches (for either variant) also fails (looking at the HTTP headers, you can see that Yahoo! actually redirects the query to Site Explorer, which then redirects you back to the standard web search results).

I tried a few additional ICANN test IDN domains in other languages and none of them worked. Yahoo! seems to fail completely at handling IDNs.

Given that Yahoo! is likely to use Bing’s search in the future, let’s see how Bing performs next.

Bing

Searching Bing for the Unicode version of the URL does return a page from the site, although it’s at position 8, which is not ideal (when searching for a URL you would usually want the URL to appear at or near the top of the search results). The snippet appears as follows:

Bing Unicode

Only one URL is shown, which isn’t quite as useful as Google’s result, but is still adequate. The title tag, snippet and URL are all correctly shown in Arabic, which is good. The “Translate this page” and “Cached page” links both work, whilst they didn’t on Google.

Bing does have some issues, however. Although Bing has indexed the destination URL (the link goes directly to the destination URL), for some reason Bing only displays the URL of the homepage in the snippet. Additionally, although Bing has highlighted the domain in bold in the snippet, it has not highlighted it within the URL.

Bing does have a number of problems with its handling of this domain. However, they are fairly minor and definitely less important than the issues that Google has with this site.

Searching Bing for the Punycode version of the URL returns the page at position 2 instead, which is a bit better:

Bing Punycode

Again, like Google, Bing has picked out the text from the page which matches the query for the snippet but has not highlighted the Arabic equivalent in the snippet. Otherwise, this result is much the same as the Unicode search variant.

Ask Jeeves

I have also looked at Ask Jeeves (known as just “Ask” in the US).

Searching Ask Jeeves for the Unicode version of the URL returns the site at position one. Like Google, it includes a second indented URL at position two. Interestingly, these are the same two URLs that Google returned for this search (it is worth remembering that Ask Jeeves might be using Google’s results at times).

Ask Jeeves Unicode

Ask Jeeves is correctly displaying both the title and the snippet in Arabic, but the URL is written in the Punycode form instead, which is clearly far from ideal.

There is another major issue with Ask Jeeves’ implementation – the second URL goes through a redirect, but the hostname given by the redirect has been encoded in a way which makes Firefox and Internet Explorer fail to load the page (Google Chrome and Opera did successfully load the page from the redirect). Note: This does not always happen – reloading the page sometimes returns the URL without the redirect, and in this case it works correctly.

Searching Ask Jeeves for the Punycode version of the URL results in much the same as we have seen earlier. Again, the snippet includes the text from the page which matches the query. Ask Jeeves includes a small screenshot of the page too:

Ask Jeeves Punycode

Ask Jeeves’ binoculars feature, which displays a small thumbnail screenshot of the site, does appear to work correctly. However, it is possible that there are issues here as well.

Ask Jeeves Binoculars

Although it’s difficult to make out due to the small size of the thumbnail, it appears that the English text renders correctly but the Arabic text (although correctly displayed in an RTL fashion) looks like it might be showing a nonsense placeholder character, in the same way that web browsers which do not render Unicode characters do. That said, it is difficult to determine for sure from the small thumbnail that Ask Jeeves provides.

Conclusion

In conclusion, Google, Bing and Ask Jeeves do support IDNs to varying degrees. If I had to proclaim a winner at the moment, I would say that Bing has a slight lead, but all of these search engines have some issues. Hopefully these issues will be ironed out by the time that IDNs eventually roll out en masse.

Yahoo! appears to completely fail to support IDNs at present. Once it switches to Bing’s search engine, however, we assume that it will inherit all of Bing’s IDN support as well.


How has Google Suggest affected search queries?

Google Suggest and search refinements were introduced at the end of March this year. How have they impacted the shape of search?

Note: This article focuses on the UK search market.

The folks over at Latitude have written up an interesting piece looking at whether or not the introduction of Google Suggest in the UK has resulted in any changes to the volumes of searches for the search terms which are suggested.

The expected effect would be an increase in suggested search terms as they get more exposure, along with a corresponding decrease in more generic short tail searches and possibly a drop in the long tail as well. Additionally, this should also result in fewer searches for mis-spelled queries, as Google Suggest can correct spelling mistakes on the fly.

The changes expected from the introduction of Google Suggest are very much in line with the type of changes introduced by Google Search Refinements – in fact, Google launched these two new features only a week apart. Therefore, it is likely that the changes in overall search patterns reflect a combination of these two changes.

Let’s have a look at some of the findings. Some examples are really compelling, such as the huge increases in searches for [pet insurance comparison]:

Google Insights - pet/travel insurance (latitude)
[credit - image from referenced post]

Many queries recommended by Google Suggest have experienced a rise since April, although these growth rates have generally been much lower than the rate for the search term [pet insurance comparison] shown above.

However, the picture isn’t completely consistent across the board. The example of [car insurance compare] isn’t entirely accurate, given the way that Google Insights for Search matches queries. Most people are searching for variants in the phrased form “compare car insurance” (the queries for which correlate well to the set of searches containing any of these words). However, queries including the exact phrase “car insurance compare” are much lower than this, and volumes are relatively flat.

Google Insights - car insurance compare

Note: A side effect of this quirk with Google Insights for Search is that it’s not really possible to determine accurate relative search volumes for more generic terms (which we would expect to have declined slightly) as any generic query entered will also match long-tail variants, including those provided by Google Suggest. You can get some mileage with negative keywords in some instances (here is an example, although this isn’t related to the introduction of Google Suggest) but in general this isn’t suitable.
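The difference between these kinds of matching can be sketched in a few lines of Python. This is a toy model only: the sample queries are invented, the function names are my own, and it does not claim to reproduce how Google Insights for Search matches queries internally.

```python
from typing import List


def contains_all_words(query: str, term: str) -> bool:
    """True if every word of the term appears in the query, in any order."""
    query_words = set(query.lower().split())
    return all(word in query_words for word in term.lower().split())


def contains_phrase(query: str, phrase: str) -> bool:
    """True if the words of the phrase appear consecutively and in order."""
    q = query.lower().split()
    p = phrase.lower().split()
    return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))


def without_negative(queries: List[str], negative: str) -> List[str]:
    """Drop queries containing the negative keyword as a whole word."""
    return [q for q in queries if negative.lower() not in q.lower().split()]


if __name__ == "__main__":
    # Invented sample queries, for illustration only.
    sample = [
        "compare car insurance",
        "car insurance compare",
        "cheap car insurance",
        "compare car insurance quotes",
    ]
    term = "car insurance compare"
    print([q for q in sample if contains_all_words(q, term)])
    print([q for q in sample if contains_phrase(q, term)])
    print(without_negative(sample, "cheap"))
```

In this model the word-based match picks up “compare car insurance” and its longer variants, while the phrase match only finds the exact word order, which is why the two sets of volumes can look so different.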

One area in which the introduction of Google Suggest is expected to reduce a particular type of query is mis-spellings. Google’s other technologies for providing search refinements and spelling corrections only offer the opportunity to correct a mis-spelled query after it has been made (potentially resulting in a second, correctly-spelled query), whereas Google Suggest can prevent mis-spellings from ever happening at all. Here’s an example of a common mis-spelling of the UK’s most popular search term in 2008:

Google Insights - face book

It’s very difficult to pin down a particular trend to one specific change made by Google. There are constantly many changes being made to Google’s search results – 359 last year alone. Some of the experimental features which Google runs never see use beyond a small test group, whilst others are seen by so many people that they’re common knowledge in the SEO industry before they are officially announced. The dates that Google provides for product launches are only rough indications of the dates by which the majority of users will have had access to a feature.

It is also important to note that different features can have related effects – for example, Google’s search refinements launched just a week before Google Suggest came to the UK and both would be likely to have a very similar impact (although I would expect that Google Suggest has a greater impact here than search refinements, due to its relative prominence as a feature).

Side note: The spelling suggestions example above is another good example of the need for care when attributing a particular impact on search results to a specific change by Google – around the same time that Google made the changes described above, they also made some changes to spelling correction.

Nevertheless, the data does seem to indicate that the introduction of Google Suggest and search refinements has had the expected effect on search patterns. To round up, I’ll quote the summary of my previous article on search refinements:

As with any change in Google there are winners and losers. Searchers will be more likely to use a wider variety of search queries, meaning that the number of potential visitors will be spread out more evenly across multiple queries. As different websites will rank for different terms, this may result in a “spreading out” of visitors across a greater number of different web sites.

Sites that ranked well for high traffic terms might potentially see a drop in traffic, but the increase in search precision from these more targeted phrases should hopefully mean that searchers are directed to the pages on your site that are most relevant to what they are looking for. That, I think we can all agree, is good for everyone.


Google’s new search refinements – how will it affect you?

Google has announced two changes to how it displays its search results pages. These changes have been rolled out across 37 languages worldwide. In this post, we explore what the changes to their “search refinements” mean for webmasters.

Google has recently made changes to their “search refinements” feature. These are the links that Google includes at the bottom (and sometimes at the top) of their search results pages which provide a number of suggestions for a searcher to narrow their search down.

Google has introduced new technology that can “better understand associations and concepts related to your search”. Aside from leading to changes in the search refinements which they display, they have also increased the number of these that they may show.

For example, when searching for [principles of physics], you will see many such suggestions now:

Google's new search refinements

Image courtesy of Google.

How might this affect you?

Firstly, as the algorithm powering these search refinements has changed, this means that many of the suggested searches will also have changed. If your site used to get traffic from people clicking on these, it may no longer do so as this particular suggestion may no longer be listed. Conversely, your site may suddenly start to receive traffic from new search refinements.

Secondly, as the number of search refinements has increased, this means that the chance of clicking on any particular one will have decreased.

Thirdly, a combination of an increased number of search refinements and (if we believe Google) an increase in the relevance of these suggestions will likely lead to an increase in the number of searchers who use them. This is obviously good for sites which rank well for the suggested searches of high-traffic queries, particularly when these refinements are listed at the top of the search page rather than the bottom. It will also obviously lead to an increase in longer-tail searches.

However, this is not all good news – for every site gaining visitors, another site has to lose them. It is likely that the number of searchers clicking through to page 2 and beyond will decrease as searchers use these links instead, so sites which rank on page 2 for high volume queries (which can still drive a fairly significant amount of traffic for top terms) will likely see a decrease in their traffic for these terms.

The location of the search refinements on the search results page for a particular query will also affect their impact on traffic. Where the search refinements are included at the bottom of a results page they may distract visitors away from the sites ranking just above them, but where Google places them at the top of the results page the impact could potentially be much greater – searchers may click directly onto one of these search suggestions rather than looking through any of the top 10 sites for their original query.

The action that webmasters need to take from this is simple – in addition to the keywords that site owners should already be targeting, they also need to look at the most important search refinements. Look at the most popular searches in your industry niche and look at the search refinements that Google provides. These are keywords that you might want to target next.

After this simple keyword research step all of the usual keyword suggestions apply as per normal (include the most important words in your title tag, try to include the words in the same order, etc). Where Google places the search refinements at the top of the page rather than the bottom, these should be given higher priority as they will likely drive a far larger percentage of traffic than where they are included at the bottom of the page.

How will this change the shape of search?

  • The short tail – Searchers will be encouraged away from the more common shorter-tail queries, so the short-tail will likely shrink to an extent.
  • The middle tail – The number of refinements that will be suggested is limited – therefore, we foresee a fattening of what might be termed the “middle-tail”, that is, queries which are not huge traffic drivers but which are still searched for on a regular basis.
  • The long tail – The true “long tail” of search queries is distinctly different in nature. Remember that Google has said that roughly 20-25% of the search queries they receive are ones they have never seen before. The long tail will be affected to an extent, as people may find what they want via query refinements rather than resorting to long queries, but there will always be users who type unusual queries into the search engines.

As with any change in Google there are winners and losers. Searchers will be more likely to use a wider variety of search queries, meaning that the number of potential visitors will be spread out more evenly across multiple queries. As different websites will rank for different terms, this may result in a “spreading out” of visitors across a greater number of different web sites.

Sites that ranked well for high traffic terms might potentially see a drop in traffic, but the increase in search precision from these more targeted phrases should hopefully mean that searchers are directed to the pages on your site that are most relevant to what they are looking for. That, I think we can all agree, is good for everyone.


Thinking outside la boîte

One of the most common misconceptions that I deal with concerns keywords, particularly where travel and related services are involved. If you seek international success it is absolutely vital to think outside the confines of this country.

You have probably done extensive research into the most important destinations relevant to your business and you may even have extended this research beyond the UK to your target countries. So all done, yes?

No! What you need to do now is think yourself into each of your target countries and find out what destinations are most popular there. I know that a sizeable proportion of UK travel related services involves business travellers moving around within this country. It is reasonable, therefore, to assume that in other countries the same applies. While your top destination on the UK site may be London it is surely not reasonable to assume that Monsieur le web searcher in Paris is going to London. He may well be a business traveller on a trip to Toulouse or somewhere else within France. Similarly one of your top French destinations from the UK may be Disneyland Paris, but will the same hold true for native French search?

If you really want international success then you need to think internationally. Part of this thinking is that every country has a unique domestic market as well as a unique way of searching. Acquiring this knowledge is painstaking and sometimes quite difficult but it really is worth the effort. That extra bit of understanding can be the edge that you need in getting your product or service that critical position in the search results. The benefits of good keyword research for PPC campaigns are, I trust, self explanatory.
