domains

Search engines still struggling with Internationalized Domain Names (IDNs)

Internationalized Domain Names (IDNs) are now a reality and are already in use by live websites. Unfortunately, it seems that the search engines are still playing catch-up.

Introduction

Note: If you are unable to view the Arabic and Cyrillic letters in this page you may need to install the required fonts.

Now that the first Internationalized Domain Names (IDNs) have gone live and have had some time to get established, it seems like a good time to revisit the findings of my previous article on IDNs, “Can search engines handle Internationalized Domain Names (IDNs)?”

IDNs went live initially for three countries, all using the Arabic alphabet: Egypt (مصر); Saudi Arabia (السعودية); and the United Arab Emirates (امارات). Russia’s new IDN (рф) went live a little later, adding the Cyrillic alphabet to the mix, and additional IDNs have been created for other countries and alphabets. For this article I’ll take a look at how search engines handle these four IDNs.

To get an idea of how extensively the search engines have indexed sites on these new IDNs I’m going to use the “site:” operator. Although this operator is primarily used for finding pages on a particular website, e.g. [site:lbi.co.uk] it can also work all the way up to the TLD level, e.g. [site:uk].
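As a rough illustration of how these queries are put together, here is a minimal Python sketch. The function names and the "q" parameter name are my own assumptions for illustration, not something taken from any search engine's documentation. It builds the [site:] query text for each of the four IDNs and shows how that text would be percent-encoded (as UTF-8) if it were placed in a URL query string.

```python
from urllib.parse import urlencode

# The four IDN ccTLDs looked at in this article.
IDN_TLDS = {
    "Egypt": "مصر",
    "Saudi Arabia": "السعودية",
    "United Arab Emirates": "امارات",
    "Russia": "рф",
}


def site_query(domain: str) -> str:
    """Return the text typed into the search box, e.g. 'site:uk'.

    Stray whitespace and a leading dot are removed, because the operator
    and the domain must not be separated.
    """
    return "site:" + domain.strip().lstrip(".")


def query_parameter(domain: str) -> str:
    """Return the query as a percent-encoded 'q=...' pair.

    The parameter name 'q' is a hypothetical choice for this sketch.
    """
    return urlencode({"q": site_query(domain)})


if __name__ == "__main__":
    for country, tld in IDN_TLDS.items():
        # Only ASCII is printed here, so this works on any console.
        print(country, query_parameter(tld))
```

The same helper works at any level of the hierarchy, so site_query("lbi.co.uk") and site_query("uk") produce the two example queries above.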

Searching Google for [site:مصر], [site:السعودية], [site:امارات] and [site:рф] returns results from the IDNs for Egypt, Saudi Arabia, the UAE and Russia as expected. Whilst the new IDN for Saudi Arabia only had 14 pages indexed when checked, the other IDNs all feature thousands of results.

Screenshot of a search for [site:рф] in Google:

Google search for site:рф

Trying the same searches in Bing, however, does not return any results:

Bing search for site:рф

It appears that the site: operator does not work with these new IDNs in Bing (searching for other domains, e.g. [site:com], works as expected).

IDNs in search results?

The next area tested is whether the search engines will return these domains in their search results. To test this I picked out some random web pages on the new Egyptian IDN and tried searching for their title tags in both Google and Bing.

Searching both Google and Bing for the title of one web page, [مراكز التميز في البحث والتطوير - وزارة الإتصالات], brought up a number of web pages. The results from Google and Bing both contained a result from an IDN:

Google snippet featuring an IDN:

Google snippet featuring an Arabic IDN

Bing snippet featuring an IDN:

Bing snippet featuring an Arabic IDN

More IDN bugs

Earlier I described how Bing’s site: operator does not yet work with IDNs. However, Google also has a number of IDN woes. Searching for [site:مصر] (the new IDN for Egypt) brings up the site سجل.مصر. However, clicking on the “Show more results from سجل.مصر” link in Google appears to list pages on domains other than سجل.مصر. Additionally, the “Show all results” link is percent-encoded rather than showing the site name in Arabic script.

Screenshot of Google IDN bug
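For readers who have not come across percent-encoding, the following Python sketch shows what happens to an Arabic hostname when it is encoded this way. It uses the standard library's urllib.parse module; the helper names are my own, and this is only an illustration of the encoding itself, not of how Google generates its links.

```python
from urllib.parse import quote, unquote


def percent_encode(hostname: str) -> str:
    """Write each non-ASCII character as its UTF-8 bytes in %XX form.

    ASCII letters, digits and dots are left untouched.
    """
    return quote(hostname, safe="")


def percent_decode(encoded: str) -> str:
    """Turn a percent-encoded string back into readable text."""
    return unquote(encoded)


if __name__ == "__main__":
    site = "سجل.مصر"
    encoded = percent_encode(site)
    print(encoded)  # ASCII only: a run of %XX pairs around the dot
    # Decoding recovers the original Arabic hostname.
    assert percent_decode(encoded) == site
```

The encoded form is perfectly valid for a browser to follow, but it is unreadable to a person, which is why displaying it in place of the Arabic name looks like a bug.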

In my previous look at how search engines handled IDNs I had found that Google’s links to “Translate this page” and “Cached” were broken for IDNs. Today it appears that Google has fixed the translation links – however, the cache links still do not appear to function.

Conclusion

The situation is much the same as it was back in February. The search engines can index websites which use IDNs – however, all of the major search engines still have bugs with their IDN support.

Given that the number of IDNs is set to grow and the number of websites using IDNs is likely to vastly increase in the near future, it’s vital that the search engines iron out the bugs in their IDN support. After all, if a search engine can’t handle websites from a particular domain properly, people might decide to switch to a search engine that can.

Will purchasing a branded TLD improve your SEO?

As ICANN continues to move forward with its plans to permit the creation of unlimited new generic Top Level Domains (gTLDs), what does this mean for SEO? What are the benefits and downsides of branded gTLDs, and are they commercially viable?

ICANN, the Internet Corporation for Assigned Names and Numbers, which is a not-for-profit organisation tasked with maintaining the Internet registry of domain names and IP addresses, is permitting the registration of new generic Top Level Domains (gTLDs) in addition to existing gTLDs such as ‘.com’, ‘.org’ and ‘.mobi’. In addition to this, ICANN is also currently in the process of expanding the domain name system to include Internationalised Domain Names for non-Latin languages.

ICANN has stated that any entity meeting the following basic registration requirements can apply for the creation of a new gTLD:

  1. String reviews (concerning the applied-for gTLD string). String reviews include a determination that the applied-for gTLD string is not likely to cause security or stability problems in the DNS, including problems caused by similarity to existing TLDs or reserved names.
  2. Applicant reviews (concerning the entity applying for the gTLD and its proposed registry services). Applicant reviews include a determination of whether or not the applicant has the requisite technical, operational, and financial capability to operate a registry.

Applicants will also need to pass checks to ensure that there are no objections to registration, such as cases of public decency (as in the case of .xxx), or legal or commercial objections, such as trademarks. Applicants will also be required to pay ICANN a setup fee of $185,000 (£120,000), a move which has caused some parties to express concerns that gTLDs are simply a cynical money-grabbing exercise.

So what are the benefits of these new gTLDs?

TLDs are all about differentiation. In the same way that a ‘.com’ domain can be thought of as a global site, and a ‘.co.uk’ domain can be considered to be a UK site, these new gTLDs demark specific portions of the Internet, and can be used for various purposes. For example:

  • Sites relating to a particular geographical area, such as ‘.nyc’ for New York City, or ‘.SW1’ for the applicable South-West London postal area.
  • Special interest groups without geographical boundaries can have their own space on the Internet, making them easier to identify and associate with.
  • Companies can cement their brands on the Internet, creating second tier domains for sections of their organisations on a branded gTLD.
  • Innovative service offerings involving profiles on domains such as ‘.facebook’.
  • Inexpensive, price differentiated domain hosting, on domains such as ‘.mysite’.
  • Sub-sections of the Internet dedicated to specific content, such as ‘.music’, where commercially available tracks can be advertised, or ‘.appstore’ where you can host your iPhone apps.

Recently, Canon announced its application to register ‘.canon’ as a gTLD. What stood out on Canon’s Press Release was its reason for registering the domain:

With the adoption of the new gTLD system, which enables the direct utilization of the Canon brand, Canon hopes to globally integrate open communication policies that are intuitive and easier to remember compared with existing domain names such as "canon.com."

What strikes me about this is that ‘canon.com’ is already pretty easy to remember. I imagine that rather than ‘canon.canon’, subdomains such as ‘corporate.canon’, ‘sales.canon’ and ‘printers.canon’ would be the objective here, although there would at first appear to be little benefit to this.

As Canon is a multi-billion pound company, I suspect that this application may have been made as a knee-jerk, defensive measure, even if nobody at the company has come up with a good reason as to why the gTLD is needed.

As it stands, there are likely to be few organisations that can justify the expenditure associated with registering a gTLD, especially given the superficial benefits of the domain name (aside from the potential uses identified in this article). Many companies are already worried about the cost of registering their domains on different TLDs, but do so defensively in case someone else registers them. With a potentially infinite number of new gTLDs coming on the market, these costs are only going to increase.

What do new gTLDs mean for search engines?

As is the case today (with some exceptions), relocating a site from one gTLD to another is unlikely to have many benefits SEO-wise, and often has a negative impact on rankings in the short term. It is also impossible at this stage to tell how the search engines will treat these new gTLDs.

For commercial product sites, moving from an existing domain which is performing well to a new domain is inadvisable due to the link profile and reputation gained over time, not all of which may be transferred to the new domain, at least not right away.

From a monetisation perspective, having the keyword in the TLD will provide far less benefit than having a well optimised site with a strong natural link structure, something which cannot simply be bought off the shelf. The value of reselling domains which are relevant to a particular group of commercial organisations by offering a unique differentiator as per points made earlier in this article is, of course, one method of monetisation, but it would certainly be a brave business model.

In summary, registering a gTLD is only really feasible in a situation where you can justify the expense through clear strategic planning and/or your business cannot afford to have its brand diluted. This will be more important where a company does not own a registered trademark, as an objection is less likely to be upheld should anyone try to create a gTLD of their brand name.

It will be very interesting to see how Canon will use its new asset, assuming that its application is successful. Any other organisation which decides to register a new gTLD will also make an interesting case study for more widespread uptake of this novel branding opportunity.

Can search engines handle Internationalized Domain Names (IDNs)?

Internationalized Domain Names (IDNs) have been approved by ICANN and are set to become a reality. Are the search engines prepared for them?

Introduction

Note: If you are unable to view the Chinese and Arabic letters in this page you may need to install the required fonts.

In October 2009, ICANN voted to allow the use of non-ASCII characters in domain names. Non-ASCII characters have existed within domain names for a while – for example, many Hong Kong sites feature Chinese characters (example: http://香港儒釋道院.組織.hk). However, before now, these characters were not allowed within TLDs and, as such, URLs still required ASCII characters (in the example above, the ccTLD “.hk”).

ICANN launched the IDN ccTLD Fast Track Process in November, and last month announced that four top-level IDNs had successfully passed the initial stage of approval (three Arabic-language IDN ccTLDs for Egypt, Saudi Arabia and the United Arab Emirates, and one Cyrillic-language IDN ccTLD for the Russian Federation). At the time of writing, there are another 13 IDN ccTLDs on their way through this process, representing 10 different languages in total.

In order to provide the Internet community time to prepare for the rollout of new IDN domains, ICANN has set up a number of IDN domains for testing purposes. Each of these test domains is written as “example.test” in its respective language, and content has been made available to view on each site.

Seeing as most of the initial IDN ccTLDs are likely to be in Arabic, I have used ICANN’s test Arabic domain (مثال.إختبار) for my research.

Before I start, I need to quickly explain what Punycode is, as it is used to support the addition of IDN domains to the existing Internet infrastructure. The problem with the current system is that the Domain Name System (DNS) only allows certain ASCII characters, which means that it is not possible to simply add Unicode characters to it. Punycode was invented to get around this issue. Essentially, it is a method by which Unicode characters can be translated to (and from) the ASCII characters allowed within the DNS. When your browser requests a domain name containing Unicode characters, it converts it to the ASCII-formatted Punycode before sending the request.
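To make the conversion a little more concrete, here is a minimal Python sketch using the language's built-in "idna" codec. Note the assumptions: the function names are mine, the codec follows the original IDNA 2003 rules, and a browser may apply newer or slightly different rules, so treat this as an illustration rather than a description of what any particular browser does.

```python
def to_ascii(hostname: str) -> str:
    """Convert a Unicode hostname to its ASCII (Punycode) form.

    Each dot-separated label is converted on its own. Labels that
    contain non-ASCII characters come out with an "xn--" prefix;
    plain ASCII labels are passed through unchanged.
    """
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Convert an ASCII (Punycode) hostname back to Unicode."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    test_domain = "مثال.إختبار"  # ICANN's Arabic test domain
    ascii_form = to_ascii(test_domain)
    print(ascii_form)  # ASCII only: two labels, each starting with "xn--"
    # The conversion is reversible, which is what lets the DNS stay
    # ASCII-only while users see the Unicode name.
    assert to_unicode(ascii_form) == test_domain
    # Ordinary ASCII hostnames are not affected.
    assert to_ascii("example.com") == "example.com"
```

The important point is that the two forms are interchangeable: the Unicode form is what people read and type, and the ASCII form is what actually travels through the DNS.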

For this experiment, I have looked at the way in which the search engines handle both the Unicode form of the Arabic domain (http://مثال.إختبار/) as well as the corresponding Punycode format (which, in this case, is http://xn--mgbh0fb.xn--kgbechtv/). Note that, because Arabic is an RTL (right-to-left) language, pages on this site will have the URL path to the left of the hostname, rather than to the right.

One last note before we look at the results – the test page does not feature a meta description tag, so any snippet text is likely to come from text within the page itself.

Here are the results.

Google

Searching Google for the Unicode variant of the URL returns the homepage of the domain as the first result, with an additional nested result for a second, internal page on the domain:

  Google Unicode

Initially, everything seems to be in place here. The title tags, snippets and URLs are correctly displayed in Arabic, and Google has highlighted the search text in bold as usual. Additionally, the “Similar” pages link works, and the “jump to” successfully takes you to an anchor within the page. Lastly, the URL path is written in the correct RTL form for the second result.

However, not all is well. The first URL that Google is listing, the homepage, is actually a 301 redirect to an internal page. Google should be indexing the destination page, not the redirecting homepage.

There are several other issues too. Firstly, the cached copy link did not work:

Google cached copy

I tried a number of pages on the site and Google’s cached copy did not work for any of them, so Google may have an issue with this feature at present.

Additionally, the “Translate this page” links for both results do not correctly function, and an error message is shown:

Google Translate error

Side note – the “See original page” link does correctly point to the Arabic domain name.

Next I tried searching Google for the Punycode form of the URL:

Google Punycode

Google has returned the same two URLs, which is a good sign of consistency. The title tags are the same, and the URL is still written in Arabic and not displayed in the Punycode form.

This time around, Google has picked out some text on the page which matches the Punycode search term. Although this particular snippet is rather less attractive than the ones from the previous query, matching the exact text on a page is probably the best approach. However, it would also make sense for Google to at least highlight the Unicode version (for example, in the URL), which it currently does not do.

Again, while the “Similar” pages link works, the “Cached” and “Translate this page” links are broken. This seems to be an issue that Google needs to fix.

Yahoo!

Searching Yahoo! for the Unicode or Punycode version of the URL does not return any results from the domain:

Yahoo! Arabic domain fail

Similarly, entering the URLs within Yahoo! Site Explorer simply redirects back to the main Yahoo! search results. Performing “site:” searches (for either variant) also fails (looking at the HTTP headers, you can see that Yahoo! actually redirects the query to Site Explorer, which then redirects you back to the standard web search results).

I tried a few additional ICANN test IDN domains in other languages and none of them worked. Yahoo! seems to fail completely at handling IDNs.

Given that Yahoo! is likely to use Bing’s search in the future, let’s see how Bing performs next.

Bing

Searching Bing for the Unicode version of the URL does return a page from the site, although it’s at position 8, which is not ideal (when searching for a URL you would usually want the URL to appear at or near the top of the search results). The snippet appears as follows:

Bing Unicode

Only one URL is shown, which isn’t quite as useful as Google’s result, but is still adequate. The title tag, snippet and URL are all correctly shown in Arabic, which is good. The “Translate this page” and “Cached page” links both work, whilst they didn’t on Google.

Bing does have some issues, however. Although Bing has indexed the destination URL (the link goes directly to the destination URL), for some reason Bing only displays the URL of the homepage in the snippet. Additionally, although Bing has highlighted the domain in bold in the snippet, it has not highlighted it within the URL.

Bing does have a number of problems with its handling of this domain. However, they are fairly minor and definitely less important than the issues that Google has with this site.

Searching Bing for the Punycode version of the URL, Bing returns the URL at position 2 instead, which is a bit better:

Bing Punycode

Again, like Google, Bing has picked out the text from the page which matches the query for the snippet but has not highlighted the Arabic equivalent in the snippet. Otherwise, this result is much the same as the Unicode search variant.

Ask Jeeves

I have also looked at Ask Jeeves (known as just “Ask” in the US).

Searching Ask Jeeves for the Unicode version of the URL returns the site at position one. Like Google, it includes a second indented URL at position two. Interestingly, these are the same two URLs that Google returned for this search (it is worth remembering that Ask Jeeves might be using Google’s results at times).

Ask Jeeves Unicode

Ask Jeeves is correctly displaying both the title and the snippet in Arabic, but the URL is written in the Punycode form instead, which is clearly far from ideal.

There is another major issue with Ask Jeeves’ implementation – the second URL goes through a redirect, but the hostname given by the redirect has been encoded in a way which makes Firefox and Internet Explorer fail to load the page (Google Chrome and Opera did successfully load the page from the redirect). Note: This does not always happen – reloading the page sometimes returns the URL without the redirect, and in this case it works correctly.

Searching Ask Jeeves for the Punycode version of the URL results in much the same as we have seen earlier. Again, the snippet includes the text from the page which matches the query. Ask Jeeves includes a small screenshot of the page too:

Ask Jeeves Punycode

Ask Jeeves’ binoculars feature, which displays a small thumbnail screenshot of the site, does appear to work correctly. However, it is possible that there are issues here as well.

Ask Jeeves Binoculars

Although it’s difficult to make out due to the small size of the thumbnail, it appears that the English text renders correctly but the Arabic text (although correctly displayed in an RTL fashion) looks like it might be showing a nonsense placeholder character, in the same way that web browsers which do not render Unicode characters do. That said, it is difficult to determine for sure from the small thumbnail that Ask Jeeves provides.

Conclusion

In conclusion, Google, Bing and Ask Jeeves do support IDNs to varying degrees. If I had to proclaim a winner at the moment, I would say that Bing had a slight lead, but all of these search engines had some issues. Hopefully these issues will be ironed out by the time that IDNs eventually roll out en masse.

Yahoo! appears to completely fail to support IDNs at present. Once it switches to Bing’s search engine, however, we assume that it will inherit all of Bing’s IDN support as well.
