
Search engines still struggling with Internationalized Domain Names (IDNs)

Internationalized Domain Names (IDNs) are now a reality and in use by websites right now. Unfortunately, it seems that the search engines are still playing catch-up.


Introduction

Note: If you are unable to view the Arabic and Cyrillic letters in this page you may need to install the required fonts.

Now that the first Internationalized Domain Names (IDNs) have gone live and have had some time to get established, it seems like a good time to revisit the findings of my previous article on IDNs, “Can search engines handle Internationalized Domain Names (IDNs)?”

IDNs went live initially for three countries, all using the Arabic alphabet: Egypt (مصر); Saudi Arabia (السعودية); and the United Arab Emirates (امارات). Russia’s new IDN (рф) went live a little later, adding the Cyrillic alphabet to the mix, and additional IDNs have been created for other countries and alphabets. For this article I’ll take a look at how search engines handle these four IDNs.

To get an idea of how extensively the search engines have indexed sites on these new IDNs, I’m going to use the “site:” operator. Although this operator is primarily used for finding pages on a particular website, e.g. [site:lbi.co.uk], it can also work all the way up to the TLD level, e.g. [site:uk].

Searching Google for [site:مصر], [site:السعودية], [site:امارات] and [site:рф] returns results from the IDNs for Egypt, Saudi Arabia, the UAE and Russia, as expected. Whilst the new IDN for Saudi Arabia had only 14 pages indexed when checked, the other IDNs all feature thousands of results.

Screenshot of a search for [site:рф] in Google:

Google search for site:рф

Trying the same searches in Bing, however, does not return any results:

Bing search for site:рф

It appears that the site: operator does not work with these new IDNs in Bing (searching for other domains, e.g. [site:com], works as expected).

IDNs in search results?

The next area tested is whether the search engines will return these domains in their search results. To test this I picked out some random web pages on the new Egyptian IDN and tried searching for their title tags in both Google and Bing.

Searching both Google and Bing for the title of one web page, [مراكز التميز في البحث والتطوير - وزارة الإتصالات], brought up a number of web pages. The results from Google and Bing both contained a result from an IDN:

Google snippet featuring an IDN:

Google snippet featuring an Arabic IDN

Bing snippet featuring an IDN:

Bing snippet featuring an Arabic IDN

More IDN bugs

Earlier I described how Bing’s site: operator does not yet work with IDNs. However, Google also has a number of IDN woes. Searching for [site:مصر] (the new IDN for Egypt) brings up the site سجل.مصر – however, clicking on the “Show more results from سجل.مصر” link in Google appears to list sites on domains other than سجل.مصر. Additionally, the “Show all results” link is percent-encoded rather than showing the site name in Arabic script.
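To make the “percent-encoded” point concrete, here is a minimal Python sketch of the two ASCII-safe forms an IDN label can take: the Punycode form used in DNS, and the percent-encoded UTF-8 form that can leak into links. The helper name `ascii_forms` is made up for this illustration, and the sketch says nothing about how Google or Bing handle IDNs internally; it only uses Python’s standard library.

```python
from urllib.parse import quote, unquote


def ascii_forms(label: str) -> dict:
    """Return two ASCII-safe encodings of a Unicode domain label.

    'punycode' is the IDNA (xn--) form used in DNS lookups.
    'percent'  is the percent-encoded UTF-8 form seen in some URLs.
    """
    return {
        "punycode": label.encode("idna").decode("ascii"),
        "percent": quote(label),
    }


forms = ascii_forms("рф")          # the Russian IDN used in this article
print(forms["punycode"])           # xn--p1ai
print(forms["percent"])            # %D1%80%D1%84
print(unquote(forms["percent"]))   # decodes back to the Cyrillic label
```

A link that shows the percent-encoded form still points at the same domain; it is simply much harder for a human to read than the native script.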

Screenshot of Google IDN bug

In my previous look at how search engines handled IDNs I had found that Google’s links to “Translate this page” and “Cached” were broken for IDNs. Today it appears that Google has fixed the translation links – however, the cache links still do not appear to function.

Conclusion

The situation is much the same as it was back in February. The search engines can index websites which use IDNs – however, all of the major search engines still have bugs with their IDN support.

Given that the number of IDNs is set to grow and the number of websites using IDNs is likely to increase vastly in the near future, it’s vital that the search engines iron out the bugs in their IDN support. After all, if a search engine can’t handle websites from a particular domain properly, people might decide to switch to a search engine that can.


Bing now powering Yahoo! results in the US & Canada

Bing Yahoo! Logo
Yahoo! is dead, long live Yahoo!

The “Binghoo” search alliance is finally coming to fruition. After some initial testing Yahoo! and Bing have announced that Yahoo! has completed the Bing transition and its search results are now being powered entirely by Bing.

This initial rollout only covers the US and the English-language version of Yahoo! in Canada, with other countries set to follow. Given the relative maturity of Bing in the UK compared to many other countries we would be surprised if the next rollout didn’t include the UK, although when this will happen is anyone’s guess. Yahoo! has said that the full worldwide rollout may be as late as 2012.

One country that might not be transitioning to Bing-powered Yahoo! is Japan – the one country in the world where Yahoo! is the market leader. Yahoo! Japan is only partially owned by Yahoo! and has said that it is planning to use Google to power its search results instead of Bing, a move which Microsoft has slammed as anti-competitive.


Google manually editing ‘Organic’ search results?

Upon the recent launch of our new LBi.com site we were alarmed to notice that Google was sending visitors to the wrong site!

As you can see below, at the time of writing, a search for [lbi.com] in google.co.uk displays a result for the Leo Baeck Institute in New York, a site about the history and culture of German-speaking Jewry hosted on the domain ‘lbi.org’. The ‘sitelinks’ underneath the top result also erroneously refer and link to pages on the lbi.org domain:

Google UK lbi.com search

This is badly wrong. As it happens, this is not a major disaster for LBi, but it could be very different for our natural search clients, who could lose significant revenue as a result of this kind of error.

So why did this happen?

There are no configurations or logical connections between the “lbi.com” site and the “lbi.org” site which could have misled Google, leaving only two options: an error in Google’s code, or an error in a manually edited result. We believe the latter to be the more likely reason.

This is a very rare occurrence that gives us an insight into the world of Google, in particular into how some results are so well positioned despite there being no ‘apparent’ reason for them to be performing so well.

We do see this from time to time, although it should be stressed that the overwhelming majority of sites will never see this kind of manual intervention, and usual best practices still apply.

One reason this result may have been singled out is Google’s recent focus on branded search. We suspect that brand results are one of the items currently being identified and prioritised by Google for search quality purposes.

Why would Google be manually editing search results in 2010?

Manually editing SERPs is more common than you might think. It happens for numerous reasons, from legal requests for removal of content, to handing out “black hat” SEO penalties, to delivering expected results for high-volume navigational queries where, for example, a user is searching for a branded website.

Search engines face a conundrum: they need websites to be included in their index in order to attract searchers. If they removed every website that infringed their terms and conditions, no matter who it belonged to, users would soon get fed up and find another search engine. Likewise, when the site a user is seeking is not optimised well enough to reach the top of the results naturally, search engines reserve the right to edit the results manually so that the expected result still surfaces.

This introduces the potential for human error, which we believe is the case for the erroneous result demonstrated here.

Digging a little deeper:

The cached copy of this page, shown below as indexed on the 7th of August, clearly shows “lbi.com” in the cache URL, but “lbi.org” in the cache description. This is only the case for the homepage, for the phrase [lbi.com]:

Google Cache of lbi.org

The same error is evidenced with a search for [lbi.com] on the google.com site:

google.com lbi.com search

The same is also true for a “site:” operator search, which should only return pages from the “lbi.com” domain:

Site search for lbi.com

A search for [lbi] shows the expected results, including the correct ‘lbi.com’ homepage, so this is definitely included in the index:

Google.com search for lbi

The Leo Baeck Institute website (lbi.org) has no such error, showing that there is not a plain switch of site home pages:

Site search for lbi.org

We’ve dropped Google a line and will post further updates here when we hear any news back from them…

Update: Once we highlighted this, Google’s own John Mueller provided a response in the comments below, and within 24 hours the result for [lbi.com] has now been changed to display the expected results, with an LBi.com title, snippet and sitelinks appearing at the top of the page. We would like to extend our thanks to Google for ensuring a swift resolution.



Social may be the key to innovation as competition in search heats up

As reported around this time last year, Yahoo and Microsoft signed a $700 million deal under which Bing would provide Yahoo’s search results, leaving our friends in Sunnyvale to run what will effectively be a content-based web portal, one far more popular in the US than here or in the rest of Europe. Clearly, this is all part of Microsoft’s offensive against Google, which has also included taking a stake in Facebook, thus leading a conglomerate of brands against Larry Page and Sergey Brin’s search giant. But now the competitive scramble for users in the search space seems to involve almost every trendy brand in digital.

However, regarding the specific Yahoo/Bing deal, things have just started to get a little more real. Last week an update was sent to advertisers stating that Yahoo would begin serving natural search results from Bing from “August or September onwards”. Moreover, Yahoo will integrate its PPC ads into Microsoft’s AdCenter by the beginning of the ‘holiday season’ (that’s Christmas to us limeys), but may delay that until 2011 if it decides that would “improve the overall experience” for both advertisers and users. “If organic search results are an important source of referrals to your website, you’ll want to make sure that you’re prepared for this change,” so the email said. Well, sure: 80% of internet journeys start with search, and these two newfound friends are important to the search market, though Google is still leading by far, more so in the UK than in most places.

According to ComScore’s latest figures from last month, Google have 91.7% of the UK search market with Bing and Yahoo on 2.98% and 2.55% respectively, figures largely unchanged from the last quarter. In the US it’s a different ball game with Google on “just” 63.70%, Yahoo on 18.30% and Bing on 12.10%, with slight rises from the last two against Google over the last quarter.

So many hope that this deal will have a positive effect on search in terms of innovation. For a start, Google will have to try harder, especially in the States, something which will have a knock-on effect on the rest of the world. The biggest reason for this is that the merger obviously means an increased market share of around a third for Bing/Yahoo. With such an enlarged competitor, more advertisers who may previously have used only Google may experiment with AdCenter, meaning that Google will have to try harder to keep users using its brand. This is something it has managed quite well in the past, from free applications such as Maps and Gmail to paid-for models like the mobile operating system Android and even a rumoured hardware rival to Apple’s iPad.
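As a quick check on the “around a third” figure, the ComScore shares quoted above can simply be added together. This is only back-of-the-envelope arithmetic on the numbers in this post; the variable and function names are made up for the illustration, and simple addition assumes the two engines’ shares carry over unchanged after the deal.

```python
# Search market shares (percent) as quoted from ComScore in this post.
us_shares = {"Google": 63.70, "Yahoo": 18.30, "Bing": 12.10}
uk_shares = {"Google": 91.70, "Yahoo": 2.55, "Bing": 2.98}


def combined_share(shares: dict) -> float:
    """Add the Yahoo and Bing shares, rounded to two decimal places."""
    return round(shares["Yahoo"] + shares["Bing"], 2)


print(combined_share(us_shares))  # 30.4
print(combined_share(uk_shares))  # 5.53
```

On these figures the combined US share comes out at just under a third, while in the UK the pairing would still hold less than 6% of the market.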

As SEO industry guru Danny Sullivan said last year, “If Microsoft can adopt a passion for innovation and push the envelope, Google will have to respond in kind. The search experience will evolve more rapidly, hopefully kicked out of the revenue obsessed stasis that it’s currently in. Stagnation benefits no one except the analysts and bean counters who insist that quarter over quarter performance is the only metric that matters. We’re way too early in the game to be that cautious and boring.”

In what form might this innovation come? Well, social could be the key to that. For over a year now it has been speculated that Google uses more than PageRank to determine the rankings of web pages. Many search analysts believe that built into the algorithm are signals from offline media and social networks, even those, such as Twitter and Facebook, that have their links set to ‘nofollow’ (so no link equity is passed on). These links would not carry as much weight as a “regular link”, but evidence has been recorded of increased natural search rankings even when no links have been involved. The most famous of these is the recent Magners example from eConsultancy.

Personally, I think it’s fair to say that nothing is certain at this stage (so little is with Google’s algorithm), but there is definitely more emphasis being put on social activity. This is mainly because, since October last year, Twitter’s main revenue stream has come from sharing data with Google and Bing, a process that began when tweets started to show up in natural search results as the engines clambered over each other to show more ‘real time’ information to the user.
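For readers unfamiliar with ‘nofollow’, it is simply a value of the `rel` attribute on a link. The sketch below, using only Python’s standard library, separates the links in a fragment of HTML into followed and nofollow groups. The class name `LinkCollector` and the sample markup are invented for this illustration, and it shows only how the attribute is marked up, not how any search engine weighs such links.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect link targets, split by whether rel contains 'nofollow'."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel is a space-separated list of tokens, e.g. "nofollow external"
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


# Hypothetical markup: one regular link and one nofollow link.
sample = (
    '<a href="http://example.com/regular">regular link</a> '
    '<a rel="nofollow" href="http://example.com/social">social link</a>'
)

collector = LinkCollector()
collector.feed(sample)
print(collector.followed)  # ['http://example.com/regular']
print(collector.nofollow)  # ['http://example.com/social']
```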

Also, as blogged about by my colleague Johnny Gedye, location-based social networking site Foursquare are in talks with Google and Microsoft for a deal similar to Twitter’s:

‘Speaking to the Telegraph, [Foursquare co-founder] Crowley said Foursquare was discussing partnerships with “everyone” – which would include search kings Google, Microsoft and Yahoo! – to “enrich” their search engines with trends generated by the location-based data.

“We can anonymise data and use it to show venues which are trending at that moment,” Crowley explained, voicing the example of Twitter, “Twitter helped the world and the search engines know what people are talking about,” he continued. “Foursquare would allow people to search for the types of place people are going to – and where is trending – not what.”’

And this isn’t the only area where location-based networks are springing up. Last month Twitter itself launched Twitter Places, whereby users are able to tag tweets to specific places (such as venues), and clicking on those location names will bring up recent tweets from those places. Whether this will become part of the data fed to Google and Microsoft remains to be seen, but there is certainly a scramble to make location an integral part of the search experience. Facebook is also rumoured to be developing a similar offering, not to mention anything that may be being thrashed out with Gowalla.

No one knows who will come out on top of this, but one thing is for sure: search is only going to become a richer channel over the next year, and it looks likely that the brands that make best use of the social space will be the ones that benefit the most.


Google Caffeine live.

Back in August we blogged about the news from Google of an update to its architecture. Since then there has been much speculation in the industry about whether or not it was already live. Yesterday Google announced the official launch of its “Caffeine” update.
In Google’s own words:

“Caffeine provides 50 percent fresher results for web searches than our last index, and it’s the largest collection of web content we’ve offered.”

Google’s head of spam, Matt Cutts, also explained the update at an SMX Advanced session captured on video for Search Engine Land. Matt’s key points in summary were:
Caffeine…

  • Instead of crawling millions of documents in one day and then pushing them live hours later – with the Caffeine update Google can crawl documents and immediately put them into the index to be served live seconds later. So the entire index becomes closer to real time.
  • Increases Google’s ability to scale up the capacity of its index (in the official Google blog post it says that Caffeine already uses nearly 100 million gigabytes of storage!)
  • Makes it easier for Google to annotate documents with information.

As this is an update to Google’s infrastructure, it should not affect rankings.
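The difference between pushing a whole crawl live in one go and indexing each document as soon as it is crawled can be illustrated with a toy inverted index. This is purely a sketch under simplifying assumptions (whitespace tokenisation, in-memory storage, made-up example URLs and class name); it is not a description of how Caffeine is actually built.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index where each document is searchable once added."""

    def __init__(self):
        # term -> set of URLs containing that term
        self.postings = defaultdict(set)

    def add(self, url: str, text: str) -> None:
        """Index one document; it can be found by search() straight away."""
        for term in text.lower().split():
            self.postings[term].add(url)

    def search(self, term: str) -> list:
        """Return the URLs containing the term, in sorted order."""
        return sorted(self.postings.get(term.lower(), set()))


# Hypothetical crawl results, arriving one document at a time.
crawl = [
    ("http://example.com/1", "fresh coffee news"),
    ("http://example.com/2", "coffee prices rise"),
]

live = IncrementalIndex()
live.add(*crawl[0])
print(live.search("coffee"))  # only the first document so far

live.add(*crawl[1])
print(live.search("coffee"))  # both documents, without rebuilding anything
```

In a batch model, neither document would be searchable until the whole crawl had been processed and pushed live; here each one becomes searchable the moment it is added.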
