yahoo!

Bing now powering Yahoo! results in the US & Canada

Yahoo! is dead, long live Yahoo!

The “Binghoo” search alliance is finally coming to fruition. After some initial testing, Yahoo! and Bing have announced that Yahoo! has completed the Bing transition and that its search results are now powered entirely by Bing.

This initial rollout only covers the US and the English-language version of Yahoo! in Canada, with other countries set to follow. Given the relative maturity of Bing in the UK compared to many other countries we would be surprised if the next rollout didn’t include the UK, although when this will happen is anyone’s guess. Yahoo! has said that the full worldwide rollout may be as late as 2012.

One country that might not be transitioning to Bing-powered Yahoo! is Japan – the one country in the world where Yahoo! is the market leader. Yahoo! Japan is only partially owned by Yahoo! and has said that it plans to use Google, rather than Bing, to power its search results, a move which Microsoft has slammed as anti-competitive.


Social may be the key to innovation as competition in search heats up

As reported around this time last year, Yahoo and Microsoft signed a $700 million deal under which Bing would provide Yahoo’s search results, leaving our friends in Sunnyvale to run what will effectively be a content-based web portal, one far more popular in the US than here or in the rest of Europe. Clearly, this is all part of Microsoft’s offensive against Google, which has also included taking a stake in Facebook, thus leading a conglomerate of brands against Larry Page and Sergey Brin’s search giant. But now the competitive scramble for users in the search space seems to involve almost every trendy brand in digital.

However, regarding the specific Yahoo/Bing deal, things have just started to get a little more real. Last week an update was sent to advertisers stating that Yahoo would begin serving natural search results from Bing from “August or September onwards”. Moreover, Yahoo will integrate its PPC ads into Microsoft’s AdCenter by the beginning of the ‘holiday season’ (that’s Christmas to us limeys) but may delay that until 2011 if it decides that would “improve the overall experience” for both advertisers and users. “If organic search results are an important source of referrals to your website, you’ll want to make sure that you’re prepared for this change,” the email said. Well sure, 80% of internet journeys start with search and these two newfound friends are important to the search market, though Google is still leading by far, more so in the UK than in most places.

According to ComScore’s latest figures from last month, Google have 91.7% of the UK search market with Bing and Yahoo on 2.98% and 2.55% respectively, figures largely unchanged from the last quarter. In the US it’s a different ball game with Google on “just” 63.70%, Yahoo on 18.30% and Bing on 12.10%, with slight rises from the last two against Google over the last quarter.

So many hope that this deal will have a positive effect on search in terms of innovation. For a start, Google will have to try harder, especially in the States, something which will have a knock-on effect on the rest of the world. The biggest reason for this is that the deal obviously means a combined market share of around a third for Bing/Yahoo (on the US figures above, 18.30% + 12.10% = 30.4%). Such an enlarged competitor means that more advertisers who may previously have used only Google may experiment with AdCenter, so Google will have to try harder to keep users using its brand, something it has managed quite well in the past, from free applications such as Maps and Gmail to paid-for models like the mobile operating system Android and even a rumoured hardware rival to Apple’s iPad.

As SEO industry guru Danny Sullivan said last year, “If Microsoft can adopt a passion for innovation and push the envelope, Google will have to respond in kind. The search experience will evolve more rapidly, hopefully kicked out of the revenue obsessed stasis that it’s currently in. Stagnation benefits no one except the analysts and bean counters who insist that quarter over quarter performance is the only metric that matters. We’re way too early in the game to be that cautious and boring.”

In what form might this innovation come? Well, social could be the key to that. For over a year now it has been speculated that Google use more than PageRank to determine the rankings of web pages. Many search analysts believe that inbuilt into the algorithm are signals from offline media and social networks, even those, such as Twitter and Facebook, that have their links set to ‘nofollow’ (so no link equity is passed on). These links would not carry as much weight as a “regular link” but evidence has been recorded of increased natural search ranking even when no links have been involved. Most famous of these is the recent Magners example from eConsultancy.

Personally, I think it’s fair to say that nothing is certain at this stage (so little is with Google’s algorithm), but there is definitely more emphasis being put on social activity, mainly because, since October last year, Twitter’s main revenue stream has come from sharing data with Google and Bing. That process began when tweets started to show up in natural search results as the engines clambered over each other to show more ‘real time’ information to the user.

Also, as blogged about by my colleague Johnny Gedye, location based social networking site Foursquare are in talks with Google and Microsoft for a similar deal to Twitter’s:

‘Speaking to the Telegraph, [Foursquare co-founder] Crowley said Foursquare was discussing partnerships with “everyone” – which would include search kings Google, Microsoft and Yahoo! – to “enrich” their search engines with trends generated by the location-based data.

“We can anonymise data and use it to show venues which are trending at that moment,” Crowley explained, voicing the example of Twitter, “Twitter helped the world and the search engines know what people are talking about,” he continued. “Foursquare would allow people to search for the types of place people are going to – and where is trending – not what.”’

And this isn’t the only area where location based networks are springing up. Last month Twitter itself launched Twitter Places whereby users are able to tag tweets to specific places (such as venues) and clicking on those location names will bring up recent tweets from those places. Whether this will become part of the data fed to Google and Microsoft remains to be seen but there is certainly a scramble to make location an integral part of the search experience. Facebook is also rumoured to be developing a similar offering, not to mention anything that may be being thrashed out with Gowalla.

No one knows who will come out on top of this but one thing is for sure, search is only going to become a richer channel over the next year and it looks likely that the brands that make best use of the social space will be the ones that benefit the most.


Bing to launch updated, renamed web crawler “Bing Bot”

Microsoft is to launch its new spider later this year. Here’s what site owners need to know.

Microsoft’s search engine wasn’t always called “Bing”, and its web crawler, “msnbot”, hasn’t kept up with the name change. When Microsoft renamed Live Search (formerly MSN Search) to Bing, we have to admit to being mildly disappointed that it didn’t take the opportunity to rename its spider “Bing Bot”.

There are many good reasons not to change the name of a spider, especially one as widely used as Microsoft’s search spider. Many software packages look at the name of visiting browsers and spiders (known as the User-Agent) to perform a variety of functions, and it’s possible that problems might occur for a time on less well-configured websites if this were to be changed. For example, Yahoo! maintained the User-Agent “Slurp” for its spider, which it inherited from its acquisition of Inktomi, to “ensure consistency and minimal disruption”.

It appears that Microsoft has decided that the branding “Bing Bot” is too good to miss, however, and has announced that its next generation spider will indeed be renamed when it comes out of beta.

Here’s what site owners need to know:

When is this happening?

This will happen on 1st October 2010.

This is also when Microsoft’s new spider will officially come out of beta.

What will the User-Agent be?

Microsoft’s current User-Agent is:

msnbot/2.0b (+http://search.msn.com/msnbot.htm)

The new Bing Bot User-Agent will be:

Mozilla/5.0 (compatible; bingbot/2.0 +http://www.bing.com/bingbot.htm)

In addition to the “bingbot” branding, there are two other changes to note. Firstly, Microsoft is switching to the “Mozilla/5.0”-style User-Agent. Google made this change more than six years ago because it wanted web servers to treat its spider more like a real web browser. The second, more minor, change is that the “b” (meaning “beta”) in its version number has been dropped.
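For site owners whose own scripts inspect the User-Agent, the following is a minimal Python sketch of a check that recognises both the old and the new strings quoted above. The function name and the regular expression are illustrative assumptions rather than anything Microsoft publishes, and a User-Agent can be spoofed, so this only identifies what a visitor claims to be.

```python
import re

# Illustrative pattern (an assumption, not an official Microsoft rule):
# the crawler name, a slash, a dotted version number and an optional
# trailing "b", which the old msnbot string uses to mean "beta".
MICROSOFT_CRAWLER = re.compile(
    r"\b(msnbot|bingbot)/(\d+(?:\.\d+)*)(b?)", re.IGNORECASE
)


def identify_microsoft_crawler(user_agent):
    """Return (name, version, is_beta) for a Microsoft crawler, else None."""
    match = MICROSOFT_CRAWLER.search(user_agent)
    if match is None:
        return None
    name, version, beta = match.groups()
    return name.lower(), version, beta.lower() == "b"


# The two User-Agent strings quoted in this post.
old_ua = "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
new_ua = "Mozilla/5.0 (compatible; bingbot/2.0 +http://www.bing.com/bingbot.htm)"

for ua in (old_ua, new_ua):
    print(identify_microsoft_crawler(ua))
```

Matching on the crawler name anywhere in the string, rather than only at the start, is what keeps such a check working once the name moves inside the “Mozilla/5.0 (compatible; …)” wrapper.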

Any other changes to the spider’s requests?

In addition to the User-Agent change, Microsoft has also changed the “From:” HTTP header field, so the old value of:

From: msnbot(at)microsoft.com

will become:

From: bingbot(at)microsoft.com

Will my old robots.txt entries still work?

Thankfully, Microsoft has decided to make its spider respect the User-Agent field which it currently recognises in robots.txt, “msnbot”. However, the way in which it will work from October is somewhat subtle, so it deserves a brief explanation.

Whilst existing directives will still work, Microsoft is also going to recognise a “User-Agent:” robots.txt entry of “bingbot”, and it will give precedence to an entry of “bingbot” over an entry of “msnbot” (which, in turn, has precedence over the catch-all User-Agent entry of “*”). This means that, if you add robots.txt rules for “bingbot”, the crawler will follow only those rules and ignore all others, including those for “msnbot”.

Whilst adding conflicting “msnbot” and “bingbot” entries hopefully isn’t too likely to happen on most sites, in a larger, more complex organisation in which many different people or departments are able to make changes to robots.txt files, I wouldn’t be surprised to see someone accidentally trip up and add a new “bingbot” entry which doesn’t match up with the already existing “msnbot” entry (for example, where a separate “crawl-delay” value for Bing is specified).
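To make the precedence concrete, here is a rough Python sketch of the selection rule described above: use the “bingbot” group if one exists, otherwise “msnbot”, otherwise “*”. The parser is deliberately simplified and the function names are hypothetical; it illustrates the rule as described in this post and is not Microsoft’s actual implementation.

```python
def parse_groups(robots_txt):
    """Map each lower-cased User-agent token to its list of (field, value) rules."""
    groups = {}
    current_agents = []
    reading_agents = False  # True while consecutive User-agent lines are being read
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                current_agents = []  # a new group starts here
            reading_agents = True
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        else:
            reading_agents = False
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups


def rules_for_bing(robots_txt):
    """Pick the single group the crawler would obey: bingbot, then msnbot, then *."""
    groups = parse_groups(robots_txt)
    for agent in ("bingbot", "msnbot", "*"):
        if agent in groups:
            return agent, groups[agent]
    return None, []


# Hypothetical robots.txt in which a new "bingbot" entry was added
# without copying over the existing "msnbot" rules.
EXAMPLE = """
User-agent: *
Disallow: /private/

User-agent: msnbot
Disallow: /private/
Crawl-delay: 5

User-agent: bingbot
Disallow: /search/
"""

print(rules_for_bing(EXAMPLE))
```

In this hypothetical file, only the “/search/” rule is selected: the “/private/” exclusion and the crawl-delay set for “msnbot” are not picked up, which is exactly the kind of accidental mismatch described above.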

Microsoft clearly wants site owners to update their robots.txt files with the new User-Agent, and we’d definitely recommend that you do this – but don’t forget that the new Bing Bot only launches on 1st October – until then, you should still use the old “msnbot” terminology in your robots.txt files.

What should I do now?

Firstly, if you currently have a separate robots.txt entry for msnbot on your site(s), make a note on your calendar to change it to “bingbot” on October 1st.

Secondly, make sure that your website doesn’t do anything else special for Microsoft’s crawler or for visitors which don’t identify themselves as ‘Mozilla compatible’. This could include tools such as analytics packages or software which performs anti-spam functionality such as request rate-limiting.

Other than that, there shouldn’t be anything to worry about! However, in the (hopefully unlikely) event that you do experience any problems come October, Microsoft has set up an email address (bingbot@microsoft.com) to help to resolve any issues.


Can search engines handle Internationalized Domain Names (IDNs)?

Internationalized Domain Names (IDNs) have been approved by ICANN and are set to become a reality. Are the search engines prepared for them?


Introduction

Note: If you are unable to view the Chinese and Arabic letters in this page you may need to install the required fonts.

In October 2009, ICANN voted to allow the use of non-ASCII characters in domain names. Non-ASCII characters have existed within domain names for a while – for example, many Hong Kong sites feature Chinese characters (example: http://香港儒釋道院.組織.hk). However, before now, these characters were not allowed within TLDs and, as such, URLs still required ASCII characters (in the example above, the ccTLD “.hk”).

ICANN launched the IDN ccTLD Fast Track Process in November, and last month announced that four top-level IDNs had successfully passed the initial stage of approval (three Arabic-language IDN ccTLDs for Egypt, Saudi Arabia and the United Arab Emirates, and one Cyrillic-language IDN ccTLD for the Russian Federation). At the time of writing, there are another 13 IDN ccTLDs on their way through this process, representing 10 different languages in total.

In order to give the Internet community time to prepare for the rollout of new IDN domains, ICANN has set up a number of IDN domains for testing purposes. Each of these test domains is written as “example.test” in its respective language, and content has been made available to view on each site.

Seeing as most of the initial IDN ccTLDs are likely to be in Arabic, I have used ICANN’s test Arabic domain (مثال.إختبار) for my research.

Before I start, I need to quickly explain what Punycode is, as it is used to support the addition of IDN domains to the existing Internet infrastructure. The problem with the current system is that the Domain Name System (DNS) only allows certain ASCII characters, which means that it is not possible to simply add Unicode characters to it. Punycode was invented to get around this issue. Essentially, it is a method by which Unicode characters can be translated to (and from) the ASCII characters allowed within the DNS. When your browser requests a domain name containing Unicode characters, it converts it to the ASCII-formatted Punycode before sending the request.

For this experiment, I have looked at the way in which the search engines handle both the Unicode form of the Arabic domain (http://مثال.إختبار/) as well as the corresponding Punycode format (which, in this case, is http://xn--mgbh0fb.xn--kgbechtv/). Note that, because Arabic is an RTL (right-to-left) language, pages on this site will have the URL path to the left of the hostname, rather than to the right.
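The conversion between the two forms of the domain quoted above can be sketched with Python’s built-in “idna” codec. Note the assumptions: the helper names are hypothetical, the codec follows the older IDNA 2003 rules, and browsers may apply additional or newer processing, so treat this as an illustration of the idea rather than a reference implementation.

```python
def to_punycode(hostname):
    """Convert a Unicode hostname to its ASCII (Punycode) form, label by label."""
    return hostname.encode("idna").decode("ascii")


def from_punycode(hostname):
    """Convert an ASCII (Punycode) hostname back to its Unicode form."""
    return hostname.encode("ascii").decode("idna")


# ICANN's Arabic test domain from this post, written with escapes so that
# the right-to-left text cannot be reordered by an editor.
arabic_domain = (
    "\u0645\u062b\u0627\u0644"                # first label ("example")
    "."
    "\u0625\u062e\u062a\u0628\u0627\u0631"    # second label ("test")
)

ascii_form = to_punycode(arabic_domain)
print(ascii_form)  # the post gives this as xn--mgbh0fb.xn--kgbechtv
print(from_punycode(ascii_form) == arabic_domain)
```

Each dot-separated label is converted on its own and given the “xn--” prefix, which is why both halves of the ASCII form start with it.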

One last note before we look at the results – the test page does not feature a meta description tag, so any snippet text is likely to come from text within the page itself.

Here are the results.

Google

Searching Google for the Unicode variant of the URL returns the homepage of the domain as the first result, with an additional nested result for a second, internal page on the domain:

  Google Unicode

Initially, everything seems to be in place here. The title tags, snippets and URLs are correctly displayed in Arabic, and Google has highlighted the search text in bold as usual. Additionally, the “Similar” pages link works, and the “jump to” successfully takes you to an anchor within the page. Lastly, the URL path is written in the correct RTL form for the second result.

However, not all is well. The first URL that Google is listing, the homepage, is actually a 301 redirect to an internal page. Google should be indexing the destination page, not the redirecting homepage.

There are several other issues too. Firstly, the cached copy link did not work:

Google cached copy

I tried a number of pages on the site and Google’s cached copy did not work for any of them, so Google may have an issue with this feature at present.

Additionally, the “Translate this page” links for both results do not correctly function, and an error message is shown:

Google Translate error

Side note – the “See original page” link does correctly point to the Arabic domain name.

Next I tried searching Google for the Punycode form of the URL:

Google Punycode

Google has returned the same two URLs, which is a good sign of consistency. The title tags are the same, and the URL is still written in Arabic and not displayed in the Punycode form.

This time around, Google has picked out some text on the page which matches the Punycode search term. Although this particular snippet is rather less attractive than the ones from the previous query, matching the exact text on a page is probably the best approach. However, it would also make sense for Google to at least highlight the Unicode version (for example, in the URL), which it currently does not do.

Again, while the “Similar” pages link works, the “Cached” and “Translate this page” links are broken. This seems to be an issue that Google needs to fix.

Yahoo!

Searching Yahoo! for the Unicode or Punycode version of the URL does not return any results from the domain:

Yahoo! Arabic domain fail

Similarly, entering the URLs within Yahoo! Site Explorer simply redirects back to the main Yahoo! search results. Performing “site:” searches (for either variant) also fails (looking at the HTTP headers, you can see that Yahoo! actually redirects the query to Site Explorer, which then redirects you back to the standard web search results).

I tried a few additional ICANN test IDN domains in other languages and none of them worked. Yahoo! seems to fail completely at handling IDNs.

Given that Yahoo! is likely to use Bing’s search in the future, let’s see how Bing performs next.

Bing

Searching Bing for the Unicode version of the URL does return a page from the site, although it’s at position 8, which is not ideal (when searching for a URL you would usually want the URL to appear at or near the top of the search results). The snippet appears as follows:

Bing Unicode

Only one URL is shown, which isn’t quite as useful as Google’s result, but is still adequate. The title tag, snippet and URL are all correctly shown in Arabic, which is good. The “Translate this page” and “Cached page” links both work, whilst they didn’t on Google.

Bing does have some issues, however. Although Bing has indexed the destination URL (the link goes directly to the destination URL), for some reason Bing only displays the URL of the homepage in the snippet. Additionally, although Bing has highlighted the domain in bold in the snippet, it has not highlighted it within the URL.

Bing does have a number of problems with its handling of this domain. However, they are fairly minor and definitely less important than the issues that Google has with this site.

Searching Bing for the Punycode version of the URL, Bing returns the URL at position 2 instead, which is a bit better:

Bing Punycode

Again, like Google, Bing has picked out the text from the page which matches the query for the snippet but has not highlighted the Arabic equivalent in the snippet. Otherwise, this result is much the same as the Unicode search variant.

Ask Jeeves

I have also looked at Ask Jeeves (known as just “Ask” in the US).

Searching Ask Jeeves for the Unicode version of the URL returns the site at position one. Like Google, it includes a second indented URL at position two. Interestingly, these are the same two URLs that Google returned for this search (it is worth remembering that Ask Jeeves might be using Google’s results at times).

Ask Jeeves Unicode

Ask Jeeves is correctly displaying both the title and the snippet in Arabic, but the URL is written in the Punycode form instead, which is clearly far from ideal.

There is another major issue with Ask Jeeves’ implementation – the second URL goes through a redirect, but the hostname given by the redirect has been encoded in a way which makes Firefox and Internet Explorer fail to load the page (Google Chrome and Opera did successfully load the page from the redirect). Note: This does not always happen – reloading the page sometimes returns the URL without the redirect, and in this case it works correctly.

Searching Ask Jeeves for the Punycode version of the URL results in much the same as we have seen earlier. Again, the snippet includes the text from the page which matches the query. Ask Jeeves includes a small screenshot of the page too:

Ask Jeeves Punycode

Ask Jeeves’ binoculars feature, which displays a small thumbnail screenshot of the site, does appear to work correctly. However, it is possible that there are issues here as well.

Ask Jeeves Binoculars

Although it’s difficult to make out due to the small size of the thumbnail, it appears that the English text renders correctly but the Arabic text (although correctly displayed in an RTL fashion) looks like it might be showing a nonsense placeholder character, in the same way that web browsers which do not render Unicode characters do. That said, it is difficult to determine for sure from the small thumbnail that Ask Jeeves provides.

Conclusion

In conclusion, Google, Bing and Ask Jeeves do support IDNs to varying degrees. If I had to proclaim a winner at the moment, I would say that Bing has a slight lead, but all of these search engines have some issues. Hopefully these issues will be ironed out by the time that IDNs eventually roll out en masse.

Yahoo! appears to completely fail to support IDNs at present. Once it switches to Bing’s search engine, however, we assume that it will inherit all of Bing’s IDN support as well.


Yahoo! – Microsoft search deal approved and imminent

The Yahoo! – Microsoft deal has now been passed by the European Commission, paving the way for it to be implemented within the “next few days” according to an official press release.

Yesterday, Microsoft and Yahoo! announced the outcome of the European Commission’s ruling on the search deal. Regulators in the US, Australia, Brazil and Canada have all approved the deal, which still requires formal approval in other jurisdictions including Korea, Taiwan and Japan. The Yahoo! press release states that the European ruling will see the deal being implemented within “the next few days”, with Yahoo! becoming the “exclusive relationship sales force for both companies’ premium search advertisers globally”.

The press release goes on to explain the deal as follows:

“Under terms of the agreement, which was announced in late July 2009, Microsoft will provide Yahoo! with the same search result listings available through Bing, and Yahoo! will innovate around those listings by integrating rich Yahoo! content, enhanced listings with conveniently organized information about key topics, and tools to tailor the experience for Yahoo! users.

Yahoo! will focus on providing a compelling and innovative search experience that allows people to find and explore the things, people and sites that matter most to them. While Microsoft will provide the underlying platform, both companies will continue to create different, compelling and evolving experiences, competing for audience, engagement and clicks.”

Steve Ballmer, Microsoft CEO, summed the deal up by stating:

“I believe that together, Microsoft and Yahoo! will promote more choice, better value and greater innovation to our customers as well as to advertisers and publishers.”

The deal’s approval was largely expected by those in the industry, with the consensus being that Yahoo! and Microsoft together can compete more effectively with Google in order to prevent it from becoming an entirely dominant force. Further details about the deal can be found at searchalliance.com.
