Is Ask Jeeves scraping Google? 

Ask Jeeves has pages in its index which could only have been spidered by Googlebot. What is going on?

I was experimenting with User-Agents the other day and came across UserAgent.org, a site which simply displays your web browser’s User-Agent string. Because the page echoes back whatever User-Agent requested it, the snippet each search engine shows for the site reveals which User-Agent that engine used when it last spidered it, so I thought it might be interesting to compare them. Little did I expect to find this!

As expected, Google, Yahoo! and Bing simply displayed their standard User-Agents. For example, here’s the result when searching in Bing:

Bing results for UserAgent.org

Side note: Bing is gradually shifting away from msnbot 1.1 and is moving to msnbot 2.0.

However, something rather unexpected happened when searching for that site in the number four search engine, Ask Jeeves:

Ask Jeeves results for UserAgent.org

Eh? That’s Google’s User-Agent, Googlebot! At first, I wondered if Ask Jeeves was simply pretending to be Googlebot sometimes (perhaps to get around websites which block their spider or to detect cloaking). However, when looking at a page which shows the IP address that the request came from, the mystery deepened further:

Ask Jeeves results for UserAgent.org IP address

This page was fetched from the IP address 66.249.68.19. I immediately recognised this as one of Googlebot’s IP addresses (Google owns the entire IP range 66.249.64.0 to 66.249.95.255, and it’s a common Googlebot crawl source). Sure enough, this IP address resolved to the following domain:

crawl-66-249-68-19.googlebot.com
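The check described above can be sketched in a few lines of Python. This is only a rough illustration, not the method used for this post: the function names are made up for the example, the CIDR block is simply the quoted range 66.249.64.0 to 66.249.95.255 rewritten as 66.249.64.0/19, and the forward-confirmation step (resolving the hostname back to the IP) is an extra safeguard I am assuming you would want, since anyone who controls an IP block can point its reverse DNS at any name they like.

```python
import ipaddress
import socket

# The range quoted in the post, 66.249.64.0 to 66.249.95.255, as a CIDR block.
QUOTED_GOOGLE_RANGE = ipaddress.ip_network("66.249.64.0/19")

# Hostname suffixes seen in this post (illustrative, not an exhaustive list).
CRAWLER_SUFFIXES = (".googlebot.com", ".ask.com")


def in_quoted_range(ip):
    """Return True if ip falls inside the range quoted in the post."""
    return ipaddress.ip_address(ip) in QUOTED_GOOGLE_RANGE


def reverse_lookup(ip):
    """Reverse DNS (PTR) lookup; returns the hostname, or None on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Forward DNS lookup; returns the IPv4 addresses for hostname."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def crawler_owner(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Return the crawler domain that ip belongs to, or None if unconfirmed.

    The resolver functions are parameters so the logic can be exercised
    without touching the network.
    """
    hostname = reverse(ip)
    if hostname is None:
        return None
    # Forward-confirm: the hostname must resolve back to the same IP.
    if ip not in forward(hostname):
        return None
    for suffix in CRAWLER_SUFFIXES:
        if hostname.lower().endswith(suffix):
            return suffix.lstrip(".")
    return None
```

Calling `crawler_owner("66.249.68.19")` performs live DNS lookups, so the answer depends on what the DNS returns at the time you run it.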

What does this mean? It means that this page must have been fetched by Google’s spiders, not those from Ask Jeeves. It’s not just this site either: many, many pages indexed by Ask Jeeves were spidered from the same location.

Ask Jeeves results from multiple Googlebot IP addresses

It gets even more peculiar: if you look at the cached copy of UserAgent.org, Ask Jeeves instead shows it as having been fetched by the Ask Jeeves/Teoma spider, with the following User-Agent:

Mozilla/5.0 (compatible; Ask Jeeves/Teoma; +http://about.ask.com/en/docs/about/webmasters.shtml)
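If you want to sort captured User-Agent strings by search engine, a simple substring match is usually enough. The sketch below is an illustration only: the function name and the token table are my own, and the tokens are just the crawler names mentioned in this post (Googlebot, msnbot and Ask Jeeves/Teoma). Bear in mind that a User-Agent is self-reported and trivially faked, which is why the IP address matters so much in this story.

```python
# Substrings that identify the crawlers mentioned in this post.
# Illustrative only; real crawlers use many more User-Agent variants.
CRAWLER_TOKENS = {
    "googlebot": "Google",
    "msnbot": "Bing",
    "ask jeeves/teoma": "Ask Jeeves",
}


def identify_crawler(user_agent):
    """Return the search engine a User-Agent claims to be, or None."""
    ua = user_agent.lower()
    for token, engine in CRAWLER_TOKENS.items():
        if token in ua:
            return engine
    return None


ask_ua = (
    "Mozilla/5.0 (compatible; Ask Jeeves/Teoma; "
    "+http://about.ask.com/en/docs/about/webmasters.shtml)"
)
print(identify_crawler(ask_ua))  # prints "Ask Jeeves"
```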

Also, sometimes you do indeed get results fetched by Ask Jeeves’s own spider. For example, here’s exactly the same web page we saw earlier, after refreshing the search results page a few times:

Ask Jeeves Teoma spider

The IP address 66.235.124.6 resolves to the following Ask Jeeves crawler hostname:

crawler5006.ask.com

In other words, sometimes Ask Jeeves is displaying a page fetched by Googlebot, and sometimes it is displaying the page fetched by its own spider. Typically, the first time a particular request is made, you get the Googlebot-fetched page, and after that Ask Jeeves usually shows the copy it fetched itself.

So why is Ask Jeeves including Google-sourced pages? Setting aside the somewhat crazy idea that they might actually be scraping Google’s cached pages, which I think we can dismiss, the most likely explanation is that Ask Jeeves and Google have some kind of agreement whereby Google is assisting its diminutive competitor with spidering, and quite possibly with more than that.

According to paidContent.org, the advertising deal between Ask Jeeves and Google includes a provision for Google to assist in providing algorithmic search results to Ask Jeeves, not just the better known advertising aspect of the deal.

If so, this discovery of Google-sourced search results could possibly be the first real proof that Ask Jeeves is throwing in the algorithmic towel and giving up on its own search engine.

Note: This is particularly interesting in light of the recently announced Microsoft-Yahoo! deal.

See our follow-up post: Are we losing two of the top four search engines?
