Bing to launch updated, renamed web crawler “Bing Bot”

Microsoft is to launch its new spider later this year. Here’s what site owners need to know.

Microsoft’s search engine wasn’t always called “Bing” and its web crawler, “msnbot”, hasn’t kept up with the name change. When Microsoft renamed Live Search (formerly MSN Search) to Bing, we have to admit that we were mildly disappointed that it didn’t take the opportunity to rename its spider “Bing Bot”.

There are many good reasons not to change the name of a spider, especially one as widely used as Microsoft’s search spider. Many software packages look at the name of visiting browsers and spiders (known as the User-Agent) to perform a variety of functions, and it’s possible that problems might occur for a time on less well-configured websites if this were to be changed. For example, Yahoo! maintained the User-Agent “Slurp” for its spider, which it inherited from its acquisition of Inktomi, to “ensure consistency and minimal disruption”.

It appears that Microsoft has decided that the branding “Bing Bot” is too good to miss, however, and has announced that its next generation spider will indeed be renamed when it comes out of beta.

Here’s what site owners need to know:

When is this happening?

This will happen on 1st October 2010.

This is also when Microsoft’s new spider will officially come out of beta.

What will the User-Agent be?

Microsoft’s current User-Agent is:

msnbot/2.0b (+http://search.msn.com/msnbot.htm)

The new Bing Bot User-Agent will be:

Mozilla/5.0 (compatible; bingbot/2.0 +http://www.bing.com/bingbot.htm)

In addition to the “bingbot” branding, there are two other changes to note. Firstly, Microsoft is switching to the “Mozilla/5.0”-style User-Agent. Google made this change more than six years ago because it wanted web servers to treat its spider more like a real web browser. The second, more minor, change is that the “b” (meaning “beta”) in its version number has been dropped.
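
If your own software inspects the User-Agent, a check along the following lines would recognise both the old and the new spider. This is only a minimal Python sketch: the function name “is_microsoft_crawler” is our own invention, and because a User-Agent string can be spoofed it should not be treated as proof that a request really comes from Microsoft.

```python
import re

# Hypothetical helper (not part of any Microsoft tooling): matches the old
# "msnbot" token and the new "bingbot" token anywhere in the User-Agent.
_MS_CRAWLER = re.compile(r"\b(msnbot|bingbot)\b", re.IGNORECASE)


def is_microsoft_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent names Microsoft's old or new spider."""
    return bool(_MS_CRAWLER.search(user_agent or ""))


# The two User-Agent strings quoted in this post.
OLD_UA = "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
NEW_UA = "Mozilla/5.0 (compatible; bingbot/2.0 +http://www.bing.com/bingbot.htm)"

for ua in (OLD_UA, NEW_UA, "Mozilla/5.0 (Windows NT 6.1) Firefox/3.6"):
    print(is_microsoft_crawler(ua), ua)
```

Matching on the crawler token rather than on the whole string means that the check does not depend on the “Mozilla/5.0” prefix or on the version number.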

Any other changes to the spider’s requests?

In addition to the User-Agent change, Microsoft has also changed the “From:” HTTP header field, so the old value of:

From: msnbot(at)microsoft.com

will become:

From: bingbot(at)microsoft.com

Will my old robots.txt entries still work?

Thankfully, Microsoft has decided to make its spider respect the User-Agent field which it currently recognises in robots.txt, “msnbot”. However, the way in which it will work from October is somewhat subtle, so deserves a brief explanation.

Whilst existing directives will still work, Microsoft is also going to recognise a “User-Agent:” robots.txt entry of “bingbot”, and it will give precedence to an entry of “bingbot” over an entry of “msnbot” (which, in turn, has precedence over the catch-all User-Agent entry of “*”). This means that, if you add robots.txt rules for “bingbot”, the spider will follow only those rules and ignore all others, including those for “msnbot”.
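
To illustrate this precedence, the following Python sketch picks the robots.txt group that the new spider would obey. It is our own simplified reading of the rules as announced, not Microsoft’s implementation; the function name “select_group” and the example file are hypothetical.

```python
def select_group(robots_txt, precedence=("bingbot", "msnbot", "*")):
    """Return (agent, rules) for the highest-precedence group in the file.

    Simplified parser: every non-User-agent line is attached to the group
    currently being read, and only the first matching agent is used.
    """
    groups = {}            # lowercased agent name -> list of (field, value)
    current = []           # agents named by the group being read
    reading_rules = False  # True once the current group has a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if reading_rules:  # a User-agent line after rules starts a new group
                current, reading_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        else:
            reading_rules = True
            for agent in current:
                groups[agent].append((field, value))
    for agent in precedence:
        if agent in groups:
            return agent, groups[agent]
    return None, []


# Hypothetical robots.txt with conflicting "msnbot" and "bingbot" entries.
EXAMPLE = """\
User-agent: *
Disallow: /private/

User-agent: msnbot
Crawl-delay: 10
Disallow: /private/

User-agent: bingbot
Disallow: /tmp/
"""

agent, rules = select_group(EXAMPLE)
print(agent, rules)
```

In this example the “bingbot” entry wins, so the “crawl-delay” of 10 set for “msnbot” no longer applies to Microsoft’s spider.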

Whilst adding conflicting “msnbot” and “bingbot” entries hopefully isn’t too likely to happen on most sites, in a larger, more complex organisation in which many different people or departments are able to make changes to robots.txt files, I wouldn’t be surprised to see someone accidentally trip up and add a new “bingbot” entry which doesn’t match up with the already existing “msnbot” entry (for example, where a separate “crawl-delay” value for Bing is specified).

Microsoft clearly wants site owners to update their robots.txt files with the new User-Agent, and we’d definitely recommend that you do this – but don’t forget that the new Bing Bot only launches on 1st October – until then, you should still use the old “msnbot” terminology in your robots.txt files.

What should I do now?

Firstly, if you currently have a separate robots.txt entry for msnbot on your site(s), make a note on your calendar to change it to “bingbot” on October 1st.

Secondly, make sure that your website doesn’t do anything else special for Microsoft’s crawler or for visitors which don’t identify themselves as ‘Mozilla compatible’. This could include tools such as analytics packages or software which performs anti-spam functionality such as request rate-limiting.

Other than that, there shouldn’t be anything to worry about! However, in the (hopefully unlikely) event that you do experience any problems come October, Microsoft has set up an email address (bingbot@microsoft.com) to help to resolve any issues.

Google Caffeine live.

Back in August we blogged about the news, from Google, of an update to its architecture. Since then there has been much speculation in the industry about whether or not it was already live. Yesterday Google announced the official launch of its “Caffeine” update.

In Google’s own words:

“Caffeine provides 50 percent fresher results for web searches than our last index, and it’s the largest collection of web content we’ve offered.”

Matt Cutts, head of Google’s Webspam team, also explained the update at an SMX Advanced session captured on video for Search Engine Land. Matt’s key points, in summary, were that Caffeine:

  • Lets Google crawl documents and put them into the index immediately, to be served live seconds later, instead of crawling millions of documents in one day and then pushing them live hours later. The entire index therefore becomes closer to real time.
  • Increases Google’s ability to scale up the capacity of its index (the official Google blog post says that Caffeine already uses nearly 100 million gigabytes of storage!)
  • Makes it easier for Google to annotate documents with information.

As this is an update to Google’s infrastructure, it should not affect rankings.

Video distribution

Now that you know how to optimise your videos for search it’s time to distribute them across the web.

Following from last week’s post about video SEO, this week’s post covers how to distribute your videos across the web and track their performance.

Whilst you can host video files on your own site and submit them to video search engines (Google, Yahoo!, Bing, Blinkx etc.), YouTube is the 800 lb gorilla in the video space. The only way in which your video will be found by people searching on the YouTube website is if you upload the video to YouTube itself.

Using YouTube also has additional benefits – YouTube automatically creates Media RSS feeds, which you can use to submit the video to search engines, and it also makes hosting of videos effectively “free”. However, the Media RSS feeds that YouTube provides link back to the YouTube page, not to the page on your site.

Therefore, we recommend hosting your videos on your own site as well as on YouTube. You can then generate your own Media RSS feeds (linking back to your site) and submit these feeds, rather than the YouTube feeds, to the various video search engines. If hosting videos on your own site, it may also be useful to provide the video content in multiple formats – the more formats in which the video is available, the larger the potential audience (although more formats also means additional costs in both time and bandwidth, so there is a definite trade-off involved).
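
As a rough illustration of generating your own feed, the Python sketch below builds a small Media RSS document whose item links point at a page on your own site rather than at a video hosting site. The function name “build_feed”, the dictionary keys and all of the example.com URLs are hypothetical; only the Media RSS namespace and element names come from the Media RSS format itself.

```python
import xml.etree.ElementTree as ET

MRSS = "http://search.yahoo.com/mrss/"   # Media RSS namespace
ET.register_namespace("media", MRSS)


def m(tag):
    """Return a namespace-qualified Media RSS tag name."""
    return f"{{{MRSS}}}{tag}"


def build_feed(site_title, site_url, videos):
    """Build a Media RSS feed (as a string) from a list of video dicts."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = f"Videos from {site_title}"
    for video in videos:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = video["title"]
        # Link back to the page on your own site that embeds the video.
        ET.SubElement(item, "link").text = video["page_url"]
        content = ET.SubElement(item, m("content"), {
            "url": video["file_url"],
            "type": video["mime_type"],
            "duration": str(video["duration_seconds"]),
        })
        ET.SubElement(content, m("title")).text = video["title"]
        ET.SubElement(content, m("description")).text = video["description"]
        ET.SubElement(content, m("keywords")).text = ", ".join(video["keywords"])
        ET.SubElement(content, m("thumbnail"), {"url": video["thumbnail_url"]})
    return ET.tostring(rss, encoding="unicode")


# Hypothetical example data.
feed = build_feed("Example Widgets", "http://www.example.com/", [{
    "title": "Widget demo",
    "description": "A short demonstration of our widget.",
    "keywords": ["widget", "demo"],
    "page_url": "http://www.example.com/videos/widget-demo",
    "file_url": "http://www.example.com/media/widget-demo.mp4",
    "mime_type": "video/mp4",
    "duration_seconds": 95,
    "thumbnail_url": "http://www.example.com/media/widget-demo.jpg",
}])
print(feed)
```

If you offer the same video in several formats, each format would need its own “media:content” entry; the sketch above keeps to one format per video for brevity.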

It may be worth uploading the video to other video hosting platforms, such as Dailymotion, MetaCafe or Vimeo, as well as to YouTube.

When uploading your video to video hosting sites like these (as opposed to submitting it to video search engines), we recommend watermarking the video with the brand name to prevent it from being re-used without attribution.

Most video hosting sites allow you to include a URL along with each video – each video that you upload to a third-party site should ideally link back to the page on your site on which the video is hosted.

It is also a good idea to display, within the video itself, the URL of the page on your site that hosts the video. Short URLs are generally better, as users will have to type them in manually. URL shortening services which support tracking of users are particularly useful here: they can allow you to track users who visit your site after watching one of your videos, and identify which of your videos and which video hosting sites are attracting the most visitors.

Google "Mayday" update – the death of long tail traffic?

Will the most recent Google update kill long tail traffic?

At the end of April/start of May, many webmasters noticed a change in traffic from Google to their sites. Many people posting on the Webmaster World forum saw that they had large drops in long tail traffic (traffic from keyword phrases of 3 or more words). On the 3rd of May Search Engine Roundtable posted an article entitled Google MAYDAY Update Hitting Long Tail Ranking? that summarised the discussions.

During the questions and answers section of a panel at Google I/O, Google’s developer event, Vanessa Fox took the opportunity to ask Matt Cutts, head of Google’s Webspam team, what was happening. Matt said that “this is an algorithmic change in Google, looking for higher quality sites to surface for long tail queries. It went through vigorous testing and isn’t going to be rolled back”. Google also told Vanessa that this had been a change to rankings and not a change in crawling or indexing.

Is long tail search, then, dead? In my opinion, not really. The key here is quality. According to Vanessa Fox (and the general buzz around the industry), the update mainly seems to be affecting pages that are deep within the navigation of sites and that don’t have high numbers of inbound links. These also tend to be pages that are not given much attention in terms of content and optimisation. So the answer is probably (as is often the case) that, if you want a page to rank, you have to invest some time in content optimisation and promotion.

Video Optimisation

In this post in our series covering frequently asked questions, we are going to look at optimising video for search.

Faster internet connections have meant that video is a viable option everywhere on the web.

The top three search engines all now include video as part of their main results. Including video can benefit a website and its users in a variety of ways.

However, for video to be effective from an SEO point of view, it needs to be correctly optimised for search. Video content, like images, cannot be “seen” by search engine spiders. Therefore, if a video contains information that is important for the ranking of the page, your site needs to be optimised to point the search engines to the video content. This, in turn, will make it easier for users to find your video in the search engines.

Ideally, before you even make the video, you should make a list of the keywords that you want to target. The video meta data should target these keywords and, at the very least, the title, description, keywords, category, duration and a suitable thumbnail should be included.

There are two main types of meta data that can be employed: XML feeds and HTML markup.

For XML feeds, we generally recommend using Media RSS rather than Video Sitemaps, as the Media RSS format is more widely supported.

For HTML markup you can use either the Facebook Share format or the SearchMonkey RDFa format. The advantage of using HTML markup is that it may result in your site getting enhanced snippets in the search engines. However, we recommend using both an XML format and an HTML format, as different web services will support different formats.

Each video should have its own page, which should be optimised for keywords relating to the video. This would include titles, headings and meta data. We also recommend adding a summary of the video, which includes these keywords. Both the URL of the page embedding the video and the video file itself should be descriptive and should also include the most important keywords. Another useful addition is a video transcript, which is beneficial for both accessibility and SEO. This transcript can be included on the page in which the video is embedded, in addition to (or in place of) the summary.

The length of the video can also be very important. Short videos are generally better received than long videos. If a longer video is necessary, consider breaking it up into multiple smaller clips or episodes. This will work better for some videos than others – you don’t want to break up a feature film, but creating a series of informational videos that each answer a different question well is often better than creating a long sprawling video about the entire topic. Web users are known not to be very patient when waiting for videos to buffer. In fact, research by TubeMogul showed that 81% of online video viewers clicked away if a clip rebuffered, so shorter segments are more likely to be watched.

The final important issues to consider are how to host and distribute your online video – we will cover these topics in a separate article next week.
