supplemental

Death of an index

Now that Google’s supplemental results will no longer be . . . well, supplemental, I suppose . . . where does that leave those pages which were languishing in Google purgatory?

Back in 2003, Google’s index was simply getting too large. The index wars were not to end until Google removed the count of indexed pages from its home page some two years later, and the resources required to process all that data meant splitting part of the index off.

This auxiliary index was referenced for search terms with fewer results and allowed pages to be indexed which were not meeting the constraints required by the main index.

Google named this the supplemental index and clearly marked those results which were drawn from the secondary index accordingly:

Image showing supplemental results on a hypothetical Google results page

Then, in July 2007, Google stopped labelling supplemental results. Ostensibly this was because the difference between the two indices was narrowing: more pages could be included within the main index, URLs with more parameters were being crawled effectively and Google was promising to work on including both indices in every query. However, there was an additional theory at the time that Google wanted webmasters to stop focusing on small details and to put more effort into developing interesting, value-adding content.

There were still ways for search professionals to determine which index a page was in, using search modifiers. Over time some of these modifiers stopped working and new ones were found, but still the divide narrowed and, on Tuesday 18th December, Google announced that it now searches the whole index for every query.

Whether this means that there is only one index or whether, as I suspect, there remain two separate indices but with all artificial divides removed, is largely irrelevant. What matters is how this will affect end users and webmasters.

Google says that, from a user perspective, this means more relevant documents and a larger pool of resources. Particular reference is made to non-English queries (international clients have historically suffered from a lack of exposure in the main index). For webmasters, Google says that good-quality pages that were less visible in its index are now more likely to come up for queries.

I doubt that the two indices are weighted equally, although it is becoming increasingly difficult to ascertain which index a page is in as this change is rolled out across datacenters, which makes it almost impossible to determine how results from one index are performing against the other. Still, a deeper, more comprehensive search should mean a boost for less visible, good-quality pages which target long-tail queries.


XML Sitemaps: the answer to your indexing problems?

XML sitemaps are widely used, are supported by all the major search engines and pretty much guarantee that your pages will get crawled. Is this an SEO tool which is too good to be true?

Back in June 2005, Google introduced a new feature called Google Sitemaps, describing it as a beta "ecosystem" that may help webmasters with two current challenges: keeping Google informed about all of your new web pages or updates, and increasing the coverage of your web pages in the Google index.

Since Google released Sitemaps under a Creative Commons licence, all the major search engines soon announced support for the sitemap feeds which the Google Sitemap Generator produced.
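For readers who have not seen one of these feeds, here is a minimal sketch of how one might be built by hand. It assumes the sitemaps.org 0.9 namespace that the engines later standardised on; the `build_sitemap` helper and the example.com URLs are hypothetical, and this is not the Google Sitemap Generator itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the shared sitemap protocol (assumed here; early Google
# feeds used Google's own schema instead).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (loc, lastmod) pairs.

    lastmod may be None, in which case the element is left out.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, purely for illustration.
example = build_sitemap([
    ("http://www.example.com/", "2007-12-18"),
    ("http://www.example.com/about", None),
])
print(example)
```

The feed is nothing more than a flat list of URLs with optional hints, which is worth bearing in mind for the argument that follows: it tells an engine that a page exists, not how the rest of the site links to it.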

‘This is great!’ I hear you cry, ‘Every one of my pages is going to be indexed!’ and you’d be right. The engines are making no guarantees, but they will index pretty much every page that they know about, assuming that the page has not been excluded from the index by the webmaster. And that’s where the problem lies.

The major search engine algorithms are all heavily weighted towards link analysis. Using an XML sitemap may cause two problems with regard to this:

  1. If a search engine has a list of every page on the site in the form of an XML feed, it becomes almost impossible to tell whether the pages have been found organically or solely through the XML feed. This sort of information is a huge boon when discovering which sections of a site are doing well, which are doing badly, and why.
  2. Finding an entire site through an XML sitemap means that any page which had not previously been discovered organically by the search engine will be seen as a standalone page with no inbound links. These ‘orphaned’ pages are very unlikely to rank highly enough to convert and this could lead to a large number of pages being placed in supplemental indices.
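One rough way to see which pages would be orphaned in this sense is to compare the URLs listed in a sitemap with the pages that can actually be reached by following internal links from the home page. The sketch below assumes you already have a map of internal links from a crawl of your own site; the `link_graph` data and both function names are hypothetical.

```python
from collections import deque


def reachable_pages(link_graph, start):
    """Breadth-first walk of internal links, beginning at the start page."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def orphaned_pages(sitemap_urls, link_graph, start):
    """Sitemap URLs that no chain of internal links leads to."""
    return sorted(set(sitemap_urls) - reachable_pages(link_graph, start))


# Hypothetical site: each page maps to the pages it links to.
link_graph = {
    "/": ["/products", "/about"],
    "/products": ["/products/widget"],
}
sitemap_urls = ["/", "/products", "/products/widget", "/about", "/old-offer"]

orphans = orphaned_pages(sitemap_urls, link_graph, "/")
print(orphans)
```

Any page this reports is one that an engine could only have learned about from the feed, and so is a candidate for the internal linking work rather than for the sitemap.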

XML sitemaps are not an SEO tool. If your page isn’t getting indexed organically then it isn’t going to rank well either. Unless you have a niche keyword for which only half a dozen pages are being found, there is no advantage to being indexed if your site is never going to be returned for a search.

My personal experience has shown that it may take longer for sites to move out of supplemental indices than it would have taken to get them into the main index initially. With supplemental indices being less transparent than they were a year ago, though, it is hard to verify whether this observed correlation is indicative of a genuine issue or merely coincidental.

All an XML feed is doing for you is removing your opportunity to assess which pages have organic indexing issues. A good internal linking structure, including an HTML sitemap, and a sensible link building strategy will provide a much better return than an XML sitemap ever will.
