research

SEO breadcrumbs for site hierarchies in Google

This latest LBi research explores Google site hierarchies and SEO best practice for implementing breadcrumbs in order for them to be picked up by Google.

Since November, when Google announced the arrival of ‘site hierarchies’, we have been considering the best method of adding breadcrumbs to your site so that they are picked up by Google and added to its search results pages. The official advice provided in this Google Webmaster video for implementing breadcrumbs is somewhat vague: simply to use a

“Set of delimited links on your site, that accurately reflect your site’s hierarchy.”

So we decided to investigate exactly what this means in terms of best practice for SEO, in order to provide the best chance of having your breadcrumbs picked up and used within Google search results. It is worth noting that Google indicates that this has not yet been rolled out for all sites:

“By analyzing site breadcrumbs, we’ve been able to improve the search snippet for a small percentage of search results, and we hope to expand in the future.”

Of course, whether having these site links is beneficial to any particular site needs to be addressed on a case-by-case basis. It will depend on whether a particular site is well suited to the use of breadcrumbs, and how your site uses top-level category pages within your site hierarchy.

A few notes on general best practice for breadcrumbs

Breadcrumbs are a secondary navigational feature used in combination with main navigation elements on large hierarchical sites. They provide a reference as to where a user is within a site’s hierarchy and help the user to quickly navigate to a higher level. This also provides the associated benefits of improved usability and reduced bounce rates.

Google’s definition of ‘delimited links that match your site’s hierarchy’ is a good description of breadcrumbs. The ‘delimiting’ character is usually a ‘greater than’ symbol (‘>’), which aids usability because users recognise it as a uniform navigational element. There should be no more than a few levels, and the trail should start at the homepage and end at the current page. The last ‘breadcrumb’ should not be a link: it would simply link to the current page, whereas an unlinked description of the current page provides additional navigational context.

The research

In order to investigate the use of breadcrumbs, we found existing examples of instances in which Google either has or has not picked up a set of delimited links and used them in its results pages. We selected 15 three-word queries, including keywords with navigational, informational and commercial intent, and analysed the top ten results for each in Google.co.uk, a total of 150 results. We initially selected a larger sample, but had gathered enough data to draw conclusions sooner than expected.

Where to place breadcrumbs on the page

Breadcrumbs are usually seen at the top of the page just under the main navigation. In terms of the Document Object Model (DOM), you can expect to see breadcrumbs some way into the code. They are a secondary navigational element, included after the primary navigation and contained within the hierarchical environment of the DOM. With the addition of CSS, which can place elements anywhere on the page regardless of where they reside in the code, it is unsurprising that we found examples of breadcrumbs lower down the page, and even at the bottom of the page, which still triggered Google site hierarchies in search results.

However, we recommend placing breadcrumbs at the top of the page for best usability, as this is where most people will search for and recognise them.

Where to place the ‘delimiting’ characters?

Logic prevails here: the ‘delimiting’ characters should be placed in between the links. We saw a couple of sites which had either preceding or trailing delimiting characters, and these did not get picked up by Google.

Which ‘delimiting character’ should be used?

We wouldn’t advise against using any particular characters until further testing has been undertaken, although we found site hierarchy links in the search results for sites using ‘>’, ‘> >’, ‘›’ and ‘»’ characters, with white space and inline elements appearing to be inconsequential (see this example of bold tags on the home link).

The golden rule here is to make whatever is between the breadcrumb links identical, which was the case for 100% of sites which had breadcrumb links in Google results pages. Where this was not the case, breadcrumb links were not attributed to the page.
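As a rough illustration of this rule, the following Python sketch collects whatever text sits between consecutive links in a breadcrumb fragment and checks that it is the same every time. The class and function names are our own invention, the check only looks at text (not block-level separators or images), and it is not a description of how Google itself evaluates breadcrumbs.

```python
from html.parser import HTMLParser


class BreadcrumbSeparators(HTMLParser):
    """Collect the visible text found between consecutive <a> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_link = False
        self.seen_link = False
        self.buffer = []
        self.separators = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            if self.seen_link:
                # Text gathered since the previous </a> is one separator.
                # Runs of white space are collapsed, as white space
                # appeared to be inconsequential in our sample.
                text = " ".join("".join(self.buffer).split())
                self.separators.append(text)
            self.buffer = []
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
            self.seen_link = True
            self.buffer = []

    def handle_data(self, data):
        # Link text itself is ignored; only text outside links is kept.
        if not self.in_link:
            self.buffer.append(data)


def separators_are_uniform(fragment):
    """True if at least two links exist and every gap holds the same text."""
    parser = BreadcrumbSeparators()
    parser.feed(fragment)
    parser.close()
    return len(parser.separators) >= 1 and len(set(parser.separators)) == 1
```

Note that the sketch only compares the gaps between links, so a delimiter placed before an unlinked final breadcrumb is not examined.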

As a side note, we also found examples which did not use any characters at all, but used block level elements instead, such as the London School of Economics Website, which has a hierarchical structure and no breadcrumb links. Links in the main left hand navigation are separated via individual ‘<div>’s, which were picked up by Google and included in the results pages. This could of course be a false-positive, given that the ‘>’ character is included in the code between the links, but is unlikely.

Other tags found in breadcrumbs which Google included in its results pages were ‘<li>’, ‘<div>’, ‘<td>’ and ‘<p>’. In each case, as with the LSE website, they were added between links in a uniform manner.

In summary, the most common method of indicating site hierarchy to Google is by separating links with a symbol such as ‘>’, ‘> >’, ‘›’ or ‘»’, but other methods involving block level elements may be used. This will need to be tested before we can recommend exactly which block level elements to use, and may prove useful where more advanced or decorative breadcrumbs are desired.

Can images be used to separate breadcrumbs?

From the sample we looked at, no images were found in the set of sites which did have Google site hierarchy links, but images were found in the set which did not, even though they had breadcrumb links. The same was the case where CSS was used to insert images as dividers.

As a best practice, we would advise against the use of images as breadcrumb link delimiters. However, we have noted that a site may be able to use images and be picked up by Google if images are contained within block-level elements or combined with relevant characters and appropriate replacement techniques. This is something which we will need to test.

Does a site need to be strictly hierarchical?

The short answer is no. Take this example from South West Water, where the majority of the pages are numbered using a URL variable. The internal link structure and breadcrumbs are the only clues to the site’s hierarchy. Another example of site hierarchy links showing in Google even though the URL is not hierarchical is Yahoo! Shopping. You will need to follow the link to fully appreciate this. For many reasons, it is best to provide a hierarchical structure, but in a case where this is not so, breadcrumbs make an ideal addition to a site until such time as a redesign or structural overhaul is feasible.

Should the first breadcrumb link to the homepage?

We found site hierarchy links in Google search results for pages both with and without breadcrumbs including links to the homepage. From a usability perspective, this is useful, in that where you have a large number of directories, missing the first few may still indicate site hierarchy for search. It is worth noting that the majority of sites do include a link to the homepage in breadcrumbs, and that we would recommend including this as it provides additional context to the hierarchical structure denoted by the breadcrumbs.

Should the last breadcrumb be a link?

We found examples of site hierarchy links in Google from both pages which did and pages which did not use the last breadcrumb to link to the current page.

It provides no benefit to the user if the last link points to the current page, and if the last breadcrumb is simply a description of the page with no link, it can only add to the relevance of the page and context of the breadcrumbs. It says “you are here”, which adds to the usability of the breadcrumbs.

Therefore our recommendation is to have a trailing breadcrumb for the existing page which is not a link. However, if it is, it will not stop you from gaining site hierarchy links in Google.

Can breadcrumbs be cross-domain or cross-subdomain?

Sites which we found with breadcrumbs which contained cross-subdomain links were not included in Google site hierarchies. Bearing in mind that the root domain is listed in the search result, this would render the hierarchy links inaccurate.
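A simple way to audit a page for this is to resolve each breadcrumb link against the page URL and compare hostnames. The Python sketch below does just that; the function names and the example URLs in the usage note are hypothetical, and it treats any difference in hostname (including a different subdomain) as leaving the site.

```python
from urllib.parse import urljoin, urlsplit


def breadcrumb_hosts(page_url, hrefs):
    """Return the set of hostnames that the breadcrumb links resolve to."""
    return {urlsplit(urljoin(page_url, href)).hostname for href in hrefs}


def stays_on_page_host(page_url, hrefs):
    """True if every breadcrumb link points at the page's own hostname."""
    return breadcrumb_hosts(page_url, hrefs) <= {urlsplit(page_url).hostname}
```

For example, `stays_on_page_host("http://www.example.com/soft-cheese/brie/", ["/", "/soft-cheese/"])` is true, whereas adding a link to `http://shop.example.com/` makes it false.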

Semantically, what code should be used to mark breadcrumbs?

Although HTML lists lend themselves well to writing breadcrumbs, and were used in several examples of sites which were included in Google’s site hierarchy links, breadcrumbs are, by their hierarchical nature, not a list. HTML lists enable the use of customisable bullets to delimit links, but via CSS, which is not written to the page. Only where the list elements were separated by a delimiting character were breadcrumbs included in the search results.

It makes sense to contain the breadcrumbs within a block level element in order to distinguish them in the code, as well as to provide a clear signal to Google as to which content is actually your breadcrumb links. We would recommend using a ‘<div>’ element specifically for this purpose. The breadcrumb links should be placed inside the block level element and should then be separated by the same ‘delimiting’ character (most likely ‘>’). The same number of new lines should be used between each link (no new lines are required, but using one in between each link and each delimiting character will increase code readability).

Should you ‘label’ your breadcrumbs?

Often, breadcrumbs are labelled in the code as IDs, Classes or similar, with names such as ‘breadcrumbs’, ‘crumbs’ or ‘sitenav’. This is not required. We found examples of both labelled and non-labelled breadcrumbs which had been recognised by Google.

That said, there is no reason why you cannot label your breadcrumbs. A label in the code will make it more readable, and it won’t hurt to give an extra clue to search engines as to the content held within the containing element.

Semantically, an ID should be used over a Class, as an ID is a unique identifier to an element on the page, whereas a class can be used multiple times.

How long does it take for breadcrumbs to be found?

A ‘site:’ search in Google, followed by a restriction by date, will show that pages are indexed before breadcrumb links are added to their results. We noted that the two events are sometimes a handful of weeks apart, which suggests that site hierarchy links are handled by a separate process outside of general indexing procedures.

Best practices for writing breadcrumb code

Based on our findings, an example of the ideal code layout for breadcrumbs is:

<div id="breadcrumbs">
<a href="http://www.cheese.com/">Home</a>
&gt;
<a href="http://www.cheese.com/soft-cheese/">Soft Cheese</a>
&gt;
<strong>Brie</strong>
</div>

A breakdown of the above code elements is as follows:

  • Hierarchical links which match the structure of the site
  • A link to the homepage
  • A reference to the page the user is on rather than a link
  • Contained within a ‘<div>’ element with an ID of ‘breadcrumbs’
  • Uniformly divided links using ‘>’ symbols as ‘delimiting characters’, written as ‘&gt;’
  • Single line break between lines (no line break is required, but adding them makes the code more readable. If you do add any, remember to keep the number of lines consistent between links.)
  • As per our standard recommendations the inline tag <strong> has also been added to emphasise the keyword in the breadcrumb relating to the current page
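For sites which generate their pages from templates, the layout above can be produced programmatically. The following Python sketch is one possible way of doing so; the function name and its arguments are our own assumptions, and it simply reproduces the pattern recommended in this article (a ‘<div>’ with an ID of ‘breadcrumbs’, links separated by ‘&gt;’, and an unlinked ‘<strong>’ label for the current page).

```python
from html import escape


def render_breadcrumbs(trail, current):
    """Build breadcrumb markup.

    trail   -- list of (label, url) pairs, from the homepage downwards
    current -- label of the current page, rendered without a link
    """
    lines = ['<div id="breadcrumbs">']
    for label, url in trail:
        lines.append('<a href="%s">%s</a>' % (escape(url, quote=True), escape(label)))
        # The same delimiter, on its own line, follows every link.
        lines.append("&gt;")
    lines.append("<strong>%s</strong>" % escape(current))
    lines.append("</div>")
    return "\n".join(lines)
```

Calling `render_breadcrumbs([("Home", "http://www.cheese.com/"), ("Soft Cheese", "http://www.cheese.com/soft-cheese/")], "Brie")` yields the same markup as the example above, with one line break between each element.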

A final thought 

Google has said that it is still experimenting with site hierarchies, and some sites may not be suitable for site hierarchy links (think Wikipedia, or sites with minimal architecture). Implementing breadcrumb links therefore does not guarantee that a site will receive site hierarchy links in search results; in fact, sites which were set up by LBi to test site hierarchies are still not achieving them.

In addition, the code above is not definitive and, as stated in this article, inline elements such as ‘<b>’ and ‘<span>’ may be added without preventing Google from understanding your breadcrumbs. That said, unless absolutely necessary, this is probably best avoided.

For further information on Breadcrumbs, see this article on Smashing Magazine which provides some useful information outside of the scope of this article.


Can search engines handle Internationalized Domain Names (IDNs)?

Internationalized Domain Names (IDNs) have been approved by ICANN and are set to become a reality. Are the search engines prepared for them?


Introduction

Note: If you are unable to view the Chinese and Arabic letters in this page you may need to install the required fonts.

In October 2009, ICANN voted to allow the use of non-ASCII characters in domain names. Non-ASCII characters have existed within domain names for a while – for example, many Hong Kong sites feature Chinese characters (example: http://香港儒釋道院.組織.hk). However, before now, these characters were not allowed within TLDs and, as such, URLs still required ASCII characters (in the example above, the ccTLD “.hk”).

ICANN launched the IDN ccTLD Fast Track Process in November, and last month announced that four top-level IDNs had successfully passed the initial stage of approval (three Arabic-language IDN ccTLDs for Egypt, Saudi Arabia and the United Arab Emirates, and one Cyrillic-language IDN ccTLD for the Russian Federation). At the time of writing, there are another 13 IDN ccTLDs on their way through this process, representing 10 different languages in total.

In order to provide the Internet community time to prepare for the rollout of new IDN domains, ICANN has set up a number of IDN domains for testing purposes. Each of these test domains is written as “example.test” in its respective language, and content has been made available to view on each site.

Seeing as most of the initial IDN ccTLDs are likely to be in Arabic, I have used ICANN’s test Arabic domain (مثال.إختبار) for my research.

Before I start, I need to quickly explain what Punycode is, as it is used to support the addition of IDN domains to the existing Internet infrastructure. The problem with the current system is that the Domain Name System (DNS) only allows certain ASCII characters, which means that it is not possible to simply add Unicode characters to it. Punycode was invented to get around this issue. Essentially, it is a method by which Unicode characters can be translated to (and from) the ASCII characters allowed within the DNS. When your browser requests a domain name containing Unicode characters, it converts it to the ASCII-formatted Punycode before sending the request.
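To make the conversion more concrete, here is a small Python sketch using the standard library’s built-in "idna" codec (which follows the older IDNA 2003 rules, so a browser may treat some edge cases differently). The helper names are my own, and the hostname used is ICANN’s Arabic test domain in its Punycode form.

```python
def to_ascii(hostname):
    """Convert a Unicode hostname to its ASCII (Punycode) form, label by label."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname):
    """Convert an ASCII (Punycode) hostname back to its Unicode form."""
    return hostname.encode("ascii").decode("idna")


# ICANN's Arabic "example.test" domain in Punycode form.
punycode_host = "xn--mgbh0fb.xn--kgbechtv"

# Decoding gives the Arabic hostname; encoding it again restores the ASCII form.
unicode_host = to_unicode(punycode_host)
round_trip = to_ascii(unicode_host)
```

Each label is converted separately and given the "xn--" prefix, which is why both parts of the hostname start with those characters. Plain ASCII hostnames pass through unchanged.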

For this experiment, I have looked at the way in which the search engines handle both the Unicode form of the Arabic domain (http://مثال.إختبار/) as well as the corresponding Punycode format (which, in this case, is http://xn--mgbh0fb.xn--kgbechtv/). Note that, because Arabic is an RTL (right-to-left) language, pages on this site will have the URL path to the left of the hostname, rather than to the right.

One last note before we look at the results – the test page does not feature a meta description tag, so any snippet text is likely to come from text within the page itself.

Here are the results.

Google

Searching Google for the Unicode variant of the URL returns the homepage of the domain as the first result, with an additional nested result for a second, internal page on the domain:

[Image: Google Unicode]

Initially, everything seems to be in place here. The title tags, snippets and URLs are correctly displayed in Arabic, and Google has highlighted the search text in bold as usual. Additionally, the “Similar” pages link works, and the “jump to” successfully takes you to an anchor within the page. Lastly, the URL path is written in the correct RTL form for the second result.

However, not all is well. The first URL that Google is listing, the homepage, is actually a 301 redirect to an internal page. Google should be indexing the destination page, not the redirecting homepage.

There are several other issues too. Firstly, the cached copy link did not work:

[Image: Google cached copy]

I tried a number of pages on the site and Google’s cached copy did not work for any of them, so Google may have an issue with this feature at present.

Additionally, the “Translate this page” links for both results do not correctly function, and an error message is shown:

[Image: Google Translate error]

Side note – the “See original page” link does correctly point to the Arabic domain name.

Next I tried searching Google for the Punycode form of the URL:

[Image: Google Punycode]

Google has returned the same two URLs, which is a good sign of consistency. The title tags are the same, and the URL is still written in Arabic and not displayed in the Punycode form.

This time around, Google has picked out some text on the page which matches the Punycode search term. Although this particular snippet is rather less attractive than the ones from the previous query, matching the exact text on a page is probably the best approach. However, it would also make sense for Google to at least highlight the Unicode version (for example, in the URL), which it currently does not do.

Again, while the “Similar” pages link works, the “Cached” and “Translate this page” links are broken. This seems to be an issue that Google needs to fix.

Yahoo!

Searching Yahoo! for the Unicode or Punycode version of the URL does not return any results from the domain:

[Image: Yahoo! Arabic domain fail]

Similarly, entering the URLs within Yahoo! Site Explorer simply redirects back to the main Yahoo! search results. Performing “site:” searches (for either variant) also fails (looking at the HTTP headers, you can see that Yahoo! actually redirects the query to Site Explorer, which then redirects you back to the standard web search results).

I tried a few additional ICANN test IDN domains in other languages and none of them worked. Yahoo! seems to fail completely at handling IDNs.

Given that Yahoo! is likely to use Bing’s search in the future, let’s see how Bing performs next.

Bing

Searching Bing for the Unicode version of the URL does return a page from the site, although it’s at position 8, which is not ideal (when searching for a URL you would usually want the URL to appear at or near the top of the search results). The snippet appears as follows:

[Image: Bing Unicode]

Only one URL is shown, which isn’t quite as useful as Google’s result, but is still adequate. The title tag, snippet and URL are all correctly shown in Arabic, which is good. The “Translate this page” and “Cached page” links both work, whilst they didn’t on Google.

Bing does have some issues, however. Although Bing has indexed the destination URL (the link goes directly to the destination URL), for some reason Bing only displays the URL of the homepage in the snippet. Additionally, although Bing has highlighted the domain in bold in the snippet, it has not highlighted it within the URL.

Bing does have a number of problems with its handling of this domain. However, they are fairly minor and definitely less important than the issues that Google has with this site.

Searching Bing for the Punycode version of the URL, Bing returns the URL at position 2 instead, which is a bit better:

[Image: Bing Punycode]

Again, like Google, Bing has picked out the text from the page which matches the query for the snippet but has not highlighted the Arabic equivalent in the snippet. Otherwise, this result is much the same as the Unicode search variant.

Ask Jeeves

I have also looked at Ask Jeeves (known as just “Ask” in the US).

Searching Ask Jeeves for the Unicode version of the URL returns the site at position one. Like Google, it includes a second indented URL at position two. Interestingly, these are the same two URLs that Google returned for this search (it is worth remembering that Ask Jeeves might be using Google’s results at times).

[Image: Ask Jeeves Unicode]

Ask Jeeves is correctly displaying both the title and the snippet in Arabic, but the URL is written in the Punycode form instead, which is clearly far from ideal.

There is another major issue with Ask Jeeves’ implementation – the second URL goes through a redirect, but the hostname given by the redirect has been encoded in a way which makes Firefox and Internet Explorer fail to load the page (Google Chrome and Opera did successfully load the page from the redirect). Note: This does not always happen – reloading the page sometimes returns the URL without the redirect, and in this case it works correctly.

Searching Ask Jeeves for the Punycode version of the URL results in much the same as we have seen earlier. Again, the snippet includes the text from the page which matches the query. Ask Jeeves includes a small screenshot of the page too:

[Image: Ask Jeeves Punycode]

Ask Jeeves’ binoculars feature, which displays a small thumbnail screenshot of the site, does appear to work correctly. However, it is possible that there are issues here as well.

[Image: Ask Jeeves Binoculars]

Although it’s difficult to make out due to the small size of the thumbnail, it appears that the English text renders correctly but the Arabic text (although correctly displayed in an RTL fashion) looks like it might be showing a nonsense placeholder character, in the same way that web browsers which do not render Unicode characters do. That said, it is difficult to determine for sure from the small thumbnail that Ask Jeeves provides.

Conclusion

In conclusion, Google, Bing and Ask Jeeves do support IDNs to varying degrees. If I had to proclaim a winner at the moment, I would say that Bing had a slight lead, but all of these search engines had some issues. Hopefully these issues will be ironed out by the time that IDNs eventually roll out en masse.

Yahoo! appears to completely fail to support IDNs at present. Once it switches to Bing’s search engine, however, we assume that it will inherit all of Bing’s IDN support as well.


Can PDF, Flash and MS Office documents have PageRank?

The question today is – does Google assign PageRank to non-HTML files such as PDF files, Word documents or Flash files? Here is the definitive answer.


Introduction

PageRank is just one of the many algorithms that Google uses to rank web pages. However, it is definitely the most well known and, due to the Google Toolbar, one of the most visible.

PageRank originally applied only to web pages, and not other types of files such as Adobe PDF files or Microsoft Office documents. However, Google has indexed these types of files for a long time now, so it would make perfect sense for Google to try and treat these in a similar way to web pages.

A caveat regarding the robots exclusion protocol

As with any test, it is important to ensure that there are no external factors which could affect the results. In this particular case, the Robots Exclusion Protocol is one such factor.

This quote from Matt Cutts sums the issue up nicely:

“a page that is blocked by robots.txt can still accrue PageRank. In the old days, ebay.com blocked Google in robots.txt, but we still wanted to be able to return ebay.com for the query [ebay], so uncrawled urls can accumulate PageRank and be shown in our search results.”

This means that we have to be careful. If a URL is blocked by robots.txt, any PageRank it shows may have been accrued by the uncrawled URL alone, without Google ever having read the file behind it. To be sure that Google really does assign PageRank to a particular type of file, we have to ensure that each file we check is not blocked by robots.txt.

Note: Although the quote above applies to robots.txt, we have also checked that the files do not have an X-Robots-Tag HTTP header.
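This kind of check can be scripted. The Python sketch below uses the standard library’s robots.txt parser to test whether a given URL is disallowed for a crawler, and adds a simple look-up for an X-Robots-Tag response header. The function names, the "Googlebot" user agent default and the example rules in the usage note are our own assumptions; the robots.txt content and response headers would need to be fetched separately.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """True if the given robots.txt content disallows the URL for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


def has_x_robots_tag(headers):
    """True if the response headers (a dict) include an X-Robots-Tag header."""
    return any(name.lower() == "x-robots-tag" for name in headers)
```

For example, with the rules `User-agent: *` and `Disallow: /private/`, a PDF under `/private/` is reported as blocked, whereas one in the site root is not.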

What types of files does Google index?

If you go to Google’s Advanced Search page, Google provides options to search for files in a number of formats:

[Image: Google Advanced Search supported file types]

Google also has a list of supported file types on its file types FAQ page.

Note: We are not going to do an exhaustive list of different file types in this post, but the above list is a good place to start. Also note that we have not looked at images or videos, which have their own Google search verticals.

How we looked for non-HTML files to test

To find non-HTML files which might have PageRank as quickly as possible we used Google’s filetype: operator. We used this operator on its own, rather than combining it with a search query. For example, to search for PDF files we used the query [filetype:pdf].

Note that Google’s filetype: operator isn’t perfect – for example, it will return normal web pages ending with the same extension (for example, here’s a web page with a URL ending with .doc). Therefore, we also have to check each URL to make sure it’s actually the type of file we are looking for.
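One way to carry out this check is to compare the extension in the URL with the Content-Type header that the server returns. The Python sketch below shows the comparison only; the function name and the table of expected media types are our own assumptions, the table is not exhaustive, and servers can misreport the Content-Type, so a manual look at the file is still worthwhile.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed mapping of file extensions to the media types we would expect.
EXPECTED_TYPES = {
    ".pdf": {"application/pdf"},
    ".doc": {"application/msword"},
    ".xls": {"application/vnd.ms-excel"},
    ".swf": {"application/x-shockwave-flash"},
    ".txt": {"text/plain"},
}


def is_real_file_type(url, content_type):
    """True if the Content-Type header matches the extension in the URL."""
    extension = posixpath.splitext(urlsplit(url).path)[1].lower()
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in EXPECTED_TYPES.get(extension, set())
```

A URL ending in .doc which is served as `text/html` would be rejected by this check, as it is an ordinary web page rather than a Word document.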

Results

Note that we are not interested in how high or low the PageRank scores are – what we are looking for here is simply whether they have any PageRank or not.


Adobe Portable Document Format (.pdf)

http://www.deetonline.org/brochure.pdf

PageRank 4

Microsoft Word documents (.doc)

http://www.wvnn.com/privacy_policy.doc

PageRank 4

Flash files (.swf)

http://www.uclalive.org/ucla_live_event_news.swf

PageRank 6

Excel spreadsheets (.xls)

http://www.post.ch/pm_dp_jahresplan.xls

PageRank 3

Plain text files (.txt)

http://www.rarlab.com/themes_new.txt

PageRank 5

We also wanted to check whether Google gives PageRank to file types which aren’t on the list, so we checked a few additional file types:

Microsoft Word 2007 documents (.docx)

http://www.antor.com/EUROPEAN_TRADE_AND_CONSUMER_SHOWS_CALENDAR_2009.docx

PageRank 4

"Comma-separated values" files (.csv)

(a format used for spreadsheets and storing data)

http://www.edeltutiyama.com/hayami2008.csv

PageRank 1

Conclusion

Our research has shown that Google PageRank does not just apply to web pages – it also applies to a range of other documents.

Please note that proving that PageRank applies to the file types examined above only shows that it applies to these particular file types – to be absolutely certain that PageRank applies to a particular file type not listed above, you’d have to check it in the same way.


How old are Toolbar PageRank values?

Google only updates the PageRank values seen in its toolbar every few months, but calculates new values internally much more frequently.

In this piece of research we try to answer the question “how old are the PageRank values shown when they are published?”, and uncover something surprising in the process.

Please note: The web pages used in this article are used for reference only. LBi does not endorse any of the pages linked to from this article.

Google updates the PageRank values shown in the Google Toolbar every 3-4 months (and sometimes more often). However, Google also calculates the PageRank values that it uses internally much more frequently (at least daily). The PageRank shown in the Google Toolbar is therefore a "snapshot" of values at some point in time.

A commonly asked question when Google updates the PageRank values displayed within its toolbar is "How old are these new PageRank values?" – are they fresh, up-to-date values which have just been calculated, or are they several months old? Although, in general, we would recommend not obsessing about Google’s green bar too much, knowing the answer to this question has several implications – for example, if you know how recent the values are, you can determine whether any recent linkbuilding activity is being accounted for within the new PageRank values.

Methodology

The methodology for this experiment is fairly simple – to know how old the values are, we need to establish what the length of time was between pages last being given PageRank and the PageRank update. Therefore, we need to find:

  • The most recent page possible which has a PageRank value
  • The earliest possible mention of the recent PageRank update

Oldest mentions of PageRank update

For the purposes of finding the earliest possible date that a PageRank update was mentioned, we have looked at a number of different SEO discussion sites in order to find the earliest mention by a member of their community. We’ll convert all times into British Summer Time (GMT+1) for comparison.

  • Digital Point forums – many posts here, but the earliest is dated "May 28th 2009, 1:01 am". Times are GMT-7, so this is 9:01 BST on May 28
  • High Rankings Forum – there is a post at "7:38pm" – the forum appears to be 6 hours behind BST, so the time of the post is 01:38 BST on May 28
  • SEORoundTable – the first forum post is 06:12 AM on 28th May – as this time is GMT-5, the time is 12:12 BST on May 28
  • WebmasterWorld – the earliest post is "10:12pm UTC" – this is 23:12 BST on May 27

There are lots of other sites, but we’ve picked a selection of the earliest posts. The earliest one seems to be the WebmasterWorld thread, with a time of 23:12 BST on May 27th.
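The conversions above can be double-checked with a few lines of Python. This is only a sketch: the helper name is ours, and the dates for the High Rankings and WebmasterWorld posts (May 27) are inferred from the converted times given above rather than quoted from the forums.

```python
from datetime import datetime, timedelta, timezone

BST = timezone(timedelta(hours=1), "BST")


def to_bst(year, month, day, hour, minute, utc_offset_hours):
    """Interpret a forum timestamp in its own UTC offset and convert it to BST."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local_time = datetime(year, month, day, hour, minute, tzinfo=local_zone)
    return local_time.astimezone(BST)


mentions = {
    "Digital Point": to_bst(2009, 5, 28, 1, 1, -7),
    # Six hours behind BST is equivalent to GMT-5; date assumed to be May 27.
    "High Rankings": to_bst(2009, 5, 27, 19, 38, -5),
    "SEORoundTable": to_bst(2009, 5, 28, 6, 12, -5),
    # Date assumed to be May 27.
    "WebmasterWorld": to_bst(2009, 5, 27, 22, 12, 0),
}

# The site whose first mention has the earliest converted time.
earliest_site = min(mentions, key=mentions.get)
```

Working with timezone-aware values means the posts can be compared directly, regardless of the offset each forum displays.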

Newest articles with PageRank

The next step requires finding the most recent page possible which has a PageRank value. Please note that this does not mean the most recent page with a PageRank of 1 or more – a PageRank value of "zero" also constitutes a page having a PageRank value assigned to it. A PageRank of zero simply means that, on the sliding scale used by Google, the page falls into the set of pages with the lowest PageRank values. This is different from having no PageRank value at all.

The best place to look for recent pages which may have PageRank is to look for a high-PageRank, high-traffic site which is frequently updated and which uses web feeds to ensure that new pages are rapidly indexed. News sites are ideal for this. We’ve picked The Guardian because the website includes detailed date information, including both the original publication date and the date that the articles were last updated, whereas many other online newspapers don’t include the original article publication dates.

Here are a few of the most recent articles found, along with their dates. These articles are all PageRank zero.

We have not listed articles with no PageRank values at all (to narrow down the interval further) as Google may have simply not crawled these pages yet.

Hang on… what’s this?

Having looked around a number of articles, we suddenly stumbled across this article, which has a PageRank value assigned (zero). The "article history" says:

"This article was first published on guardian.co.uk at 00.01 BST on Thursday 28 May 2009. It appeared in the Guardian on Thursday 28 May 2009 on p35 of the Editorials & reply section. It was last updated at 00.05 BST on Thursday 28 May 2009."

This poses something of a puzzle – here we have an article which has a PageRank score and which was apparently posted 49 minutes after the PageRank update started happening. Thinking caps on! Here are the possible causes of this seemingly paradoxical situation.
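The 49-minute figure is simply the difference between the two timestamps, which can be confirmed with a short Python sketch (the variable names are ours, and both times are taken as British Summer Time):

```python
from datetime import datetime, timedelta, timezone

BST = timezone(timedelta(hours=1), "BST")

# Earliest forum mention of the update (WebmasterWorld), in BST.
first_mention = datetime(2009, 5, 27, 23, 12, tzinfo=BST)
# First publication time given in the Guardian article history, in BST.
first_published = datetime(2009, 5, 28, 0, 1, tzinfo=BST)

gap = first_published - first_mention
gap_minutes = int(gap.total_seconds() // 60)
```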

Theory 1 – The dates are wrong

This is the simplest explanation. Either the date on the WebmasterWorld thread is wrong, or the date on the rogue Guardian article is wrong.

Theory 2 – Datacenters, datacenters, datacenters

"Datacenters" – the standard fall-back answer to many a Google puzzle. As we know that different datacenters will start showing updated PageRank values at different times, it could be that the datacenter currently serving up the PageRank values that we are seeing is different to the one which first served new results to the poster who started the WebmasterWorld thread listed above.

This theory has interesting implications – given the time gap it would mean that different datacenters calculate PageRank independently of each other.

Theory 3 – Rolling PageRank update

Another possibility is that the PageRank update happens in a number of stages or over a period of time – this would mean that the update had begun when it was first noticed but had not yet been completed by the time that Google found the aforementioned Guardian article.

Conclusion

When Google performs a Toolbar PageRank update it would appear that the values are fresh and up-to-date.

There may also be a further mechanism at work which can sometimes result in PageRank values being assigned to some pages shortly after the Toolbar PageRank update has occurred.

