What do quantum disorder and Google have in common? 

Could random matrix theory, as used to analyse disorder in quantum systems, be the next thing to challenge Google?

In the April 4th edition of New Scientist, there was an article entitled "Quantum mathematics could boost keyword searches" – although the website article bears a slightly more provocative title: "Could quantum mathematics shake up Google?". It reports on a mathematical technique called random matrix theory, used by one Pedro Carpena in the analysis of disorder in quantum systems, that might just be the next big thing in search.

What it boils down to is this. Critical words to the subject of a text tend to cluster in certain areas within the copy. When a concept is introduced and explored, key words are used frequently, and then drop off in frequency as the text evolves. Conversely, common, yet irrelevant, words (what some people refer to as stop words or sentence glue) tend to be scattered through the text fairly evenly. As a result, analysing the clustering of words gives a better picture than frequency or density analysis.
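
To make the idea concrete, here is a rough sketch of one way to score clustering. It is an illustration only, not Carpena's actual method: it assumes that the spread of the gaps between successive occurrences of a word, divided by the average gap, is a fair stand-in for "clustering". The function name `clustering_scores` and the `min_count` cut-off are made up for this example.

```python
import re
from collections import defaultdict
from statistics import mean, pstdev


def clustering_scores(text, min_count=3):
    """Score each word by how unevenly its occurrences are spaced.

    Illustrative assumption: score = standard deviation of the gaps
    between successive occurrences, divided by the mean gap.
    Evenly scattered words score near 0; clustered words score higher.
    """
    words = re.findall(r"[a-z']+", text.lower())

    # Record the position (word index) of every occurrence of each word.
    positions = defaultdict(list)
    for index, word in enumerate(words):
        positions[word].append(index)

    scores = {}
    for word, where in positions.items():
        if len(where) < min_count:
            continue  # too few occurrences to say anything about spacing
        gaps = [b - a for a, b in zip(where, where[1:])]
        scores[word] = pstdev(gaps) / mean(gaps)
    return scores
```

Sorting the result by score, highest first, would put the most tightly clustered words at the top, whereas a plain frequency count would simply put the most common words there.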

Now, modern search engines are not using anything as simple as keyword density analysis these days, but could this, as the article’s title rather sensationally asks, "shake up Google"? The results produced seem a little hit and miss, with both "you" and "I" appearing in the top five for both The Odyssey and Moby Dick. It does, however, seem to generate some interesting results with all the spaces removed from the text, but that is a different discussion.

While Carpena’s method may be good at pulling relevance from an unbiased text, how good is it at pulling actual relevance from a biased text? Compiling a list of relevant words from a text isn’t the hard part; search engines are already pretty good at identifying text that is relevant to a search. The difficulty is pulling relevance from a text that is deliberately misleading. New analysis algorithms will just force people to develop new ways of gaming the system. The real challenge is in the separation of the wheat from the chaff.

To my mind, this is where many journalists fall down: too many ask if the latest clever method of discerning relevance is the next Google killer, but few look at what Google is actually struggling to achieve. Let’s face it, they have text analysis down pat; while it may not be as elegant as some sophisticated quantum analysis technique, Google will return pages with text that is fairly relevant to your search words. What it struggles with, though, is matching the meaning of the search with the intent of the content.

We have all done it. We have been looking for customer reviews of our next intended purchase to see if it has been well received by its current users, only to find that the search results are cluttered with pages selling the product, each with an unpopulated review section somewhere on it. Another scenario is the "this mp3 player isn’t an ipod" style of eBay listing.

There are plenty of pages out there that mislead or misrepresent, and there is nothing more frustrating than wading through piles of valueless results that promise the Earth. It is advances toward filtering these out of the shortlist of relevant pages that will bring the next quantum leap in search.
