spam

Weekly Social Media Update

AOL Buys Huffington Post

Just a week after copies of The AOL Way were leaked, the media giant has purchased the Huffington Post for $315m. While some argue that AOL’s approach is “nightmarish and cynical”, others point out the inevitability of large-scale content farming. Let’s hope that the arrival of Arianna Huffington within AOL will help them up their game in terms of editorial quality.

Kenneth Cole Hash-Spam Fail

Brands really need to stop doing this. Kenneth Cole came under fire last week for inappropriately using #Cairo in a promotional tweet. It’s almost worse than when Habitat got caught spamming the Iran Mousavi hashtag.

#cairo sales spam

The tweet was deleted shortly afterwards, but it was already too late. The Facebook apology generated another 400 negative comments.

Kenneth Cole Facebook apology

Guilty as charged...

Social Media Week

Happy Social Media Week! LBi will be participating by running a number of events, including Openshops, Social Media Surgeries, and Meet the Bloggers. We’re also hosting the Social Media Week closing party, and offering two summer work placements through our Tweet of the Week competition! Just tweet with #LBiSMW to take part.

Social Media Week


What do quantum disorder and Google have in common?

Could random matrix theory, as used to analyse disorder in quantum systems, be the next thing to challenge Google?

In the April 4th edition of New Scientist, there was an article entitled "Quantum mathematics could boost keyword searches" – although the website article bears a slightly more provocative title: "Could quantum mathematics shake up Google?". It reports on a mathematical technique called random matrix theory, used by one Pedro Carpena in the analysis of disorder in quantum systems, that might just be the next big thing in search.

What it boils down to is this. Words that are critical to the subject of a text tend to cluster in certain areas within the copy. When a concept is introduced and explored, key words are used frequently, and then drop off in frequency as the text moves on. Conversely, common yet irrelevant words (what some people refer to as stop words or sentence glue) tend to be scattered through the text fairly evenly. As a result, analysing the clustering of words gives a better picture of relevance than frequency or density analysis.
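To make the clustering idea concrete, here is a minimal sketch. It is not Carpena's published method, only an illustration of the principle described above: it measures how unevenly the gaps between successive occurrences of each word are distributed. The function name `clustering_scores` and the `min_count` threshold are hypothetical choices made for this example.

```python
import re
from statistics import mean, pstdev


def clustering_scores(text, min_count=3):
    """Score each word by how unevenly its occurrences are spaced."""
    words = re.findall(r"[a-z']+", text.lower())

    # Record the position of every occurrence of every word.
    positions = {}
    for index, word in enumerate(words):
        positions.setdefault(word, []).append(index)

    scores = {}
    for word, where in positions.items():
        if len(where) < min_count:
            continue  # too few occurrences to measure spacing
        gaps = [b - a for a, b in zip(where, where[1:])]
        # Coefficient of variation of the gaps: close to 0 for words
        # spread evenly through the text, larger for words that
        # arrive in bursts.
        scores[word] = pstdev(gaps) / mean(gaps)
    return scores


if __name__ == "__main__":
    sample = "whale whale whale " + "the x y " * 10 + "whale"
    ranked = sorted(clustering_scores(sample).items(),
                    key=lambda item: item[1], reverse=True)
    for word, score in ranked:
        print(f"{word}: {score:.2f}")
```

In the toy sample, the evenly spaced "the" scores zero while the bursty "whale" scores well above it, even though "the" is the more frequent word. That is the distinction a pure frequency or density count would miss.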

Now, modern search engines are not using anything as simple as keyword density analysis these days, but could this, as the article’s title rather sensationally asks, "shake up Google"? The results produced seem a little hit and miss, with both "you" and "I" appearing in the top five for both The Odyssey and Moby Dick. It does, however, seem to generate some interesting results with all the spaces removed from the text, but that is a different discussion.

While Carpena’s method may be good at pulling relevance from an unbiased text, how good is it at pulling actual relevance from a biased text? Compiling a list of relevant words from a text isn’t the hard part; search engines are already pretty good at identifying text that is relevant to a search. The difficulty is pulling relevance from a text that is deliberately misleading. New analysis algorithms will just force people to develop new ways of gaming the system. The real challenge is in the separation of the wheat from the chaff.

To my mind, this is where many journalists fall down; too many ask if the latest clever method of discerning relevance is the next Google killer, but few look at what Google is actually struggling to achieve. Let’s face it, they have text analysis down pat – while it may not be as elegant as some sophisticated quantum analysis technique, Google will return pages with text that is fairly relevant to your search words. What it struggles with, though, is matching the intent behind the search with the intent of the content.

We have all done it. We have been looking for customer reviews of our next intended purchase to see if it has been well received by its current users, only to find that the search results are cluttered with pages selling the product, each with an unpopulated review section somewhere on it. Another scenario is the "this mp3 player isn’t an iPod" style of eBay listing.

There are plenty of pages out there that mislead or misrepresent, and there is nothing more frustrating than wading through piles of valueless results that promise the Earth. It is advances in filtering these out of the short-list of relevant pages that will bring the next quantum leap in search.


Manual removal of dangerous cults?

At the end of last month a group called Anonymous successfully produced a Google-bomb, targeting the Church of Scientology.

Now the SERPs have changed.

Is this the algorithm working or are we seeing an instance of manual review?

Last month I posted on a Google-bomb; Google searches for [Dangerous Cult] were returning scientology.org at #1, despite the reported algorithmic change, back in January of last year, aimed at preventing link bombs from affecting the Google SERPs.

Now scientology.org, which was being returned at #1 in Google SERPs for both [dangerous cult] and [brainwashing cult]¹, is nowhere to be seen for either search. Is the algorithm working?

Since the algorithmic change, famous Google-bombs have disappeared, most notably [Miserable failure] returning George Bush. In April 2007, however, the White House site included the word ‘failure’ in its page content and the US president was back up at #1 or #2 (depending on the data centre) for a short time, suggesting to many that a single word was enough to trigger the bomb to be reactivated. Scientology.org does have a single instance of the word ‘dangerous’ on the home page, so why is the search term [dangerous cult] not returning the church’s site?

image showing the word 'dangerous' on scientology.org homepage

This might suggest a manual change to remove the unwanted spam. A further hint is that, for a short time, Google were treating the word [scientology] as a synonym for [cult]², another example of an embarrassing algorithmic issue which has been cleaned up this month.

So, are Google manually tidying up these gaffes? We cannot know, but my personal opinion is that, in the case of the Google-bomb at least, the disappearance is not out of keeping with the behaviour I have come to expect from the algorithm.

When George Bush was being returned for the single word ‘failure’ there were literally hundreds of thousands of links behind that word, as well as it appearing on the page. With Anonymous’ Google-bomb the links were aimed at a variety of phrases within the structure {adjective} cult. Whilst there were a lot of links with the anchor text [dangerous cult], cult was the stronger word here and it does not appear on the page.

Currently we are seeing the link-bomb still working on some engines, but conspicuous by its absence from Google.

Whether this change is algorithmic or manual, it makes sense. xenu.net still ranks at #3 in Google for the search term [scientology], because that site wants to rank. This is despite legal pressure for Google to de-list xenu.net.

At the end of the day, it is in Google’s interests to remove anything which artificially influences the relevance of its results. The blatant Google-bombing of scientology.org did precisely that, although how automated the removal system is has been placed in further doubt.


¹ Search Engine Watch discuss the [brainwashing cult] results, whilst Search Engine Land cover the [dangerous cult] search term (including screenshots).

² First reported by Blogscoped; a screenshot is still available at Valleywag.


Google finally tackle Blogger’s spam comment issues

Blogger has long been plagued by low-tech spam comments.

Given the recent firm stance on unearned links, it is refreshing to see that this is finally being addressed.

Anyone with a Blogspot blog will know that comment spam is rife within Blogger.

Whilst commenters’ names and any links in the comments had the attribute rel="nofollow", any blog which allowed anonymous commenting (which is a huge proportion of the blogs in the Blogger community) could easily be spammed.

Simply posting a comment and electing to use the ‘Nickname’ method of identifying themselves allowed spammers to enter a nickname such as <a href="http://blog.netrank.co.uk/">Search Engine Optimisation</a>, which would then be displayed as a clean, followable, PageRank passing link, thus: Search Engine Optimisation

As of this weekend this has been changed so that the nickname is displayed in full in plain text, for example: <a href="http://blog.netrank.co.uk/">Search Engine Optimisation</a>. Additionally, this change has been implemented retroactively, so that existing spam links have also become plain text.
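How Blogger implemented the change has not been published, so the following is only a sketch of the general technique: escaping the markup characters in user-supplied text so that the browser shows them literally instead of rendering a link. The function name `render_nickname` is hypothetical; `html.escape` is part of the Python standard library.

```python
from html import escape


def render_nickname(nickname):
    """Return the nickname as inert text that is safe to place in a page."""
    # escape() turns <, >, & and quotes into character entities, so any
    # markup a commenter types is displayed literally, not rendered.
    return escape(nickname, quote=True)


if __name__ == "__main__":
    spam = '<a href="http://blog.netrank.co.uk/">Search Engine Optimisation</a>'
    print(render_nickname(spam))
    # prints: &lt;a href=&quot;http://blog.netrank.co.uk/&quot;&gt;Search Engine Optimisation&lt;/a&gt;
```

Because the escaping happens when the page is rendered rather than when the comment is stored, a fix of this kind would also apply to comments posted before the change, which is consistent with the retroactive effect described above.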

Whilst this is an excellent piece of news for ethical search professionals, I was going to write a nice post about the dual standards of Google today, based largely on this bug, and now I shall have to think of something else to write about.


Is Wikipedia Admitting Defeat?

Not content with adding the NOFOLLOW attribute to all outbound links, Wikipedia, in its attempt to stem the tide of spam and vandalism, is to trial a system whereby edits require review from “trusted editors” in order to be published.

January of this year saw Jimbo Wales (co-founder of Wikipedia) adding the NOFOLLOW attribute on every outbound link on Wikipedia in response to an SEO contest (Globalwarming awareness 2007) that was filling the site with unwanted external links.
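For readers unfamiliar with the mechanism, the sketch below shows roughly what adding NOFOLLOW to every outbound link means in practice. It is not Wikipedia's or MediaWiki's actual code; the function `render_link` and the `site_host` parameter are assumptions made for illustration.

```python
from html import escape
from urllib.parse import urlparse


def render_link(url, text, site_host="en.wikipedia.org"):
    """Build an anchor tag, marking links to other hosts as nofollow."""
    host = urlparse(url).netloc
    # Relative URLs have no host and stay followable, as do links to the
    # site itself; everything else is treated as outbound.
    is_outbound = bool(host) and host != site_host
    rel = ' rel="nofollow"' if is_outbound else ""
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(text)}</a>'


if __name__ == "__main__":
    print(render_link("http://example.com/", "Example"))
    print(render_link("/wiki/Spam", "Spam"))
```

The attribute asks search engines not to pass ranking credit through the link; it does nothing to stop the link being added in the first place, which is why it only deters spammers who care about that credit.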

Whilst Wikipedia’s army of editors is large enough to avoid the kind of manning problems that plague other projects, such as The Open Directory Project (or DMOZ), the fact that anyone can edit any Wikipedia page at any time has resulted in trust issues. It’s impossible to know if what is being read is accurate (as it is most of the time) or the result of a swift piece of graffiti or spam.

The rather drastic action of implementing NOFOLLOW on external links seemed to do little to stem the tide of spam either. It may have put off the more seasoned and better-read spammers, but the wanton spammers and vandals were barely perturbed at all.

In response to this, last month saw Virgil Griffith running Wikiscanner through the IP addresses of anonymous changes with an eye to ‘outing’ spammers, but this doesn’t appear to go far enough for Wikipedia; the German site is about to trial a system where only edits which have been reviewed by a “trusted editor” will be published.

Whilst this appears, on the surface, to be the answer to the problem, the sheer scale of Wikipedia means that there will be huge swathes of “trusted editors” with a suddenly increased level of power; at least some of these will have their own motives and interests. Additionally, the self-correcting model which was at the heart of Wikipedia from the start will, to some extent, have been lost.

The extent of elitism and stagnancy that this policy generates also remains to be seen, and Wikipedia would do well to look to the ODP for some lessons from history. A group of “chosen ones” who are responsible for maintaining something as large as Wikipedia could well drown in the sea of all human knowledge, seeing the encyclopaedia date very quickly indeed.

As with the ODP, Wikipedia’s chosen few may also be infiltrated by the more serious and subtle spammers who will stifle impartiality in favour of their own agendas.

So it seems that, whichever way it turns, Wikipedia’s success is doomed to lead to a loss of user-trust.

Larry Sanger (co-founder of Wikipedia) believes that the answer lies in unfettered editing but with the removal of anonymity. His new project, Citizendium, asks that all contributors identify themselves and leave short biographies before they can begin editing.

Sanger appears to be under no illusion that this will stop the spamming, but hopes that it will significantly reduce the problem and produce entries with more credence and reliability. Personally, I think this will lead to an entertaining array of creative biographies dreamed up by spammers to ease their way into a fresh source of spam. Maybe it is time to register funniest-citizendum-biographies.com.
