A View from Christopher Mims
Why Google is Choked With Spam
High-value categories have become dominated by link spam, says CEO of rival search engine Blekko.
There are 100 billion pages on the web, and your search just turned up 5 million of them. Most people don’t go beyond the first five. How does Google decide what appears there? It uses an algorithm, sure, but there are millions of lines of code in that algorithm, and every time Google engineers turn up a search result they don’t like, they add a few more.
The reality, argues Rich Skrenta, CEO of week-old search engine upstart Blekko, is that whatever appears in those first five slots is, for all intents and purposes, an editorial decision. And if it’s not, it’s probably spam.
“We’re simply taking a more explicit editorial role,” says Skrenta.
It’s the human-curated element of Blekko that makes it a gateway to a simpler, more reliable web, argues Skrenta: a web not unlike the one that Google inhabited in its early days, when the links between pages upon which the PageRank algorithm relies were mostly created by humans and not robots.
Blekko distinguishes itself from Google and Bing by excluding spam from its search listings: results from the likes of Demand Media and other “content farms,” as well as made-for-AdSense landing pages. It also allows users to add “slash tags” such as “/health” that narrow search results to just the sites flagged as trustworthy for that category by a Wikipedia-like army of curators and spam-policing editors.
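To make the slash-tag idea concrete, here is a minimal Python sketch of how a query such as “cure for headaches /health” could be split into search terms and tags, with results then restricted to a curated list of sites. It is an illustration only, not Blekko’s implementation: the `CURATED_SITES` table, the `.example` domains, and the function names are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical curated lists: slash tag -> domains flagged as trustworthy.
# The domains are placeholders, not real entries from any search engine.
CURATED_SITES = {
    "/health": {"trusted-clinic.example", "medical-library.example"},
}


def split_query(query):
    """Separate plain search terms from slash tags such as "/health"."""
    words = query.split()
    tags = [w for w in words if w.startswith("/")]
    terms = [w for w in words if not w.startswith("/")]
    return " ".join(terms), tags


def filter_results(urls, tags):
    """Keep only URLs whose host is on the curated list for every tag."""
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[len("www."):]
        # With no tags, all() over an empty sequence is True: nothing is filtered.
        if all(host in CURATED_SITES.get(tag, set()) for tag in tags):
            kept.append(url)
    return kept


terms, tags = split_query("cure for headaches /health")
results = filter_results(
    [
        "https://www.trusted-clinic.example/headache",
        "https://spam-cures.example/buy-now",
    ],
    tags,
)
```

In this sketch the curated list acts as a whitelist: a page from a site that no editor has flagged for the category never appears, no matter how well it is optimized for the query.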
“If you do a health-related search on Google, such as ‘cure for headaches,’ go and try 10 random health queries and tally how much spam you see there,” says Skrenta. “You’ll see nonsense domains. We looked at this and we said ‘the algorithm [PageRank] is going to sink’–the web is going to a trillion URLs.”
Skrenta decided that the only way to solve this problem was with large-scale human curation.
“If you make a list of the 100 top health sites, they can answer every health query you have,” says Skrenta. “These sites are written by doctors, they have medical librarians on staff–they speak to every medical topic. You don’t really want to search outside of that set.”
Medicine is just the most obvious category to suffer the encroachment of sites stuffed to the gills with SEO tricks, argues Skrenta. Song lyrics are another category that “has just been obliterated by spam.”
For users who aren’t sophisticated enough to use slash tags, Blekko has begun automatically detecting searches in a handful of categories, for instance health, and turning on the slash tags for those searches even when they’re not a part of the original search query.
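The article does not say how Blekko detects these categories. As a rough sketch, assuming nothing more sophisticated than keyword matching, automatic tagging could look like the following Python; the `CATEGORY_KEYWORDS` table and the `auto_tag` function are hypothetical.

```python
# Hypothetical keyword lists per category; a real engine would likely use
# something far more robust than exact word matching.
CATEGORY_KEYWORDS = {
    "/health": {"headache", "headaches", "symptoms", "cure"},
}


def auto_tag(query):
    """Append a slash tag when the query looks like it belongs to a category."""
    words = query.lower().split()
    # Respect the user's choice if the query already carries a slash tag.
    if any(w.startswith("/") for w in words):
        return query
    for tag, keywords in CATEGORY_KEYWORDS.items():
        if keywords & set(words):
            return f"{query} {tag}"
    return query
```

Under these assumptions, `auto_tag("cure for headaches")` comes back with “/health” appended, while a query that matches no category, or one that already contains a slash tag, is returned unchanged.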
The idea is that the results yield a safer, better-informed, more curated web.
But Blekko is still new, and early tests of the search engine revealed that it doesn’t always perform better than Google at keeping spam out of searches and yielding useful results. However, Skrenta believes that as more and more users volunteer to curate slash tags and become something like the Wikipedians of search, the engine’s results will continue to improve.
In the meantime, a search for headache cures on both Google and Blekko seems to both prove his point and at the same time illustrate why content farms are doing so well: Despite the fact that they don’t come from authoritative sources, the results on Google seem to do a better job of explicitly answering the subject of the search. The results on Blekko come from top-notch sources, but they require more parsing by a user who might otherwise prefer a quick and straightforward answer.