Christopher Mims

A View from Christopher Mims

Why Google is Choked With Spam

High-value categories have become dominated by link spam, says the CEO of rival search engine Blekko.

  • November 10, 2010

There are 100 billion pages on the web, and your search just turned up 5 million of them. Most people don’t go beyond the first five. How does Google decide what appears there? It uses an algorithm, sure, but there are millions of lines of code in that algorithm, and every time Google engineers turn up a search result they don’t like, they add a few more.

Rich Skrenta, CEO of search engine Blekko
Credit: (cc) Robert Scoble

The reality, argues Rich Skrenta, CEO of week-old search engine upstart Blekko, is that whatever appears in those first five slots is, for all intents and purposes, an editorial decision. And if it’s not, it’s probably spam.

“We’re simply taking a more explicit editorial role,” says Skrenta.

It’s the human-curated element of Blekko that makes it a gateway to a simpler, more reliable web, argues Skrenta: a web not unlike the one that Google inhabited in its early days, when the links between pages upon which the PageRank algorithm relies were mostly created by humans and not robots.

Blekko distinguishes itself from Google and Bing by excluding from its search listings spam results from the likes of Demand Media and other “content farms,” as well as made-for-AdSense landing pages. It also allows users to add “slash tags” such as “/health” that narrow search results to just the sites flagged as trustworthy for that category by a Wikipedia-like army of curators and spam-policing editors.

“If you do a health-related search on Google, such as ‘cure for headaches,’ go and try 10 random health queries and tally how much spam you see there,” says Skrenta. “You’ll see nonsense domains. We looked at this and we said ‘the algorithm [PageRank] is going to sink’–the web is going to a trillion URLs.”

Skrenta decided that the only way to solve this problem was with large-scale human curation.

“If you make a list of the 100 top health sites, they can answer every health query you have,” says Skrenta. “These sites are written by doctors, they have medical librarians on staff–they speak to every medical topic. You don’t really want to search outside of that set.”

Medicine is just the most obvious category that has been encroached upon by sites stuffed to the gills with SEO tricks, argues Skrenta. Song lyrics sites are another category that “has just been obliterated by spam.”

For users who aren’t sophisticated enough to use slash tags, Blekko has begun automatically detecting searches in a handful of categories, for instance health, and turning on the slash tags for those searches even when they’re not a part of the original search query.
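To make the mechanism concrete, here is a minimal sketch of how slash-tag filtering and automatic category detection might work. It is not Blekko's code: the site lists, trigger keywords, and function names are all hypothetical, and the matching is deliberately naive.

```python
from urllib.parse import urlparse

# Hypothetical curated site lists, one per slash tag. In the scheme the
# article describes, volunteer editors would maintain these lists.
SLASH_TAGS = {
    "health": {"trusted-clinic.example", "medical-library.example"},
}

# Hypothetical trigger words used to guess a category when the user
# types no slash tag at all.
CATEGORY_KEYWORDS = {
    "health": {"cure", "headache", "headaches", "symptoms"},
}


def parse_query(query):
    """Split a query into plain search terms and explicit slash tags."""
    terms, tags = [], []
    for token in query.split():
        if token.startswith("/") and len(token) > 1:
            tags.append(token[1:].lower())
        else:
            terms.append(token)
    return terms, tags


def detect_tags(terms):
    """Guess slash tags from the search terms (naive keyword overlap)."""
    lowered = {term.lower() for term in terms}
    return [tag for tag, words in CATEGORY_KEYWORDS.items() if lowered & words]


def host_of(url):
    """Return the URL's host name without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def filter_results(query, results):
    """Keep only results hosted on sites curated for the query's slash tags."""
    terms, tags = parse_query(query)
    if not tags:
        tags = detect_tags(terms)  # automatic detection for untagged queries
    if not tags:
        return list(results)  # no category applies, so leave results alone
    allowed = set()
    for tag in tags:
        # A tag with no curated list contributes no allowed sites.
        allowed |= SLASH_TAGS.get(tag, set())
    return [url for url in results if host_of(url) in allowed]
```

In this sketch, the query "cure for headaches" is treated as if the user had typed "/health", so only results from the two curated example domains would survive, while a query that matches no category passes through unfiltered.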

The idea is that the results yield a safer, better informed, more curated web.

But Blekko is still new, and early tests of the search engine revealed that it doesn’t always perform better than Google at keeping spam out of searches and yielding useful results. However, Skrenta believes that as more and more users volunteer to curate slash tags and become something like the Wikipedians of search, the engine’s results will continue to improve.

In the meantime, a search for headache cures on both Google and Blekko seems both to prove his point and to illustrate why content farms are doing so well: although they don’t come from authoritative sources, the results on Google seem to do a better job of directly answering the query. The results on Blekko come from top-notch sources, but they require more parsing by a user who might otherwise prefer a quick and straightforward answer.
