
Why Google is Choked With Spam

High value categories have become dominated by link spam, says CEO of rival search engine Blekko.
November 10, 2010

There are 100 billion pages on the web, and your search just turned up 5 million of them. Most people don’t go beyond the first five. How does Google decide what appears there? It uses an algorithm, sure, but there are millions of lines of code in that algorithm - and every time Google engineers turn up a search they don’t like, they add a few more.

Rich Skrenta, CEO of search engine Blekko
Credit: (cc) Robert Scoble

The reality, argues Rich Skrenta, CEO of week-old search engine upstart Blekko, is that whatever appears in those first five slots is, for all intents and purposes, an editorial decision. And if it’s not, it’s probably spam.

“We’re simply taking a more explicit editorial role,” says Skrenta.

It’s the human-curated element of Blekko that makes it a gateway to a simpler, more reliable web, argues Skrenta. A Web not unlike the one that Google inhabited in its early days, when the links between pages upon which the PageRank algorithm relies were mostly created by humans and not robots.

Blekko distinguishes itself from Google and Bing by excluding spam from its search listings: results from the likes of Demand Media and other “content farms,” as well as made-for-AdSense landing pages. It also allows users to add “slash tags” such as “/health” that narrow search results to just the sites flagged as trustworthy for that category by a Wikipedia-like army of curators and spam-policing editors.

“If you do a health-related search on Google, such as ‘cure for headaches,’ go and try 10 random health queries and tally how much spam you see there,” says Skrenta. “You’ll see nonsense domains. We looked at this and we said ‘the algorithm [PageRank] is going to sink’–the web is going to a trillion URLs.”

Skrenta decided that the only way to solve this problem was with large-scale human curation.

“If you make a list of the 100 top health sites, they can answer every health query you have,” says Skrenta. “These sites are written by doctors, they have medical librarians on staff–they speak to every medical topic. You don’t really want to search outside of that set.”

Medicine is just the most obvious category encroached upon by sites stuffed to the gills with SEO tricks, argues Skrenta. Song lyrics are another category that “has just been obliterated by spam.”

For users who aren’t sophisticated enough to use slash tags, Blekko has begun automatically detecting searches in a handful of categories, for instance health, and turning on the slash tags for those searches even when they’re not a part of the original search query.
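To make the mechanism concrete, here is a minimal sketch of how slash-tag filtering and automatic category detection might work. It is an illustration only, not Blekko's actual code: the curated site list, the keyword list, the `.example` domains, and every function name are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical curated lists: slash tag -> sites editors flagged as trustworthy.
SLASH_TAGS = {
    "health": {"clinic.example", "medlib.example"},
}

# Hypothetical keywords used to guess a category when no slash tag is typed.
CATEGORY_KEYWORDS = {
    "health": {"headache", "headaches", "symptoms", "migraine"},
}


def parse_query(query):
    """Split a query into plain search terms and slash tags like '/health'."""
    terms, tags = [], []
    for token in query.split():
        if token.startswith("/") and len(token) > 1:
            tags.append(token[1:].lower())
        else:
            terms.append(token)
    return terms, tags


def detect_tags(terms):
    """Guess slash tags from the search terms (assumed keyword matching)."""
    words = {term.lower() for term in terms}
    return [tag for tag, keywords in CATEGORY_KEYWORDS.items() if words & keywords]


def site_of(url):
    """Return the host name of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def filter_results(query, urls):
    """Keep only results from sites curated for the query's slash tags."""
    terms, tags = parse_query(query)
    if not tags:
        tags = detect_tags(terms)  # turn a tag on automatically
    if not tags:
        return list(urls)  # no category applies: leave results unfiltered
    allowed = set()
    for tag in tags:
        allowed |= SLASH_TAGS.get(tag, set())
    return [url for url in urls if site_of(url) in allowed]
```

In this sketch, a query such as “headache cure /health” and the plain query “headache cure” are filtered the same way, because the second one triggers the health tag automatically.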

The idea is that the results yield a safer, better informed, more curated web.

But Blekko is still new, and early tests of the search engine revealed that it doesn’t always perform better than Google at keeping spam out of searches and yielding useful results. However, Skrenta believes that as more and more users volunteer to curate slash tags and become something like the Wikipedians of search, the engine’s results will continue to improve.

In the meantime, a search for headache cures on both Google and Blekko seems to both prove his point and at the same time illustrate why content farms are doing so well: Despite the fact that they don’t come from authoritative sources, the results on Google seem to do a better job of explicitly answering the subject of the search. The results on Blekko come from top-notch sources, but they require more parsing by a user who might otherwise prefer a quick and straightforward answer.

Follow Mims on Twitter or contact him via email.
