Digging a Smarter Crowd
Digg’s new recommendation system relies on the wisdom of crowds.
Digg, a popular social bookmarking website, began rolling out a recommendation engine late last week. The design of this recommendation engine, however, is quite different from that of the engines used by companies such as Amazon. While e-commerce sites tend to derive recommendations from a mix of information about users’ browsing and purchasing habits and information about the items for sale, Digg’s system, much like the site itself, places its trust in the wisdom of crowds.
Digg has built up a reputation for helping users find interesting stories among the flood of new information that’s constantly posted on the Internet. Users submit interesting items to the website, and then other users “digg” the stories they like and “bury” those they don’t. The most popular stories make their way to Digg’s front page.
Digg has grown considerably since its launch in 2004, and that growth has created a serious problem for the site and its users. It’s nearly impossible for an interested user to sort through the more than 15,000 stories that are now submitted every day, and it’s therefore difficult for many users to participate in voting on which stories should make the front page. Anton Kast, Digg’s chief scientist, hopes that the recommendation engine will solve this problem. By highlighting the new stories that a user might like, he says, it makes it easier for that user to manage the flow of submitted stories. “You get to see stuff you might be interested in, and you get to contribute in a way that’s more effective than it would have been otherwise,” Kast points out.
But Digg’s character, he says, calls for the design of an unorthodox recommendation engine. “It’s not a magic oracle,” says Kast. “It’s not that we’re saying that the computer is smarter than you, or that we know what you want, or we know who you are.” Rather than feeding the characteristics of articles into its algorithms, Digg’s system relies entirely on calculating connections between users.
Every time a user diggs a story, the system compares that action with the actions of everyone else in the system and finds which users have the most diggs in common. To keep recommendations from being all over the map, the system calculates connections for each topic separately, so that two users who share an interest in video games won’t necessarily be assumed to hold similar opinions on, say, political stories. To keep recommendations diverse, the system shows only a certain number of stories from each compatible user and, each time the user requests recommendations, fills out its quota of suggestions with stories selected by less compatible users. The recommendation engine also limits the effect that a single digg can have, so that someone who diggs a very popular story won’t suddenly become connected to thousands of other users. Because the system calculates correlations in real time, using separate servers devoted to performing the computations, Kast says that a new digg will affect the recommendation system within one or two minutes.
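The approach Kast describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Digg's actual code: the function names, the inverse-popularity weighting used to damp popular stories, and the `limit` and `per_user` quotas are all hypothetical choices made for the example.

```python
from collections import defaultdict
from itertools import combinations


def user_similarity(diggs):
    """diggs: iterable of (user, topic, story) tuples.

    Returns {topic: {(user_a, user_b): score}} with user_a < user_b.
    """
    # Collect the set of users who dugg each story, keyed by topic.
    diggers = defaultdict(set)
    for user, topic, story in diggs:
        diggers[(topic, story)].add(user)

    scores = defaultdict(lambda: defaultdict(float))
    for (topic, _story), users in diggers.items():
        if len(users) < 2:
            continue
        # Assumed damping: a story with n diggers adds 1/(n-1) to each pair,
        # so a very popular story barely strengthens any single connection.
        weight = 1.0 / (len(users) - 1)
        for a, b in combinations(sorted(users), 2):
            scores[topic][(a, b)] += weight
    return scores


def recommend(diggs, user, topic, limit=5, per_user=2):
    """Suggest up to `limit` stories in `topic`, at most `per_user` per neighbor."""
    diggs = list(diggs)
    scores = user_similarity(diggs)[topic]

    # Connection strength between `user` and every other user in this topic.
    neighbors = {}
    for (a, b), score in scores.items():
        if a == user:
            neighbors[b] = score
        elif b == user:
            neighbors[a] = score

    stories_by_user = defaultdict(list)
    for u, t, story in diggs:
        if t == topic:
            stories_by_user[u].append(story)
    seen = set(stories_by_user[user])

    picks = []
    # Most compatible users first; less compatible users fill the remaining quota.
    for other in sorted(neighbors, key=lambda u: (-neighbors[u], u)):
        taken = 0
        for story in stories_by_user[other]:
            if story in seen or story in picks:
                continue
            picks.append(story)
            taken += 1
            if taken == per_user or len(picks) == limit:
                break
        if len(picks) == limit:
            break
    return picks
```

Because scores are kept per topic, two users who overlap on gaming stories gain no connection in politics, and the per-user cap forces the list to draw on more than one neighbor.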
Paul Lamere, a staff engineer on the Search Inside the Music recommendation project at Sun Microsystems, says that while it can be difficult to build recommendation engines that can handle vast quantities of information and calculations, the nature of Digg makes the problem a bit easier. He says that unlike systems such as Amazon’s, in which the number of items in the database is constantly growing, Digg limits its recommendation engine to items that users selected within the past 30 days, which keeps the data store from getting too large. Splitting recommendations by topic also helps with scaling, since it reduces the amount of data that needs to be processed at once. Lamere notes, however, that making recommendations based only on users, rather than on features of the articles themselves, risks driving diversity out of the system. “It’s the rich-get-richer phenomenon,” he says, adding that recommendation engines that factor in the characteristics of products or articles can balance popular items by bringing forward lesser-known items with similar qualities.
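The two scaling measures Lamere mentions, a 30-day window and per-topic partitioning, might look something like the following in Python. The event layout and the function name are assumptions made for the example; the article does not describe how Digg stores its data.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)  # the 30-day limit described in the article


def active_diggs_by_topic(events, now):
    """events: iterable of (user, topic, story, dugg_at) tuples (assumed layout).

    Drops anything older than the window and groups the rest by topic,
    so each topic can be processed on its own.
    """
    cutoff = now - WINDOW
    by_topic = defaultdict(list)
    for user, topic, story, dugg_at in events:
        if dugg_at >= cutoff:
            by_topic[topic].append((user, story))
    return dict(by_topic)
```

Each topic's list is bounded by a month of activity and can be handed to a separate computation, which is the property that keeps the working set small.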
Although the Digg recommendation engine became available to users, in an experimental version, only a few days ago, Kast says that it’s already having an effect on the way the site functions. “There’s been a huge spike of digging activity on the site,” he notes, “and substantial increases in the number of unique diggers.” Kast says that the company hopes this will ultimately improve the quality of the website. If more users become more active in selecting stories early on, he explains, Digg’s algorithms will have better statistics to work with when promoting stories to the front page.
John Riedl, a professor of computer science at the University of Minnesota who studies recommenders, says that Digg’s entry into the field of recommendations is interesting because news has a very different character from e-commerce. While shopping sites are dealing with fads that play out over the course of weeks and months, news sites are dealing with fads that could pass in the space of a few hours. The time pressure, he says, makes it hard to come up with a system that can sort out stories that are both up-to-date and high quality. Riedl says that he sees Digg’s move as part of the next step in changing how information reaches people. “I’d like to see information disseminated because it’s the stuff that’s most interesting to us individually, based on our tastes and our unique qualities as people,” he says. “I don’t know if Digg’s nailed it yet, but I think it’s an incredible opportunity.”