How Digg Combats Cheaters

Data-visualization tools and community policing help keep Digg’s social news site legitimate and valuable to its readers.

Kate Greene

January 24, 2007

Digg, the popular aggregation website, is redefining the way that many people find news. Some 850,000 registered users effectively act as an editorial staff, recommending–or “digging”–stories that they deem interesting enough for the site’s home page.

A visual map of digging behavior on Digg. The horizontal axis represents Digg users; the newest are on the far right. The vertical axis represents stories; the newest are at the bottom. Each dot on the map represents a digg, with red dots belonging to a story’s first digg. The horizontal white lines represent digging activity for a popular story. However, the vertical white lines do not describe typical digging behavior and may represent bot activity.

The challenges are keeping undesirable content out and making sure that stories are promoted legitimately. Some people try to “game” the system, using dishonest means to try to increase a story’s chance of getting to the main page. The motivation: money and fame. Articles featured on Digg’s home page typically generate a lot of profitable page views for the source of the story. Gaming attempts take many different forms. Some people create fake user accounts and write software called bots, designed to automatically digg stories. Other gamers write fabricated interviews with famous people and post them on suspiciously new blogs in hopes of driving traffic to their websites.

According to Digg’s founder, Kevin Rose, the site is designed so that users can monitor digging behavior and self-police. For instance, it’s possible to view the history of users who digg a story: if a story has a large number of diggs from people with newly created user accounts, it’s likely to have been promoted unfairly, potentially from a single user who fabricated the accounts. Members can then use tools to “bury” stories that they don’t think deserve to be on the front page.
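The new-account check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Digg's actual tooling: the `Digg` record, the seven-day cutoff, and the 50 percent threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Digg:
    """Hypothetical record of one digg; not Digg's real data model."""
    user: str
    account_created: datetime  # when the digging user's account was created
    dugg_at: datetime          # when the digg was cast


def new_account_share(diggs, max_age=timedelta(days=7)):
    """Return the fraction of diggs cast by accounts younger than max_age."""
    if not diggs:
        return 0.0
    fresh = sum(1 for d in diggs if d.dugg_at - d.account_created < max_age)
    return fresh / len(diggs)


def looks_unfairly_promoted(diggs, threshold=0.5):
    """Flag a story when most of its diggs come from newly created accounts.

    The threshold is an assumed value chosen for illustration.
    """
    return new_account_share(diggs) > threshold
```

A story flagged this way would still need a human look, since a genuinely popular story can also attract many new members.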

Suspicious activity can also be thwarted using the wealth of data on normal digging behavior that Digg has gathered from past use. “With more than two years of experience and statistical and behavioral analysis into the patterns of how legitimate content is submitted and promoted–represented by over 1,200,000 content submissions and 50,000,000 Diggs to date–we have a very detailed understanding of the process,” says Rose.

Finding meaningful patterns in gigabytes of raw data isn’t easy. But certain data-visualization tools can make suspicious activity easier to detect. “By representing user activity graphically, we can start to see patterns that wouldn’t be normally apparent by other means,” says Eric Rodenbeck, founder of Stamen, the design firm that provides visualization tools for Digg Labs. Stamen developed Digg Labs, which includes visualization tools called Digg Stack and Digg Swarm. These tools show Digg-user behavior in real time to help users find popular stories in different ways.

“Digg Swarm is a good example of how this kind of visualization works,” says Rodenbeck, who isn’t a representative for Digg. “The visualization won’t tell you everything about the activity that you’re observing, but it can illuminate patterns that can give you a better idea of where to look.”

For example, Stamen’s visual map (see image above), designed by technical director Michal Migurski, offers a different perspective on digging behavior. In this image, Digg members are represented on the horizontal axis, with the newest members on the far right, the oldest on the far left. Stories are represented on the vertical axis, with the newest at the bottom, the oldest at the top. Each dot on the map represents a single digg, with red dots belonging to a story’s first digg.
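The layout of such a map can be sketched as follows. This is a rough reconstruction from the description above, not Stamen's code: the function name, the input shapes, and the use of white for every dot that is not a first digg are assumptions.

```python
def map_points(diggs, user_joined, story_submitted):
    """Turn diggs into (x, y, color) dots for a user-by-story map.

    diggs:           list of (user, story, timestamp) tuples
    user_joined:     dict mapping user -> join time
    story_submitted: dict mapping story -> submission time

    x counts users from oldest (0, far left) to newest (far right).
    y counts stories from oldest (0, top) to newest (bottom), assuming
    screen coordinates in which y grows downward.
    """
    user_x = {u: i for i, u in enumerate(sorted(user_joined, key=user_joined.get))}
    story_y = {s: i for i, s in enumerate(sorted(story_submitted, key=story_submitted.get))}

    seen_stories = set()
    points = []
    # Walk diggs in time order so the first digg of each story is found first.
    for user, story, _ts in sorted(diggs, key=lambda d: d[2]):
        color = "white" if story in seen_stories else "red"  # red = first digg
        seen_stories.add(story)
        points.append((user_x[user], story_y[story], color))
    return points
```

The resulting points could be passed to any scatter-plot routine; the choice of plotting library is left open here.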

Immediately, some behaviors stand out, explains Rodenbeck. Consider the broken horizontal white lines. These illustrate a story that hit the main page and is acquiring a series of diggs from various readers. However, the broken vertical white lines might represent suspicious digging behavior: they show an individual user digging a large number of stories–both newly submitted and older ones–in rapid succession. It’s improbable that one person produced so many diggs for such a large number of stories, Rodenbeck reasons. It’s far more likely that those diggs were automatically generated by bots, in an effort to artificially promote certain stories, he says.
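The pattern behind a vertical line, one user digging many stories in rapid succession, can also be searched for directly in the data. The sketch below is a simple sliding-window count, not a method the article attributes to Digg or Stamen; the 60-second window and the limit of 20 diggs are assumed values.

```python
from collections import defaultdict


def burst_diggers(diggs, window_seconds=60, max_diggs=20):
    """Return the set of users with more than max_diggs diggs in any window.

    diggs is an iterable of (user, story_id, timestamp_in_seconds) tuples.
    Each tuple is counted as one digg; both limits are illustrative.
    """
    times_by_user = defaultdict(list)
    for user, _story, ts in diggs:
        times_by_user[user].append(ts)

    flagged = set()
    for user, times in times_by_user.items():
        times.sort()
        start = 0
        for end, ts in enumerate(times):
            # Advance the left edge until the window spans at most window_seconds.
            while ts - times[start] > window_seconds:
                start += 1
            if end - start + 1 > max_diggs:
                flagged.add(user)
                break
    return flagged
```

A flag from a check like this is a hint about where to look rather than proof of bot activity, in line with Rodenbeck's point that the picture is only ever partial.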

“It gives us a pretty good picture of what’s going on,” says Rodenbeck, “but it’s only ever a partial picture.” There are many more parameters to map, he says. By mapping the same data using different metrics, such as a particular user’s recent activity or the number of contacts, or “friends,” on Digg that he or she has established, different types of patterns emerge. “We can not only get a more robust understanding of what’s currently happening in the Digg ecosystem, but get a better sense of what kinds of questions to ask moving forward,” Rodenbeck says.

So far the combination of citizen policing and data visualization has worked well to keep gaming on Digg relatively minimal. Although Digg doesn’t keep statistics on the number of gaming attempts since the site went live in late 2004, Rose says that “no organization has been able to successfully game Digg to our knowledge.”

Users who are suspected of using their accounts to try to game Digg are sent a warning e-mail. A user is banned after a second violation.

Rodenbeck thinks that cleverly graphing Digg’s social data helps in the fight against cheaters. “Visualization can’t solve the problem of gaming once and for all,” he says. “But it can definitely make the process of discovering patterns simpler, and we think there’s a lot of value in that.”
