Phony Twitter Profiles Aim to Outwit Spammers

Approach could help software learn how to identify fake accounts with less honorable intentions.

Tom Simonite

July 9, 2010

It’s not unusual to have user profiles on multiple social networks, or even separate accounts on sites like Twitter–one for work and one for play. But Kyumin Lee at Texas A&M University has 60 Twitter accounts, and not because he’s popular.

Lee’s accounts are “honeypots,” designed to attract the attention of the spammers that increasingly use social networks to spread links to malware and phishing Web sites. Software developed by Lee monitors messages sent to the honeypot accounts to learn the tactics used by spammers.

“The concept of a honeypot is well established at the network level,” says Lee. Usually it takes the form of unprotected computers used to monitor spam e-mail or network-based attacks. “We decided to apply it at a higher level to learn about spam in social networks.” Lee is carrying out the project with A&M colleagues James Caverlee and Brian David Eoff, and with Steve Webb at Georgia Tech. The work is partially supported by a research award from Google.

The honeypot accounts automatically post updates drawn from a collection of 120,000 real tweets harvested from Twitter. The team has also deployed honeypots on MySpace, and created software that uses dummy profiles on both networks to learn about spammer tactics. “We have a bot monitor who contacts our profiles,” says Lee. “It looks at what they put in their messages and also accesses their profile to see their demographic information and past updates.”
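To make the mechanics concrete, here is a minimal sketch of the two jobs described above: posting an update picked at random from a pool of harvested tweets, and logging whoever makes contact. It is not the researchers' software. The `Honeypot` class, its method names, the sample tweets, and the example account are all hypothetical, and nothing here talks to Twitter or MySpace.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Honeypot:
    """Hypothetical stand-in for one dummy profile (illustration only)."""
    name: str
    tweet_pool: list
    contacts: list = field(default_factory=list)

    def next_update(self, rng):
        # Choose the next status update at random from the harvested pool.
        return rng.choice(self.tweet_pool)

    def record_contact(self, sender, message, sender_profile):
        # Log who made contact, what they sent, and their profile details.
        self.contacts.append(
            {"sender": sender, "message": message, "profile": sender_profile}
        )


# Assumed sample data; the real pool held about 120,000 harvested tweets.
pool = ["Lovely weather today", "Stuck in traffic again", "Coffee time"]
rng = random.Random(0)  # seeded so the sketch is repeatable

pot = Honeypot("honeypot_01", pool)
update = pot.next_update(rng)
pot.record_contact(
    "acct_123",
    "Win a free phone http://example.com/x",
    {"age": 21, "location": "California"},
)
```

Every logged contact becomes a candidate example of spammer behavior, although, as noted later in the article, some contacts may come from legitimate users.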

So far, Lee says, “our 61 honeypots tempted and collected 30,867 spammers on Twitter.” The data gathered by those bots can also be used to train “classifier” algorithms to identify spammers that haven’t yet contacted a honeypot. A classifier trained using the Twitter honeypots proved capable of correctly identifying spam profiles more than 80 percent of the time. A public Web service is being built from the trained model that will allow people to look up which accounts it considers spam, and submit corrections for any that are misidentified, says Lee.
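The article does not say which classifier or which features Lee's team used. As a rough illustration of the idea, the sketch below trains a small naive Bayes text classifier on labeled messages, where "spam" stands for accounts that contacted a honeypot and "ham" for ordinary accounts. The training messages, the function names, and the choice of naive Bayes are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real system would do far more.
    return text.lower().split()


def train(examples):
    """Count words per label. `examples` is a list of (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the higher log-probability.

    Uses add-one (Laplace) smoothing and assumes both labels appeared
    at least once in the training data.
    """
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("spam", "ham"):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # skip words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data for illustration only.
examples = [
    ("win a free phone click here", "spam"),
    ("free gift cards click this link", "spam"),
    ("stuck in traffic on the way to class", "ham"),
    ("coffee with friends this afternoon", "ham"),
]
model = train(examples)
verdict_spam = classify(model, "click here for a free gift")
verdict_ham = classify(model, "traffic on the way to coffee")
```

A classifier like this can then score accounts that never touched a honeypot, which is the use the researchers describe.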

Spam and phishing attacks delivered over social networks are a growing problem, says Don DeBolt, director of threat research for IT software firm CA Technologies. For example, a phishing scam operating over Twitter recently stole the iTunes accounts of some users. “People immediately trust these applications because it is how they communicate with friends,” DeBolt explains. “Because people are sending much less text than an e-mail, and URL shorteners are often used, it is harder for people to realize a message may not be real.”

DeBolt’s team maintains honeypot profiles of its own, and monitors them manually to look for new spammer tactics. “We have to take great care, though, in curating them as research profiles that don’t impersonate a real person,” he says.

The fact that social network honeypots must be part of a community is a fundamental difference from the conventional approach, says Azer Bestavros, a networking specialist at Boston University who has, in the past, worked on analyzing blog spam. A honeypot computer on a network is typically allocated to “dark” address space so that it would never legitimately be contacted by another machine.

“Other users could consider our honeypot a real person,” Lee acknowledges. “But we do not have friends or contact other people, and on Twitter our profiles posted random messages so a normal user would not think to contact us.”

Some messages and friend requests sent to a social honeypot may be from legitimate users, so information collected from them needs to be treated carefully, says Bestavros. Lee and colleagues are experimenting with varying the output and demographic characteristics of their honeypots to find out what most attracts spammers–for example, varying the dummy user’s age and location, or the frequency of their updates. “Most of the spammers present themselves as college-age females,” says Lee. Data from MySpace honeypots shows that most claim to be located in California, and so far it seems that college-age males are the preferred target.

Lee and colleagues are also interested in trying the approach on the world’s largest social network: Facebook. “It is a more private network, but if we were able to get permission from them it would be interesting to try it there,” he says.
