The Secret Science of Retweets

There’s a secret to persuading strangers to retweet your messages. And a machine learning algorithm has discovered it.

Emerging Technology from the arXiv

May 20, 2014

If you send a tweet to a stranger asking them to retweet it, you probably wouldn’t be surprised if they ignored you entirely. But if you sent out lots of tweets like this, perhaps a few might end up being passed on.

How come? What makes somebody retweet information from a stranger? That’s the question addressed today by Kyumin Lee from Utah State University in Logan and a few pals from IBM’s Almaden Research Center in San Jose.

These guys say that by studying the characteristics of Twitter users, it is possible to identify strangers who are more likely to pass on your message than others. And in doing this, the researchers say they’ve been able to improve the retweet rate of messages sent to strangers by up to 680 percent.

So how did they do it? The new technique is based on the idea that some people are more likely to retweet than others, particularly on certain topics and at certain times of the day. So the trick is to find these individuals and target them at the times when they are most likely to respond.

In principle, the approach is straightforward: study individuals on Twitter, examining their profiles and past tweeting behavior for clues that they might be more likely to retweet certain types of information. Having found these individuals, send your tweets to them.

That’s the theory. In practice, it’s a little more involved. Lee and co wanted to test people’s response to two types of information: local news (in San Francisco) and tweets about bird flu, a significant issue at the time of their research. They then created several Twitter accounts with a few followers, specifically to broadcast information of this kind.

Next, they selected people to receive their tweets. For the local news broadcasts, they searched for Twitter users geolocated in the Bay Area, finding over 34,000 of them and choosing 1,900 at random.
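Picking recipients at random is uniform sampling without replacement. Here is a minimal Python sketch, assuming the geolocated accounts have already been collected into a list of IDs; the `user{i}` names, the pool size and the seed are placeholders, not data from the study.

```python
import random

def sample_targets(candidates, k, seed=None):
    """Pick k distinct users uniformly at random from the candidate pool."""
    if k > len(candidates):
        raise ValueError("sample size exceeds candidate pool")
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(candidates, k)

# Hypothetical pool standing in for the ~34,000 geolocated Bay Area accounts.
pool = [f"user{i}" for i in range(34_000)]
targets = sample_targets(pool, 1_900, seed=42)
print(len(targets), len(set(targets)))  # 1900 1900
```

Sampling without replacement guarantees nobody receives the request twice.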

They then sent a single message to each user of the format:

“@SFtargetuser "A man was killed and three others were wounded in a shooting … http://bit.ly/KOl2sC" Plz RT this safety news”

So the tweet included the user’s name, a short headline, a link to the story and a request to retweet.
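A message in that format is easy to assemble from a template. The sketch below is hypothetical: `build_request` is not from the paper, and it assumes the 140-character limit Twitter enforced in 2014, shortening the headline with an ellipsis when the whole tweet would not fit.

```python
MAX_TWEET_LEN = 140  # Twitter's character limit in 2014

def build_request(handle, headline, url, ask="Plz RT this safety news"):
    """Hypothetical helper: '@handle "headline url" ask', trimming the headline to fit."""
    def render(text):
        return f'@{handle} "{text} {url}" {ask}'

    tweet = render(headline)
    if len(tweet) <= MAX_TWEET_LEN:
        return tweet
    # Cut enough characters from the headline to make room for an ellipsis.
    # (If the handle, URL and ask alone exceed the limit, the result can still be too long.)
    overflow = len(tweet) - MAX_TWEET_LEN
    keep = max(len(headline) - overflow - 1, 0)
    return render(headline[:keep].rstrip() + "…")

print(build_request("SFtargetuser",
                    "A man was killed and three others were wounded in a shooting",
                    "http://bit.ly/KOl2sC"))
```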

Of these 1,900 people, 52 retweeted the message they received. That’s 2.8 percent.

For the bird flu information, Lee and co hunted for people who had already tweeted about bird flu, finding 13,000 of them and choosing 1,900 at random. Of these, 155 retweeted the message they received, a retweet rate of 8.4 percent.
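Finding people who had already tweeted about a topic amounts to filtering each user's tweet history for topic keywords. A minimal sketch, assuming tweet histories are held in a dict keyed by user ID; the keyword list and sample accounts are illustrative, since the article does not say exactly how the search was done.

```python
import re

def mentions_topic(tweets, keywords):
    """True if any tweet contains any keyword as a whole word or phrase (case-insensitive)."""
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b", re.IGNORECASE)
    return any(pattern.search(text) for text in tweets)

def topic_pool(user_tweets, keywords):
    """IDs of users who have tweeted about the topic at least once."""
    return [uid for uid, tweets in user_tweets.items() if mentions_topic(tweets, keywords)]

history = {
    "alice": ["Worried about the bird flu reports", "Lunch time"],
    "bob": ["Go Giants!"],
    "carol": ["Avian flu update from the WHO"],
}
print(topic_pool(history, ["bird flu", "avian flu"]))  # ['alice', 'carol']
```

The resulting pool would then be sampled at random, just as with the Bay Area accounts.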

But Lee and co found a way to significantly improve these retweet rates. They went back to the original lists of Twitter users and collected publicly available information about each of them, such as their personal profile, the number of followers, the people they followed, their 200 most recent tweets and whether they retweeted the message they had received.

Next, the team used a machine learning algorithm to search for correlations in this data that might predict whether somebody was likely to retweet. For example, they looked at whether people with older accounts were more likely to retweet, how the ratio of friends to followers influenced retweet likelihood, and whether the positive or negative words people had used in previous tweets showed any link. They also looked at the time of day when people were most active on Twitter.
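Turning a profile and tweet history into numbers a learning algorithm can use might look like the sketch below. The `UserRecord` type, the feature names and the four-word sentiment lexicons are hypothetical stand-ins; the paper describes the researchers' actual feature set.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Toy sentiment lexicons (hypothetical; a real system would use a full lexicon).
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "sad", "hate", "awful"}

@dataclass
class UserRecord:
    created_at: datetime   # account creation time
    friends: int           # accounts this user follows
    followers: int         # accounts following this user
    tweets: list = field(default_factory=list)  # recent tweet texts (the study collected 200)

def extract_features(user, now):
    """Map one user's public data to a dict of numeric features."""
    words = [w.strip(".,!?").lower() for text in user.tweets for w in text.split()]
    n_words = max(len(words), 1)  # avoid division by zero for users with no tweets
    return {
        "account_age_days": (now - user.created_at).days,
        "friend_follower_ratio": user.friends / max(user.followers, 1),
        "positive_rate": sum(w in POSITIVE for w in words) / n_words,
        "negative_rate": sum(w in NEGATIVE for w in words) / n_words,
    }

user = UserRecord(datetime(2010, 5, 20), friends=300, followers=150,
                  tweets=["Great news today!", "Traffic is awful"])
print(extract_features(user, now=datetime(2014, 5, 20)))
```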

The result was a machine learning algorithm capable of picking users who were most likely to retweet on a particular topic.
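The article does not say which learning algorithm the team settled on, so the sketch below swaps in plain logistic regression trained by gradient descent, then ranks candidates by predicted retweet probability and keeps the top k. Feature vectors are assumed to be lists of numbers, and the toy training data is invented for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example: p(retweet) minus the 0/1 label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def top_k(model, candidates, k):
    """IDs of the k candidates with the highest predicted retweet probability."""
    ranked = sorted(candidates, key=lambda uid: predict_proba(model, candidates[uid]), reverse=True)
    return ranked[:k]

# Toy training data: one feature per user, label 1 = retweeted the request.
X = [[0.0], [0.2], [0.8], [1.0]]
y = [0, 0, 1, 1]
model = train_logistic(X, y)

candidates = {"u1": [0.1], "u2": [0.9], "u3": [0.5]}
print(top_k(model, candidates, 2))  # ['u2', 'u3']
```

Ranking by probability rather than thresholding lets a sender fix the number of requests in advance and spend them on the most promising strangers.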

And the results show that it is surprisingly effective. When the team sent local information tweets to individuals identified by the algorithm, 13.3 percent retweeted it, compared to just 2.6 percent of people chosen at random.

And they got even better results when they timed the request to match the periods when people had been most active in the past. In that case, the retweet rate rose to 19.3 percent. That’s an improvement of over 600 percent.
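Timing the request can be approximated by finding the hour of day in which a user has posted most often and scheduling the message for the next occurrence of that hour. A minimal sketch, assuming timestamps are already in the user's local time zone; `most_active_hour` and `next_send_time` are hypothetical helpers, not the paper's scheduling method.

```python
from collections import Counter
from datetime import datetime, timedelta

def most_active_hour(timestamps):
    """Hour of day (0-23) with the most past tweets; ties go to the earliest hour.

    Raises ValueError if timestamps is empty.
    """
    counts = Counter(ts.hour for ts in timestamps)
    return min(counts, key=lambda h: (-counts[h], h))

def next_send_time(timestamps, now):
    """Next datetime after `now` that falls at the start of the user's most active hour."""
    hour = most_active_hour(timestamps)
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate

past = [datetime(2014, 5, 1, 8, 15), datetime(2014, 5, 2, 8, 40), datetime(2014, 5, 3, 21, 5)]
print(next_send_time(past, datetime(2014, 5, 20, 12, 0)))  # 2014-05-21 08:00:00
```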

Similarly, the rate for bird flu information rose from 8.3 percent for users chosen at random to 19.7 percent for users chosen by the algorithm.
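The percentage improvements here are relative changes in the retweet rate, (new − baseline) / baseline × 100. A quick check against the rates reported above:

```python
def relative_improvement(baseline_pct, new_pct):
    """Percentage increase of new_pct over baseline_pct."""
    return (new_pct - baseline_pct) / baseline_pct * 100

print(round(relative_improvement(2.6, 13.3)))  # 412  (local news, algorithm only)
print(round(relative_improvement(2.6, 19.3)))  # 642  (local news, algorithm plus timing)
print(round(relative_improvement(8.3, 19.7)))  # 137  (bird flu)
```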

That’s a significant result that marketers, politicians and news organizations will be eyeing with envy.

An interesting question is how this technique could be made more generally applicable. It raises the prospect of an app that lets anybody enter a topic of interest and then generates a list of people most likely to retweet on that topic in the next few hours.

Lee and co do not mention any plans of this kind. But if they don’t exploit it, then there will surely be others who will.

Ref: arxiv.org/abs/1405.3750: Who Will Retweet This? Automatically Identifying and Engaging Strangers on Twitter to Spread Information
