A Winning Web Formula

Advertisers could benefit from analyzing the early popularity of online content.

Kate Greene

December 4, 2008

As online advertising money starts shrinking in the economic downturn, some researchers are looking for ways to make the most of every single dollar. Recent research from HP Labs in Palo Alto, CA, shows that it’s possible to predict, with reasonable accuracy, how popular an online video clip or news story will become simply by looking at how well it does within the first few hours of being posted. If content providers can predict how many views a video or article will get over a set period, then they can match the most popular items with specific high-dollar ads. Additionally, content providers can place potentially popular content in eye-catching spots on their site, further increasing the number of people who see it and the accompanying ads.

“There’s an obvious byproduct of what we’re doing here for advertising,” says Bernardo Huberman, a senior fellow at HP who led the work. “This will allow people who advertise to at least start getting a sense of what they want […] if very early on you can tell if people will like a video or news story.”

Huberman and his colleagues looked at historical data gathered from the video site YouTube, and from Digg, a news aggregator that lets readers’ votes determine which stories become most prominent. The researchers applied mathematical models to these data sets, determining a “popularity curve” for different items. These curves allowed the researchers to extrapolate the future popularity of an item using only information about its popularity over the first few hours or days.
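As a rough illustration of this kind of extrapolation, the sketch below fits a straight line between the logarithm of an item's early count and the logarithm of its later count, then uses that line to project a new item's eventual popularity. The log-linear form, the function names, and the numbers are all assumptions made for illustration; the article does not specify the model the HP researchers used, and the data here is invented, not taken from YouTube or Digg.

```python
import math


def fit_log_linear(early_counts, final_counts):
    """Least-squares fit of log(final) = intercept + slope * log(early).

    Assumed model form, chosen only for illustration.
    """
    xs = [math.log(c) for c in early_counts]
    ys = [math.log(c) for c in final_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_final(early_count, intercept, slope):
    """Project an item's later count from its early count using the fitted line."""
    return math.exp(intercept + slope * math.log(early_count))


# Invented historical data: early counts (e.g. first few hours)
# paired with counts after a longer period. Not real measurements.
early = [10, 20, 40, 80, 160]
final = [100, 200, 400, 800, 1600]

intercept, slope = fit_log_linear(early, final)

# Projection for a new item that received 50 early views or votes.
projected = predict_final(50, intercept, slope)
print(f"slope={slope:.3f}, projected final count={projected:.0f}")
```

In practice a fit like this would be made separately for each site and each observation window, since a few hours of data on one site may carry as much signal as several days on another.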

HP isn’t the only organization trying to predict the popularity of content. Researchers at Google, Yahoo, Microsoft, and IBM, to name a few, have all invested resources and money in researching the problem. A few years ago, Duncan Watts, now a researcher at Yahoo, showed that the quality of a song is a very poor indicator of its eventual popularity, and that long-term song popularity (as measured on music-sharing networks) can be predicted fairly early on, once a song’s popularity trajectory has taken shape.

In the case of Digg, Huberman says that within the first few hours it is clear whether a story will become popular or not, depending on how many “diggs” (votes) it receives from the site’s community of readers. After factoring in the time of day that a story is submitted (a noontime story will get, on average, twice as many early diggs as a story submitted at midnight), the researchers found that if a story receives a low number of diggs, it has relatively little hope of being one of the most-viewed stories of the day. Conversely, if a story receives hundreds of diggs in the first hours, it’s likely to be much more popular.

The popularity of YouTube videos follows a similar pattern, albeit on a longer timescale. By looking at the number of views a video gets on its first day, the researchers could determine the likelihood of its reaching a certain level of popularity after a longer period. For instance, if only 15 people view the video during the first day, it’s unlikely to become a big hit. However, if more than 100 people see it on the first day, then there’s a high likelihood that it will be seen tens of thousands more times.

In the case of both YouTube and Digg, Huberman notes that predictions become more accurate as data are considered over longer periods. For instance, within seven hours, it is possible to predict a story’s future popularity on Digg nearly perfectly. The same is true of videos that have been posted to YouTube for 20 days.

Huberman says that other sites, such as online stores, would need their own ways of determining the popularity of their content because each site has unique characteristics. But advertisers, he says, armed with popularity predictions, could quickly determine which products might “go viral” and then tag special ads to those.

“I think popularity prediction is an interesting topic in general,” says Claudia Perlich, a researcher at IBM’s Watson Research Center in Yorktown Heights, NY. “And I certainly see the value [of] predicting popularity for advertising.” However, she notes that it’s only one part of the advertising puzzle. Increasingly, she says, advertisers are interested in the type of people who are viewing the content, and they find it useful to know the path that a Web surfer has taken before arriving at a site, so that he or she can be better targeted with a specific ad.

Perlich also raises questions about the two systems under study. “I have the slight worry that the results are driven by underlying technology,” she says. It could be that the researchers are simply measuring the proprietary process by which stories become popular on Digg and some of the video-promotion features of YouTube. This is the problem, she says, with doing an experiment in which the systems are proprietary and it’s impossible to know exactly how they work.

Huberman is confident that his methodology can predict popularity independently of specific algorithms used by the sites. He is in the process of analyzing the popularity of people on Twitter, a microblogging service that lets people post short messages to one another and subscribe to updates. Better understanding of these social networks, he says, could lead to entirely new business models. “The only thing that’s of value today is people’s attention,” he says. “An immense amount of money is spent on trying to draw our attention to things.”
