For the past few years, computer scientists have touted Web-search data as a way to spot emerging trends, from changes in housing prices and unemployment numbers to the next box-office hit or the location of the next flu epidemic. But research released today gives a more nuanced view of what these data are good at predicting, and why.
A group of researchers at Yahoo analyzed queries fed into the company’s search engine and found that search data aren’t always the best way to spot a trend. They studied the volume of search queries related to particular movies, songs, and video games up to six weeks before each came out. The number of searches was highly correlated with a movie’s opening-weekend box-office revenue, a video game’s first-month sales, and a song’s rank on the Billboard chart.
The researchers then compared these results with those produced using traditional forecasting methods. For movies, they looked at the Hollywood Stock Exchange, a futures market for trading on the box-office revenue of upcoming titles, and at the number of theaters in which a movie was scheduled to be screened. For games, they examined critics’ ratings. For songs, they looked at reviews as well as the artist’s current and previous rank on the Billboard chart.
Search-based predictions fared only a little better than these methods, and were sometimes worse. The research is published today in the Proceedings of the National Academy of Sciences.
Search-based predictions were most accurate for new video games. This may be because little other data is available, says Jake Hofman, one of the Yahoo researchers involved. “The only early indicators of the quality of a nonsequel video game are reviews from critics,” says Hofman. In other words, search data help most where traditional data are scarce. For both films and songs, search-based predictions offered no improvement over traditional methods.
In recent years, search queries have been promoted as a tool for trend-spotting. In 2008, Google researchers released a tool, called Google Flu Trends, for predicting how many people were getting sick with the flu in different places around the world, based on search queries for “flu,” “influenza,” and similar terms. They found that the tool could predict the likely number of cases in parts of the United States 10 days before the Centers for Disease Control and Prevention (CDC) could.