For the past few years, computer scientists have touted Web-search data as a way to spot emerging trends–from changes in housing prices and unemployment numbers to the next box-office hit or the location of the next flu epidemic. But research released today gives a more nuanced view of what these data are good at predicting, and why.
A group of researchers at Yahoo analyzed queries fed into the company’s search engine and found that such searches aren’t always the best way to spot a trend. They studied the volume of search queries related to particular movies, songs, and video games for up to six weeks before each was released. The total number of searches was highly correlated with the revenue a movie made on opening weekend, the first-month sales of a video game, and the rank of a song on the Billboard chart.
The researchers then compared these results with those produced using traditional methods. For movies, they looked at the Hollywood Stock Exchange, a futures market for trading box-office revenue for upcoming titles, or at figures showing the number of theaters at which a movie will be screened. For games, the researchers examined the ratings provided by critics. For songs, they looked at reviews as well as an artist’s current and previous rank on the Billboard chart.
Search-based predictions fared only a little better than these methods, and were sometimes worse. The research is published today in the Proceedings of the National Academy of Sciences.
Search-based predictions were most accurate for new video games. This may be because of a lack of data, says Jake Hofman, one of the Yahoo researchers involved. “The only early indicators of the quality of a nonsequel video game are reviews from critics,” says Hofman. In other words, search data work well here because few traditional indicators are available. For both films and songs, search-based predictions offered no improvement over traditional methods.
In recent years, search queries have been promoted as a tool for trend-spotting. In 2008, Google researchers released a tool, called Google Flu Trends, for predicting how many people were getting sick with the flu in different places around the world, based on search queries for “flu,” “influenza,” and similar terms. They found that the tool could predict the likely number of cases in parts of the United States 10 days before the Centers for Disease Control and Prevention (CDC) could.
However, at the time, the CDC had a delay of up to two weeks in releasing public reports of flu caseloads. The agency is now rolling out new technology that will reduce that delay to one week. If the new technology works, Web-search flu predictions may not be any better than the CDC’s figures.
Philip Polgreen, an assistant professor of medicine at the University of Iowa, published a paper in 2008 that showed a correlation between Yahoo’s search data and official reports of the flu. Polgreen says a user’s intent is often difficult to figure out. For example, a search for an illness or a symptom doesn’t necessarily mean someone is sick–it could be that the searcher is writing a research report on the topic.
An analysis released this spring by Justin Ortiz, a clinical fellow at the University of Washington, suggests that Google Flu Trends can overestimate the number of people getting sick from flu when there is heightened press coverage of the flu, such as during the 2009 H1N1 pandemic.
However, as more data become available, some researchers believe better predictions will be possible. “Over the next five to 10 years, I see more and more companies using this kind of nanodata–fine-grain data with hundreds of billions of observations–in their forecasting,” says Erik Brynjolfsson, director of the MIT Center for Digital Business.
Brynjolfsson says that Web queries provide the most accurate predictions in cases where people do research before they make a purchase. His research has shown that a rise in home sales can be predicted from Web search queries. Each percentage-point rise in the housing search index predicts sales of 121,400 additional houses in the next quarter.
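To make the housing figure concrete, here is a minimal sketch of the arithmetic it implies. It assumes the relationship is strictly linear, which is a simplification; the function name and the sample index changes are hypothetical and do not come from Brynjolfsson's research.

```python
# Figure reported above: each one-point rise in the housing search index
# predicts this many additional home sales in the next quarter.
ADDITIONAL_SALES_PER_POINT = 121_400


def predicted_additional_sales(index_rise_points: float) -> float:
    """Return the additional sales implied by a rise in the search index.

    Assumption: the effect scales linearly with the size of the rise.
    """
    return ADDITIONAL_SALES_PER_POINT * index_rise_points


if __name__ == "__main__":
    # Hypothetical index changes, chosen only for illustration.
    for rise in (0.5, 1.0, 2.0):
        sales = predicted_additional_sales(rise)
        print(f"{rise:.1f}-point rise: about {sales:,.0f} additional sales")
```

Under that assumption, a two-point rise in the index would correspond to roughly 242,800 additional sales in the following quarter.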
The Yahoo researchers say that the search data may be particularly useful when a small improvement in prediction accuracy could have a big impact–for example, in the financial world.
Web searches may also be helpful for spotting sudden changes. For example, existing statistical models have difficulty telling when the popularity of a song rising up the Billboard charts will wane. But search queries can quickly spot this shift. These turning points can also be important in health, economics, and consumer research.