What’s Next for the Netflix Algorithms?

Applying the lessons of the $1 million competition to other recommendation systems will be a challenge.

Erica Naone

October 8, 2009

When the Netflix Prize was awarded last month, it ended three years of intense competition aimed at finding a better algorithm for predicting users’ movie preferences.

The winning team, BellKor’s Pragmatic Chaos, was the first to forecast Netflix customers’ movie ratings with 10 percent better accuracy than the company’s in-house system, a feat that many experts believed would be impossible when the million-dollar prize was announced. Netflix plans to offer a second prize, this time for algorithms that predict movie preferences using more user information, such as gender, age, and zip code. But experts say that the real challenge is to find ways to apply the lessons learned through the original Netflix challenge to other recommendation systems.

At the end of October, experts in the field will meet at the ACM Conference on Recommender Systems in New York City to ask, among other things, what has been learned from the Netflix Prize.

Participants in the original Netflix competition trained their algorithms using an enormous collection of data: more than 100 million ratings covering almost 18,000 titles from nearly half a million subscribers. Their algorithms were then tested on a separate set of data that Netflix maintained and kept secret from the contestants to prevent cheating.

Netflix’s data presented several formidable obstacles, explains Nicholas Ampazis, an assistant professor in the department of financial and management engineering at the University of the Aegean in Greece, whose team, The Ensemble, ended the contest in second place. The dataset was huge, but it was also sparse: ratings existed for only about 1 percent of all possible customer-movie pairs. “Cracking the 10 percent barrier thus meant pushing the limits of existing modeling techniques to a significant degree,” says Ampazis.
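The rounded figures quoted for the training data give a rough sense of that sparsity. The short calculation below is only a back-of-the-envelope sketch: it treats "more than 100 million," "almost 18,000," and "nearly half a million" as exact round numbers, which is an assumption, and estimates what share of all possible subscriber-title pairs actually carried a rating.

```python
# Rounded figures from the article, treated as exact (an assumption).
ratings = 100_000_000      # "more than 100 million ratings"
titles = 18_000            # "almost 18,000 titles"
subscribers = 500_000      # "nearly half a million subscribers"

# Every subscriber could in principle rate every title.
possible_pairs = titles * subscribers

# Share of subscriber-title pairs that have a rating.
density = ratings / possible_pairs

print(f"possible pairs: {possible_pairs:,}")
print(f"density of the ratings matrix: {density:.2%}")
```

On these rounded inputs, roughly 99 out of every 100 subscriber-title pairs have no rating at all, which is the gap a prediction algorithm has to fill.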

But the challenges presented by the Netflix data also made the competition very valuable, according to Ces Bertino, another member of The Ensemble. Researchers usually have the luxury of choosing datasets, and of having more information about that data. In the Netflix contest, the contestants were forced to apply all algorithms to the same set of frustratingly uneven real-world data. “Because people had to use a fixed dataset, they needed to deal not only with the advantages of a particular method, but also the weaknesses of it,” Bertino says. “You could not escape it.”

Gavin Potter, who gained recognition in 2008 for breaking into the top 10 of the Netflix Prize under the name “Just a guy in a garage,” says that a few key realizations allowed the winning algorithms to meet the goal. First, a powerful algorithm for searching for patterns in datasets, a technique known as collaborative filtering, was streamlined so that it could be used on the large Netflix dataset. Second, participants learned to pay attention to certain new types of details, for example the fact that ordering a movie at all indicates some preference for it, even if the customer didn’t rate it. Date and time information also proved significant. But the biggest realization, Potter notes, was that blending a variety of approaches yielded the best results.
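The idea of blending can be illustrated with a toy example. The sketch below is not the winning team's method: it assumes just two hypothetical models, a handful of made-up ratings, and a plain least-squares fit of one weight per model, with error measured as root-mean-square error (RMSE). The function names and all the numbers are invented for illustration.

```python
import math

def rmse(predictions, actual):
    """Root-mean-square error between predicted and actual ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predictions, actual)]
    return math.sqrt(sum(squared) / len(squared))

def fit_blend_weights(preds_a, preds_b, actual):
    """Least-squares weights (w_a, w_b) for the blend w_a*a + w_b*b.

    Solves the 2x2 normal equations directly; there is no intercept term.
    """
    s_aa = sum(a * a for a in preds_a)
    s_bb = sum(b * b for b in preds_b)
    s_ab = sum(a * b for a, b in zip(preds_a, preds_b))
    s_ay = sum(a * y for a, y in zip(preds_a, actual))
    s_by = sum(b * y for b, y in zip(preds_b, actual))
    det = s_aa * s_bb - s_ab * s_ab
    if det == 0:
        raise ValueError("predictions are collinear; weights are not unique")
    w_a = (s_ay * s_bb - s_by * s_ab) / det
    w_b = (s_by * s_aa - s_ay * s_ab) / det
    return w_a, w_b

# Made-up ratings and predictions from two hypothetical models.
actual = [4, 3, 5, 2, 1, 4]
model_a = [3.5, 3.4, 4.2, 2.9, 2.0, 3.6]   # cautious, hugs the average
model_b = [4.4, 2.5, 4.9, 1.2, 1.5, 4.8]   # bolder, overshoots at times

w_a, w_b = fit_blend_weights(model_a, model_b, actual)
blend = [w_a * a + w_b * b for a, b in zip(model_a, model_b)]

for name, preds in [("model A", model_a), ("model B", model_b), ("blend", blend)]:
    print(f"{name}: RMSE = {rmse(preds, actual):.4f}")
```

Because the weights here are fitted and scored on the same ratings, the blend cannot do worse than either model alone on this data. In a real setting the weights would be fitted on held-out ratings, and the gain would depend on how different the models' errors are.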

The blending of different approaches has gotten a lot of attention in post-mortems of the competition, but John Riedl, a professor of computer science at the University of Minnesota, says he has mixed feelings about it. “People like me have been looking for ideas that would give us insight into the structure of the solution,” he says, “where we really understand something new about not only what solution does well, but why it is that it does well.”

The winning models, however, haven’t yielded such insight. What they do suggest, according to Riedl, is that combining lots of algorithms with machine-learning techniques might be a good approach to handling large datasets in general. However, even that remains to be proven. “A lot of us are worried that this approach may not be as fruitful elsewhere,” he adds.

What is clear is that many industries could benefit from the types of models built for the competition. Besides other online recommendation systems, Ampazis suggests that such algorithms could be applied in market trading, fraud detection, spam-fighting, and computer security. Bertino says that members of The Ensemble are currently considering how best to use the technology they generated in the course of the competition.

Potter is working on applying his own research for the prize to the online dating site YesNoMayB, which employs two-way recommendation algorithms to find users who may want to meet one another. In particular, he hopes to use insights from the Netflix Prize to make predictions based on users’ implicit preferences, such as what pages they load.

The Netflix Prize focused a lot of attention on recommendation systems and produced huge advances in the field. The second competition seems likely to do the same. But Riedl thinks that other components of recommendation systems may be left behind in the process. “Now it’s time for us as a field to think about what other aspects may have been neglected,” he says, “and how researchers can make progress on those aspects in a way that has implications for industry.”

For example, Riedl sees a need for algorithms that allow recommendation systems to use ever-larger sets of data, systems that explain to a user why a particular recommendation was made, and better user interfaces. He also notes that, while the Netflix competition made impressive advances in interpreting sparse data, in some cases it may make sense to learn how to design sites to encourage users to give more data. He hopes that the upcoming meeting in New York will help define a broader set of questions for researchers to address.
