
Data Mining Reveals the Surprising Factors Behind Successful Movies

The secret to making profitable movies will amaze you. (Spoiler: it’s not hiring top box office stars.)

The 2007 comedy Evan Almighty starred Steve Carell and Morgan Freeman, two big box office stars. So in some ways, it’s not surprising that the movie raked in over $100 million in revenue. By comparison, the 2001 comedy Super Troopers starred some relative unknowns and took in a measly $18.5 million.

And yet a wise investor would almost certainly choose the second rather than the first to invest in. That’s because Super Troopers cost only $3 million to make, compared with Evan Almighty’s $175 million, and produced a return on investment of more than 5, compared with -0.4 for the bigger film.
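The arithmetic behind those two figures can be sketched in a few lines. This assumes the usual definition of return on investment, (revenue - budget) / budget, which is consistent with the numbers quoted above. Evan Almighty's revenue is taken as exactly $100 million, which is an assumption made for illustration, since the figure quoted is only "over $100 million."

```python
def return_on_investment(revenue: float, budget: float) -> float:
    """Return (revenue - budget) / budget, with both figures in the same unit."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return (revenue - budget) / budget


# Figures in millions of dollars, taken from the text above.
super_troopers = return_on_investment(revenue=18.5, budget=3.0)

# Assumption: revenue set to 100.0 as a stand-in for "over $100 million".
evan_almighty = return_on_investment(revenue=100.0, budget=175.0)

print(f"Super Troopers ROI: {super_troopers:.2f}")
print(f"Evan Almighty ROI: {evan_almighty:.2f}")
```

A positive value means the film earned back more than it cost. A negative value means it failed to recoup its budget.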

But how to decide in advance what to invest in? Today we get an answer of sorts thanks to the work of Michael Lash and Kang Zhao from the University of Iowa. These guys have created a database of over 100 categories of film-related information, such as the budget and revenue, the stars involved, what the film was about, and when it was released, and then used a machine-learning algorithm to discover patterns that predict profitability. And the results are surprising.

The team began by combining data from two online sources: the Internet Movie Database and BoxOfficeMojo. In this way they gathered together data on over 14,000 films and 4,000 actors, directors, and so on, focusing in particular on films released between 2000 and 2010.
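Combining two sources like this generally means matching records that describe the same film. The sketch below shows one simple way to do that, joining on a normalized title plus release year. It is not the authors' procedure, which the article does not describe. The field names (`title`, `year`, `budget`, `gross`, `plot`) and the helper names are hypothetical, and the sample rows reuse only the figures quoted earlier in this article.

```python
def normalize_title(title: str) -> str:
    """Lowercase a title and collapse runs of whitespace for matching."""
    return " ".join(title.lower().split())


def merge_film_records(imdb_rows, mojo_rows):
    """Join two lists of dicts on (normalized title, year).

    Only films present in both sources are kept. The title spelling
    from the first source wins.
    """
    mojo_by_key = {
        (normalize_title(row["title"]), row["year"]): row for row in mojo_rows
    }
    merged = []
    for row in imdb_rows:
        key = (normalize_title(row["title"]), row["year"])
        match = mojo_by_key.get(key)
        if match is not None:
            merged.append({**row, **match, "title": row["title"]})
    return merged


# Hypothetical rows; the plot text is a placeholder.
imdb_rows = [
    {"title": "Super Troopers", "year": 2001, "plot": "placeholder summary"},
    {"title": "Unmatched Film", "year": 2005, "plot": "placeholder summary"},
]
mojo_rows = [
    {"title": "super  troopers", "year": 2001, "budget": 3.0, "gross": 18.5},
    {"title": "Evan Almighty", "year": 2007, "budget": 175.0, "gross": 100.0},
]

merged = merge_film_records(imdb_rows, mojo_rows)
print(merged)
```

Real titles are messier than this (remakes, alternate titles, punctuation), so a production join would need a more careful key.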

Lash and Zhao then used this data to work out how experienced individual actors were, how much revenue and profit each of their movies had made, and whether they had appeared in films with other actors. They did a similar calculation for directors as well. 

They also used the plot summaries on IMDb to compare the content of the films. And they worked out a return on investment for each film to get a sense of its profitability.

The task for the machine-learning algorithm was to hunt through this data looking for patterns that correlate with profitability.

It turns out that the factor most strongly correlated with a film’s profitability is the average gross revenue made by the director’s previous films. In other words, directors who have generated more revenue in the past are correlated with greater profitability in future.
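To make that feature concrete, here is a minimal sketch of how "average gross of the director's previous films" might be computed for each film. The record layout and the function name are assumptions for illustration, the numbers are made up, and the authors' actual feature construction may differ.

```python
from collections import defaultdict


def director_prior_average_gross(films):
    """Map each film title to the mean gross of its director's earlier films.

    `films` is a list of dicts with keys 'title', 'director', 'year', 'gross'.
    A film whose director has no earlier film in the data maps to None.
    """
    history = defaultdict(list)  # director -> grosses of films seen so far
    features = {}
    for film in sorted(films, key=lambda f: f["year"]):
        earlier = history[film["director"]]
        features[film["title"]] = sum(earlier) / len(earlier) if earlier else None
        # Record this film only after computing its feature, so a film's
        # own gross never leaks into its own prediction input.
        earlier.append(film["gross"])
    return features


# Made-up example data (gross in millions of dollars).
films = [
    {"title": "A", "director": "D1", "year": 2001, "gross": 10.0},
    {"title": "B", "director": "D1", "year": 2004, "gross": 30.0},
    {"title": "C", "director": "D1", "year": 2008, "gross": 50.0},
    {"title": "X", "director": "D2", "year": 2005, "gross": 7.0},
]

features = director_prior_average_gross(films)
print(features)
```

The important design point is that only films released earlier count toward the average. Otherwise the feature would quietly include the outcome it is supposed to predict.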

In many ways that is unsurprising. Good directors, such as Christopher Nolan, are often well known by the cinema-going public and can be a considerable draw.

However, the results throw up a significant surprise. They show that popular stars are correlated with increased revenue but not with profitability. In other words, big stars draw crowds but they don’t guarantee a profit, presumably because they cost a lot to hire in the first place.

Other factors that turn out to be important are whether the film has an R rating or is designated foreign, which presumably correlate with lower profits (although Lash and Zhao do not make this clear).

“Our experiments based on 11 years of movies show that [our algorithm] can do a decent job in predicting the success of movies,” they say.

That’s a curious study that lacks persuasiveness. There are various ways of predicting box office success in advance, although few have been aimed at investors, who obviously have to be involved early in the process.

If this method is to be believed, it shows the true value of a director with the right kind of track record and also shows that star power isn’t the guarantee of success that many might imagine. That’s something the investors in Evan Almighty might have found useful to know at the beginning of that film’s production process.

The real test, of course, is not in predicting the past but in predicting the future. If this algorithm is able to pick out potentially profitable films before they are even made, then Lash and Zhao are set to become wealthy individuals. We’ll look forward to seeing how they fare.

Ref: Early Predictions of Movie Success: the Who, What, and When of Profitability
