March Madness, the NCAA college basketball championship playoffs, is among the most popular sporting events in the US, thanks in part to the wide-ranging contest that has evolved around predicting which teams will progress through the tournament. This year, almost $10.4 billion is on the line in office pools or more organized competitions, and more than 40 million Americans will fill out their own versions of the playoff brackets to take part, according to the American Gaming Association. The odds of predicting a perfect bracket, which no one has ever done, are at best about 1 in 128 billion and could be as remote as 1 in 9.2 quintillion.
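The 9.2 quintillion figure can be checked with a little arithmetic. The sketch below assumes a 64-team single-elimination field (63 games) and treats every game as a coin flip; the much friendlier 1-in-128-billion estimate presumably reflects informed picks and is not reproduced here.

```python
# Assumption: 64 teams, single elimination, so 64 - 1 = 63 games are played.
GAMES = 63

# If each game is an independent coin flip, every game doubles the number
# of distinct ways a bracket can be filled out.
coin_flip_brackets = 2 ** GAMES

print(f"{coin_flip_brackets:,}")  # 9,223,372,036,854,775,808
```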
Now machine learning is taking a shot.
Kaggle, the online platform for predictive modeling and analytics competitions that was acquired by Google parent company Alphabet last year, is hosting a competition for both the NCAA men’s and women’s tournaments. Kaggle provides a data set with information like tournament seeds going back to the 1984-85 season; final scores of all regular season, conference tournament, and NCAA tournament games since 1984-85; and every Division I men’s and women’s basketball play-by-play moment since 2009. It all adds up to more than 40 million data points.
Competitors don’t fill out a traditional bracket; instead, they build models that predict how likely a team is to win each game. A model is judged on both the outcome of the game and the confidence it assigned to its prediction. So if a model is 99 percent certain that a team will win and it turns out to be right, it scores better than one whose correct prediction was only 95 percent sure. If a model is very confident and incorrect, however, it is penalized more heavily. This makes it harder to win with dumb luck or random chance. A prize pool of $100,000 is to be divided among the top three entries for both tournaments. Entries are due on Thursday, and 500 teams are already signed up.
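One scoring rule that behaves the way this paragraph describes is logarithmic loss. The article does not name the metric, so treat the choice as an assumption; the function name `game_penalty` is hypothetical and only for illustration. Lower is better: each game contributes a penalty, and the penalty grows sharply when a confident prediction turns out wrong.

```python
import math


def game_penalty(p_win: float, team_won: bool, eps: float = 1e-15) -> float:
    """Log-loss penalty for one game (assumed metric, not confirmed by the article).

    p_win is the model's predicted probability that the team wins.
    Lower penalties are better.
    """
    # Clamp so that a prediction of exactly 0 or 1 does not produce log(0).
    p = min(max(p_win, eps), 1 - eps)
    return -math.log(p) if team_won else -math.log(1 - p)


# Both models picked the winner; the more confident one is penalized less.
print(game_penalty(0.99, team_won=True), game_penalty(0.95, team_won=True))

# Both models picked the loser; the more confident one is penalized far more.
print(game_penalty(0.99, team_won=False), game_penalty(0.95, team_won=False))
```

Under this rule a coin-flip prediction of 0.5 costs the same whether the team wins or loses, which is why hedging everything scores poorly against models that are confident and right.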
But it’s unclear whether machine learning is ready to take on bracketology, which might be more of an art than 40 million data points would have you believe. Since college rosters change from season to season, the algorithms might not even have the right data to parse in search of patterns.
And then there are the unquantifiable elements, like players who “click” or a team on a streak. The cities that host tournament games can also influence a team’s performance in ways an algorithm may not anticipate. For instance, games in a city like Denver could lead to altitude issues for teams accustomed to playing at sea level, or a particularly rowdy crowd could help a team gain momentum in the last moments. This year, the Big Ten teams have had two weeks off before March Madness for the first time ever, and their players may be better rested (or rustier) than those on teams from other conferences. An algorithm can’t take into account an event that it has never seen before. Upsets are called upsets for a reason; if machine learning could predict them, the term would become moot.
“I actually think that tournaments like NCAA [March Madness] are not the sweet spot for machine learning,” wrote Kaggle CEO Anthony Goldbloom in a Reddit AMA. “There are many fewer March Madness games than ad clicks/fraud events etc.”
Kaggle has hosted four previous March Madness competitions, although this is the first season with prize money. Last year’s winner, Andrew Landgraf, based his model on previous winners’ algorithms, but with a twist. He considered what other entrants in Kaggle’s competition might do and directed his algorithm to take advantage of their potential mistakes. People do this with office pools all the time: if you were in an office with a bunch of Duke fans, betting against Duke might leave you with the best bracket if the Blue Devils were to lose. Even with his carefully planned model, Landgraf says, luck was a huge part of his success.
Eventually, algorithms might be good enough to predict things like hot streaks, but in the meantime, human-machine collaboration might represent the future of bracketology. Betting syndicates believe so—they’re already using both predictive analytics and data from human-powered online gambling markets to place their wagers, according to Adam Kucharski, a researcher and author of The Perfect Bet: How Science and Math Are Taking the Luck Out of Gambling.
“Despite all their flaws, betting markets are a good way to canvass a crowd’s knowledge,” Kucharski says. “Understanding that human element can be very useful.”
The results of Kaggle’s tournament can be judged against the imperfect brackets of years past. Thirty-nine games is the closest anyone has come to a perfect result, so that’s an easy benchmark for success. And if one of Kaggle’s algorithmic contenders or a human-machine collaboration achieves the ultimate goal, there are some lucrative rewards waiting. Billionaire Warren Buffett has a long-standing offer to pay a million dollars a year for life to any of his employees who comes up with a perfect bracket.
But once we get a perfect bracket, what’s next? Kaggle’s competition starts after the selection of all 64 teams. The next challenge may be predicting the tournament winners before you know who’s even in the running.