Startup Turns Data Crunching into a High-Stakes Sport

Kaggle organizes contests for organizations looking to make valuable predictions from mountains of data.
February 3, 2012

Some things—fog in San Francisco or traffic in New York City—are easy to predict. Others, such as the way a stock market will react to big trades, or the progression of an HIV patient’s illness, are far more complicated. That’s where a startup called Kaggle comes in. It organizes contests in which participants attempt to make seemingly impossible predictions by analyzing mountains of data.

Data miner: Anthony Goldbloom created Kaggle to make data analysis a competitive sport, connecting companies that have mounds of data with people who can help them get value from it.

Kaggle corrals thousands of people with backgrounds in data science, including PhDs, graduate students, professors, and people who work at companies such as IBM and Google, offering them the chance to compete to solve companies’ big-data conundrums in exchange for cash. Users take data provided by contest sponsors and compete using custom-made algorithms to find patterns and make the most accurate predictions. You might think of it as a predictive-modeling death match.

Created by Australian economist Anthony Goldbloom, Kaggle was inspired partly by a competition Netflix held from 2006 to 2009. The company offered $1 million to the team that could improve the accuracy of its movie-recommendation software by 10 percent.

The popularity of the Netflix competition showed Goldbloom how many people were interested in working on companies’ data-related conundrums. His 2008 internship at The Economist exposed him to plenty of companies that had data worth mining for valuable insights but lacked the right people to study it.

He bet there was room for a company that would bring these two sides together, and figured that giving it a competitive twist would provide better results.

He was on to something. Since launching in April 2010 with a prize of $1,000 for the team that could most accurately predict how countries would vote in the annual Eurovision Song Contest, Kaggle has run 30 different competitions, five of which are still in progress.

And Kaggle’s community, which has grown to about 27,000 people, is getting results. In one early challenge, a Drexel University academic provided anonymized HIV patient records containing genetic marker data that he hoped could be used to predict the progression of the virus. Within a week and a half, Kaggle users could predict that progression with 70 percent accuracy when their predictions were checked against known data, a milestone that academic research had reached only after four years of effort. By the end of the three-month competition, site users had created a model that reduced the previous error rate by about a third and increased the accuracy of predictions to 77 percent.

For competitors, Goldbloom says, the site’s appeal is the intoxicating feeling of rising on the leaderboard: those who submit the best solutions move to the top of the board for that competition. “You want to keep climbing the ladder,” he says.

Will Cukierski, a biomedical engineering doctoral student at Rutgers University, not only likes climbing the ladder, but also sees the competitions as a way to get a toehold in the job market. He’s participated in about half a dozen Kaggle competitions, winning first place in one and getting near the top in others. “It’s a little bit of fun and a little bit of business,” he says.

Though most of the people working on Kaggle’s competitions have backgrounds in data mining, winners usually come from a different field than the one the competition represents, probably because they’re able to approach the problem from a new angle, Goldbloom says.

Barbara Chow, education director for the William and Flora Hewlett Foundation, is hoping this outside-the-box thinking helps her group’s challenge, which seeks a better way to automatically score student essays. The contest, which offers a $60,000 grand prize and ends April 30, is running concurrently with a private competition that includes major companies already working in the automated essay scoring field.

Though she’s not sure whether Kaggle’s community will come up with the best result, Chow says the Hewlett Foundation decided to experiment with running the challenge because the site has “great access to the right people.”

Cukierski is one of these people—his team is hard at work on the competition, trying to best current automated offerings and create a solution that approaches the grades humans give. How are they doing so far? “Our preliminary results show we’re already pretty close to the humans,” he says. 
