Trusting Data, Not Intuition

Businesses will learn harsh but valuable truths if they subject new ideas to controlled experiments, says Microsoft’s Ronny Kohavi.

Erica Naone

February 22, 2011

You know that great idea you have for improving your business? Ronny Kohavi, an architect at Microsoft’s Online Services division, says there’s good reason to suspect it’s actually lousy. Studies of the software industry indicate that when ideas people thought would succeed are evaluated through controlled experiments, fewer than 50 percent actually work out.

**Pick your page:** Is the top or bottom version of this home page for a previous version of Microsoft Office more likely to produce sales? The bottom version might seem appealingly uncluttered. But when subjected to tests with actual Web users, it performed 64 percent worse than the other.

“This is very humbling,” Kohavi says. Still, he observes, it’s hard to convince people to use data to evaluate an idea rather than relying on their intuition.

Kohavi’s been a true believer for years, but most businesses aren’t using these principles. He says that when he started at Microsoft in 2005 and founded an internal experimentation platform, he couldn’t find any internal groups that were doing controlled testing before launching changes to Microsoft’s websites.

Kohavi knew the value of testing from his time as the director of data mining and personalization at Amazon, where his team had evaluated an idea called behavior-based search. The idea: when users searched for keywords, they would be presented not only with results that matched the keywords but also with items that other people actually bought after searching for those words. The search team didn’t like the idea, because it would mean recommending products that seemed unrelated to what the user had searched for.

However, the idea was tested, and the results suggested that adding this feature would increase revenue by a stunning 3 percent. Amazon shifted resources to make the change as soon as possible, and it has since become one of the company’s most successful personalization features.

What’s important, Kohavi says, is to test ideas quickly, allowing resources to go to the projects that are the most helpful.
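For readers curious about what such a test involves, the comparison at its core can be sketched in a few lines of Python. This is a minimal illustration of a standard two-proportion z-test, not a description of the platforms Kohavi built at Amazon or Microsoft; the function name, the visitor counts, and the conversion figures are all hypothetical.

```python
import math

def compare_variants(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (A) and treatment (B).

    Hypothetical helper: returns the relative change of B versus A,
    the z statistic, and a two-sided p-value from a pooled
    two-proportion z-test.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B perform the same
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    relative_change = (p_b - p_a) / p_a
    return relative_change, z, p_value

# Made-up numbers: 10,000 randomly assigned visitors per version
change, z, p = compare_variants(conv_a=500, n_a=10_000, conv_b=430, n_b=10_000)
print(f"relative change: {change:+.1%}, z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the two versions is unlikely to be chance alone. Real experimentation systems add much more, such as random assignment of users, checks on sample size, and metrics beyond a single conversion rate.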

At Microsoft, Kohavi devoted himself to winning others to his cause as he built his experimentation platform. He staged events—for example, offering a nice polo shirt to anyone who could correctly guess the outcome of six out of eight experiments. Though he received more than 200 responses, his team didn’t have to give out a single shirt. Intuition never matched data.

“The experimentation platform is responsible for telling you your baby is really ugly,” Kohavi jokes. While that can be a difficult truth to confront, he adds, the benefit to business—and also to employees responsible for coming up with and implementing ideas—is enormous. Even when the results aren’t as dramatic as they were with behavior-based search, testing can do a lot to prevent wasted resources.

“You have to decide if you’re going to trust the data and drill deeper,” Kohavi says.

At one point, for example, Microsoft was considering adding more advertisements to its msn.com homepage. Designers reasoned that any negative reaction by users would be outweighed by the increased revenue the ads would bring.

Kohavi organized a controlled experiment to test exactly what effect the ads would have, considering the value of users’ clicks and time spent on the site. The results were sobering: the cost of the degraded user experience far exceeded the revenue the ads were expected to bring. The project was stopped, and the feature never launched.

“Businesses need to try many things and be willing to fail,” says Greg Linden, an engineer best known for designing key recommendation features at Amazon. Linden knows and admires Kohavi. He stresses that “constant, continuous, ubiquitous experimentation is the most important thing.”

Kohavi is now focused on running tests on Bing, the Microsoft search engine. Everyone who visits that site, he says, is participating in “tens of experiments” aimed at helping Microsoft tweak its search product to draw users in more effectively.
