Seeking Edge, Websites Turn to Experiments

Optimization technology is reshaping publishers’ decision-making process—and the Web itself.

Antonio Regalado

January 22, 2014

1-800-Dentist is a small company facing a big decision. What picture on its Web home page will get the most people to fill out a form with their names and phone numbers?

At many Web publishers, such decisions can lead to impassioned arguments, fruitless debates, even hurt feelings. But 1-800-Dentist doesn’t leave it to chance or opinion. Instead it runs an experiment. It launches two or more versions of a Web page, and then watches as users react. After thousands of people have visited, one version will have edged out the others with a statistically significant improvement in the number of sign-ups.
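To make "statistically significant" concrete, here is a minimal sketch of one common way to compare two page versions: a two-proportion z-test. The article does not say which statistical method 1-800-Dentist or its software actually uses, so the function name `two_proportion_z_test` and the visitor counts in the usage note are illustrative assumptions, not figures from the company.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare sign-up rates of two page versions.

    conv_a, conv_b: number of sign-ups on version A and version B
    n_a, n_b:       number of visitors shown version A and version B
    Returns (z, p_value), where p_value is two-sided.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that both versions perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With hypothetical traffic of 5,000 visitors per version, and 500 sign-ups on one version against 575 on the other, calling `two_proportion_z_test(500, 5000, 575, 5000)` gives a p-value below the conventional 0.05 threshold, so the second version would be declared the winner under this test.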

Such optimization testing is quickly spreading across the Web. And as companies gain access to tools that let them run their businesses like ongoing science experiments, it’s changing not only how decisions get made but what the Web looks like. “There used to be a battle of opinions in our company,” says Elliot Kharkats, the Web analytics and testing manager for 1-800-Dentist. “The designer would get upset. The boss would intervene. But we don’t have a story like that anymore. No one is really committed to their version anymore, because testing proves over and over again that the smartest people in the room are just wrong.”

The software 1-800-Dentist uses is called Optimizely. It allows publishers to easily carry out so-called A/B tests—statistical horse races between two or more versions of a website. Optimizely was founded four years ago by Dan Siroker and Pete Koomen, former product managers at Google, a company where A/B testing is used extensively to rate how search results are displayed.

The startup thinks any website, large or small, can be optimized, and its ideas got a boost of publicity from Siroker’s close involvement with President Obama’s reëlection campaign, which broke records for online fund-raising. Kyle Rush, head of optimization for Optimizely, who ran testing for the Democrats in 2012, says the campaign used A/B testing to weigh every change to its fund-raising Web page, discovering big improvements along the way. At one point late in the race, they found that adding a personal message from the president (“Stand with me, work with me …”) led to an 11.3 percent increase in online donations by visitors to the page.

Yet testing also upset the campaign team. Time was tight, and Rush, just a midlevel employee, was running experiments that didn’t always work out. “We would do variations and it would drop the conversion rate by 30 percent for three hours,” Rush says. “That caused a panic. People said, ‘Oh my God, we can’t afford any more risks like that!’ The campaign environment is very risk averse. And that’s the main thing that has to be untaught in most businesses.”

At least 15 percent of the top 10,000 websites are conducting A/B tests at any given time, according to BuiltWith, an Australian company that scans sites to see what types of third-party software they are using. The not-for-profit Wikimedia Foundation, which publishes Wikipedia, tested various tweaks to its fund-raising messages throughout 2013, with results a spokesman calls “amazing.” The organization says the more effective its appeals are, the fewer of them it has to show.

But there are risks to following the data. It can turn into a tyranny of mob taste that diminishes the judgment of professionals or artists. In 2009, a top Google designer named Douglas Bowman quit, complaining that the company “couldn’t decide between two blues, so they’re testing 41 shades … to see which one performs better.” While Google says the designer’s story is not entirely accurate, the professional anxieties are real.

“Now anyone with the data can make the call,” says Rush. “And that is very frightening for a lot of organizations.”

Traditional media companies, in particular, aren’t ready. Often, publishers don’t have clear objectives, with editors, designers, and advertising salespeople each advocating different aims. Without a clear goal, says Rush, “software is not going to help you.”

Some successful new Web publishers are born with optimization at the center of their decision making. One is Buzzfeed, an eight-year-old news site that is perfecting ways to increase page views using A/B testing and other statistical techniques. Its trademark “listicles” (one typical story: “10 Problems That Only Short Girls Understand”) are viewed by 130 million people a month. That’s more than four times the number who read the New York Times.

In fact, intensive testing appears to be reshaping what the Web looks like. But the page designs that are succeeding won’t win any awards for art direction, just as listicles don’t win Pulitzers. Even proponents of optimization technology admit it can produce sites with simple, cookie-cutter looks.

But A/B testing is spreading because it’s become easy to do. Optimizely says it can pick a winning design after as few as 100 visits for sites that have never been optimized. In practice, running experiments is often much harder. At 1-800-Dentist, which is based in Los Angeles, Kharkats says he’s testing text and images for several slightly different landing pages and estimates that he will need 150,000 visitors to each in order to detect a difference. That could take months, he says.
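Why might a test need so many visitors? A rough sketch using the standard sample-size approximation for comparing two proportions shows how the required traffic grows as the difference being sought shrinks. The baseline rate, lift, significance level, and power used here are assumptions chosen for illustration; the article does not give Kharkats's actual inputs or the method behind his 150,000 estimate.

```python
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed per page version to detect a lift.

    baseline_rate: sign-up rate of the current page (e.g. 0.02 for 2 percent)
    relative_lift: smallest relative improvement worth detecting (e.g. 0.05)
    alpha:         two-sided false-positive rate (assumed 0.05)
    power:         chance of detecting a real lift of that size (assumed 0.8)
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Sum of the binomial variances of the two versions.
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2
```

Under these assumed settings, a page converting at 2 percent needs on the order of hundreds of thousands of visitors per version to detect a 5 percent relative lift, while a 50 percent lift needs only a few thousand. Small differences between similar landing pages are expensive to measure.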

By now, everyone at 1-800-Dentist is used to the idea that the results could be surprising. One picture that won out for a time last year showed a short-haired male dentist firmly gripping the shoulder of a female patient. That’s not something most dentists would even do. Kharkats himself admits it was a little weird. “We did test different pictures. This guy beat the other versions. Don’t ask me how,” he says.
