What Big Data Needs: A Code of Ethical Practices

Four key principles that companies should follow if they hope to analyze customers’ data without alienating them.

Jeffrey F. Rayport

May 26, 2011

In this era of Big Data, there is little that cannot be tracked in our online lives—or even in our offline lives. Consider one new Silicon Valley venture, called Color: it aims to make use of GPS devices in mobile phones, combined with built-in gyroscopes and accelerometers, to parse streams of photos that users take and thus pinpoint their locations. By watching as these users share photos and analyzing aspects of the pictures, as well as ambient sounds picked up by the microphone in each handset, Color aims to show not only where they are, but also whom they are with. While this kind of service might prove attractive to customers interested in tapping into mobile social networks, it also could creep out even ardent technophiles.

Color illustrates a stark reality: companies are steadily gaining new ways to capture information about us. They now have the technology to make sense of massive amounts of unstructured data, using natural language processing, machine learning, and software frameworks such as Hadoop, which spreads the storage and processing of very large data sets across clusters of commodity servers. Messy data of this kind, long relegated to data warehouses, is now the target of data mining. So is the information generated by social networks, such as user profiles and posts. Its quantity is staggering: a recent report from the market intelligence firm IDC estimates that in 2009 stored information totaled 0.8 zettabytes, the equivalent of 800 billion gigabytes. IDC predicts that 35 zettabytes of information will be stored globally by 2020. Much of that will be customer information. As the store of data grows, the analytics available to draw inferences from it will only become more sophisticated.

It’s no wonder that there are calls for corporations to create positions such as chief privacy officer, chief safety officer, and chief data officer, or that American and European legislators have been considering several kinds of privacy measures. In one bipartisan effort, Senators John McCain and John Kerry have proposed the Commercial Privacy Bill of Rights Act of 2011, which aims, in part, to restrict what online companies can do with customer data. Senator Jay Rockefeller has proposed his own piece of legislation, the Do-Not-Track Online Act of 2011. The European Union’s Article 29 Working Party is addressing similar concerns.

In the private sector, the Digital Advertising Alliance has sought to get ahead of such rule-making by introducing its own privacy framework to assure the security and safety of customer information. Its Self-Regulatory Program for Online Behavioral Advertising comes on the heels of several incidents: Epsilon’s admission that hackers gained access to customer information from clients such as Citigroup, Target, and Walgreens; Sony’s revelation that its PlayStation platform failed to safeguard the account information of up to 100 million customers; and Apple’s confirmation that iPhones kept an unencrypted file, copied to users’ computers in iTunes backups, whose location data could be used to trace the movements of individual iPhone users in the physical world.

For all the privacy concerns, the online economy creates enormous value by using customer information. In 2009, according to an ad industry study cited by the Wall Street Journal, the average price of an untargeted ad online was $1.98 per thousand views. The average price of a targeted ad was $4.12 per thousand, more than twice as much. We used to measure the success of websites as if they were portals: by how much traffic they could muster. Now we measure them as social networks: by how much they know about their users. This is why Wal-Mart recently acquired Kosmix, a Silicon Valley startup that filters and finds meaning in vast streams of Twitter messages. Other retailers, along with digital players such as Facebook and Yahoo, are using the technology of another startup, Cloudera, to sort through enormous quantities of behavioral information compiled over years (sometimes decades) in search of insights based on patterns that only machines can fathom. Intelligence generated in these ways can lead to better games from companies like Zynga and better advertising from your favorite brands. David Moore, the CEO of 24/7 Real Media, argues that when an ad is targeted properly, “it ceases to be an ad; it becomes important information.”

The opportunity for profit helps explain the rise of dozens of data exchanges, data marts, predictive analytic engines, and other intermediaries. It’s also why players such as Google, Facebook, and Zynga, among many others, are finding ways to aggregate ever more information about users. Facebook provides but one example of how extensive this kind of tracking can be. Its seemingly innocuous “Like” button has become ubiquitous online. Click on one of these buttons, and you can instantly share something that pleases you with your friends. But simply visit a page with a “Like” button on it while you’re logged in to Facebook, and Facebook can track what you do there. The former sounds great for consenting adults; the latter is more than a little unsettling. Facebook is hardly alone. A company called Lotame helps target online advertising by placing tags (sometimes known as beacons) on browsers to monitor what users are typing on any Web page they might view.

The potential dark side of Big Data suggests the need for a code of ethical principles. Here are some proposals for how to structure them.

Clarity on Practices: When data is being collected, let users know about it—in real time. Such disclosure would address the issue of hidden files and unauthorized tracking. Giving users access to what a company knows about them could go a long way toward building trust. Google has done this already. If you want to know what Google knows about you, go to www.google.com/ads/preferences, and you can see both the data it has collected and the inferences it, and third parties, have drawn from what you’ve done.

Simplicity of Settings: One way to avoid an Orwellian nightmare is to give users a chance to figure out for themselves what level of privacy they really want. In theory, Facebook does this. In practice, as Nick Bilton reported recently in the New York Times, Facebook’s privacy policy has more words (5,830) than the United States Constitution (4,543, not counting the amendments). But that’s just the tip of the iceberg. Try changing your privacy settings, and you will encounter over 50 privacy toggles giving rise to over 170 privacy options.

Privacy by Design: Some argue that neither clarity nor simplicity is sufficient. Ann Cavoukian, privacy commissioner for the province of Ontario, coined the phrase “privacy by design” to propose that organizations incorporate privacy protections into everything they do. This does not mean Web and mobile businesses collect no customer information. It simply means they make customer privacy a guiding principle, right from the start. Microsoft, which in 2006 issued a report called “Privacy Guidelines for Developing Software Products and Services,” has embraced this principle, using a renewed emphasis on privacy as a way to differentiate itself; the latest version of Internet Explorer, IE9, lets users activate features that can block third-party ads and content.

Exchange of Value: Walk into a local Starbucks, and you’re likely to feel flattered if a barista remembers your name and favorite beverage. Something similar applies on the Web: the more a service provider knows about you, the greater the chance that you’ll like the service. Radical transparency could make it easier for digital businesses to show customers what they will get in exchange for sharing their personal information. That’s what Netflix did in running a public competition offering third-party developers a $1 million award for improving the accuracy of its movie recommendation engine by 10 percent. It was an open acknowledgement that Netflix was using users’ movie-viewing histories to provide increasingly targeted, and thus more useful, recommendations.

These principles are by no means exhaustive, but they begin to outline how companies might realize the value of Big Data and mitigate its risks. Adopting such principles would also get ahead of policymakers’ well-intentioned but often misguided efforts to rule the digital economy. That said, perhaps the most important rule is one that goes without saying, something akin to the Golden Rule: “Do unto the data of others as you would have them do unto yours.” That kind of thinking might go a long way toward creating the kind of digital world we want and deserve.

Jeffrey F. Rayport specializes in analyzing the strategic implications of digital technologies for business and organizational design. He is a managing partner of MarketspaceNext, a strategic advisory firm; an operating partner at Castanea Partners; and a former faculty member at Harvard Business School. Carine Carmy contributed research to this article.
