Using the Web entails some privacy risks. Companies using sophisticated data-mining algorithms can glean an astonishing amount of information on each of us, from our reading preferences to our shopping habits. Most of us have even grown accustomed to the idea that malicious hackers could steal every bit of our financial data, seemingly no matter how careful we are.
What we’re not quite used to, though computer scientists and privacy advocates have been warning about it for years, is just how much personal information can be gleaned from our mobile phones, particularly the increasingly popular, sensor-laden smart phones, like Apple’s iPhone and Motorola’s new Droid.
In a “Perspectives” column published today in Science (subscription required), Tom M. Mitchell, head of Carnegie Mellon’s Machine Learning Department, highlights both the benefits and risks introduced by real-time analysis of mobile data, and argues that society won’t be able to take maximum advantage of this technology until it addresses questions about how much of our lives can be observed and by whom.
“The potential benefits of mining such data range from reducing traffic congestion and pollution, to limiting the spread of disease, to better using public resources such as parks, buses, and ambulance services,” Mitchell writes. “But risks to privacy from aggregating these data are on a scale that humans have never before faced.”
Referred to as reality mining, such approaches use data from the location sensors, motion sensors, and built-in microphones of top-end cell phones, as well as stored call logs, contact lists, e-mails, text messages, and other files. Most reality-mining efforts to date have been research projects, both academic and corporate, designed to analyze social interactions and personal behavior. An increasing number of real-world benefits are coming from these studies.
Mitchell points out, for instance, that in many cities, Google Maps uses anonymous location data from smart phones to provide nearly real-time reports of traffic congestion. And researchers have shown that by analyzing health-related Google queries (e.g., “Kleenex” or “cough syrup”) from particular areas, they can estimate the level of flu-like illnesses in different parts of the United States much more quickly than government agencies such as the Centers for Disease Control and Prevention can.
Combining data sets could open up many new possibilities, Mitchell says. “For example, if your phone company and local medical center integrated GPS phone data with up-to-the-minute medical records, they could provide a new kind of medical service using phone GPS data to detect that you have recently been near a person who is just now being diagnosed with a contagious disease–then automatically phoning to warn you.” Of course, he notes, this also opens up a whole new range of privacy concerns. Such a phone call, for example, could allow you to deduce information that someone would rather keep private–and feasibly could keep private without endangering others.
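To make the hypothetical service concrete, the proximity check at its core could be sketched roughly as follows. This is only an illustration, not anything described in Mitchell's column: the trace format, the function names `haversine_m` and `was_near`, and the 10-meter and 5-minute thresholds are all assumptions chosen for the example.

```python
import math
from typing import List, Tuple

# Hypothetical trace format: (unix time in seconds, latitude, longitude).
Fix = Tuple[float, float, float]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def was_near(trace_a: List[Fix], trace_b: List[Fix],
             max_m: float = 10.0, max_dt_s: float = 300.0) -> bool:
    """True if any two fixes are close in both time and space.

    The thresholds are illustrative assumptions, not medical guidance.
    """
    return any(
        abs(ta - tb) <= max_dt_s and haversine_m(lat_a, lon_a, lat_b, lon_b) <= max_m
        for ta, lat_a, lon_a in trace_a
        for tb, lat_b, lon_b in trace_b
    )
```

Even this toy version shows where the privacy problem comes from: answering the question requires someone to hold both people's location histories alongside one person's diagnosis.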
As Mitchell writes, technical means, such as data anonymization, can help limit threats to privacy and misuse of data. Another approach would be to mine data from multiple organizations without ever aggregating the data into a central repository.
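One simple way to picture mining without a central repository is for each organization to compute a summary locally and share only that summary. The sketch below is an assumption-laden illustration, not the method Mitchell describes: the `LocalSummary` type, the `summarize` and `combine` functions, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocalSummary:
    count: int
    total: float


def summarize(values: List[float]) -> LocalSummary:
    """Runs inside one organization; the raw records never leave it."""
    return LocalSummary(count=len(values), total=sum(values))


def combine(summaries: List[LocalSummary]) -> float:
    """The coordinator sees only per-organization counts and totals."""
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no records to combine")
    return sum(s.total for s in summaries) / n


# Hypothetical data: daily commute minutes held by two separate carriers.
carrier_a = [30.0, 45.0, 25.0]
carrier_b = [50.0, 40.0]
overall_mean = combine([summarize(carrier_a), summarize(carrier_b)])
```

Sharing only counts and totals keeps individual records in place, though summaries over very small groups can still reveal information about individuals, so this alone would not settle the privacy question.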
But he argues, “Perhaps even more important than technical approaches will be a public discussion about how to rewrite the rules of data collection, ownership, and privacy to deal with this sea change in how much of our lives can be observed, and by whom. Until these issues are resolved, they are likely to be the limiting factor in realizing the potential of these new data to advance our scientific understanding of society and human behavior, and to improve our daily lives.”
This idea isn’t a new one in the field: MIT professor Sandy Pentland, often regarded as the “father” of reality mining, has argued for open discussions about the privacy implications of the technology since its inception. Indeed, Sense Networks, the reality-mining startup that Pentland cofounded to provide useful information to both individual consumers and companies, has very clear and specific policies about its use of data and users’ right to privacy.
If only the cellular carriers that already hold so much of our data were half so considerate.