At a computer in her office at the Harvard School of Public Health in Boston, epidemiologist Caroline Buckee points to a dot on a map of Kenya’s western highlands, representing one of the nation’s thousands of cell-phone towers. In the fight against malaria, Buckee explains, the data transmitted from this tower near the town of Kericho has been epidemiological gold.
When she and her colleagues studied the data, they found that people making calls or sending text messages originating at the Kericho tower were making 16 times more trips away from the area than the regional average. What’s more, they were three times more likely to visit a region northeast of Lake Victoria that records from the health ministry identified as a malaria hot spot. The tower’s signal radius thus covered a significant waypoint for transmission of malaria, which can jump from human to human via mosquitoes. Satellite images revealed the likely culprit: a busy tea plantation that was probably full of migrant workers. The implication was clear, Buckee says. “There will be a ton of infected [people] there.”
This work is now feeding into a new set of predictive models she is building. They show, for example, that even though malaria cases were seen at the tea plantation, taking steps to control malaria there would have less effect on the disease’s spread than concentrating those efforts at the source: Lake Victoria. That region has long been understood as a major center of malaria, but what hasn’t been available before is detailed information about the patterns of human travel there: how many people are coming and going, when they’re arriving and departing, which specific places they’re coming to, and which of those destinations attract the most people traveling on to new places.
Existing efforts to gather that kind of travel data are spotty at best; sometimes public-health workers literally count people at transportation hubs, Buckee says, or nurses in far-flung clinics ask newly diagnosed malaria victims where they’ve been recently. “At many border crossings in Africa, they keep little slips of paper—but the slips get lost, and nobody keeps track,” she says. “We have abstractions and general models on travel patterns but haven’t been able to do this properly—ever.”
The data mining will help inform the design of new measures that are likely to include cheap, targeted campaigns of text messages—for example, warning visitors entering the Kericho tower’s signal zone to use bed netting. And it will help officials choose where to focus mosquito control efforts in the malarial areas. “You don’t want to be spraying every puddle for mosquito larvae all the time. But if you know there is a ton of importation from a certain spot, you want to increase your control program at that spot,” Buckee says. “And now I can pinpoint where the importation of a disease is especially important.”
Buckee’s most recent study, published last year in Science and based on records from 15 million Kenyan phones, is a result of a collaboration with her husband, Nathan Eagle, who has been working to make sense of cell-phone data for more than a decade. In the mid-2000s, after getting attention for his work mining data from the phones of volunteers at MIT, Eagle started to get calls from mobile carriers asking for insight into questions like why customers canceled their phone plans. Eagle began working with them. And when the couple spent 18 months in Africa starting in 2006—Buckee was doing work on the genetics of the malaria parasite—he studied call data for various purposes, trying to understand phenomena like ethnic divisions in Nairobi slums and the spread of cholera in Rwanda. Buckee’s results show what might be possible when the technology is turned on public-health problems. “This demonstrated ‘Yeah, we can really provide not just insight, but actually something that is actionable,’” says Eagle, now CEO of Jana, which runs mobile-phone surveys in the developing world. “This really does work.”
That demonstration suggests how such data might be harnessed to build tools that health-care workers, governments, and others can use to detect and monitor epidemics, manage disasters, and optimize transportation systems. Already, similar efforts are being directed toward goals as varied as understanding commuting patterns around Paris and managing festival crowds in Belgium. But mining phone records could be particularly useful in poor regions, where there’s often little or no other data-gathering infrastructure. “We are just at the start of using this data for these purposes,” says Vincent Blondel, a professor of applied mathematics at the University of Louvain in Belgium and a leading researcher on data gleaned from cell phones. “The exponential adoption of mobile phones in low-income settings—and the new willingness of some carriers to release data—will lead to new technological tools that could change everything.”
The world’s six billion mobile phones generate huge amounts of data—including location tracking and information on commercial activity, search history, and links in social networks. Innumerable efforts to mine the data in different ways are under way in research and business organizations around the world. And of those six billion phones, five billion are in developing countries. Many of them are cheap phones that can do little besides make calls and send text messages. But all such activity can be tracked back to cell-phone towers, providing a rough way to trace a person’s movements. Throw in the spread of mobile payment technology for simple commerce and you have the raw material for insights not only into epidemiology but into employment trends, social tensions, poverty, transportation, and economic activity.
The prospect of mining data from phones is especially tantalizing in poor countries, where detailed, up-to-date information on these matters has been scarce. “In the developing world, there isn’t a functioning census, you don’t know where traffic is, you don’t always have the data-gathering infrastructure of government,” says Alex “Sandy” Pentland, director of the Human Dynamics Lab at MIT, who has long been interested in insights from data created by mobile-phone use. “But all of a sudden, the one thing you do have—cell phones everywhere, especially in the past few years—can give you the equivalent of all that infrastructure already built in the developed world.”
When a call connects to a given base station, that station logs the ID number of the phone and the duration of the call; over time, this information can be used to get a sense of people’s regional movements and the shape of their social networks. Purchasing history on phones is also invaluable: records of agricultural purchases could be used to predict food supplies or shortages. And financial data collected by mobile payment systems can build credit histories and help millions of people without access to banking qualify for conventional loans. “The database analysis methods and the computers are very standard,” Pentland says. “It’s a matter of doing science and finding the right patterns.” Certain mobility patterns might relate to the spread of a disease; purchasing patterns could signify that a person has had a change in employment; behavioral changes or movement patterns might relate to the onset of an illness.
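To make the idea of reading movement out of base-station logs concrete, here is a minimal sketch in Python. It assumes each record has been reduced to a (phone ID, timestamp, tower ID) triple, which is a simplification made for illustration rather than any carrier's actual format; the function name and the sample values are hypothetical. A "trip" is counted whenever a phone's consecutive records come from two different towers.

```python
from collections import Counter, defaultdict


def tower_to_tower_trips(records):
    """Count moves between towers across all phones.

    `records` is an iterable of (phone_id, timestamp, tower_id) tuples.
    This record layout is an assumption made for illustration only.
    Returns a Counter keyed by (origin_tower, destination_tower).
    """
    # Group each phone's events so they can be ordered in time.
    events_by_phone = defaultdict(list)
    for phone_id, timestamp, tower_id in records:
        events_by_phone[phone_id].append((timestamp, tower_id))

    trips = Counter()
    for events in events_by_phone.values():
        events.sort()  # order by timestamp
        # Compare each event with the one that follows it.
        for (_, origin), (_, destination) in zip(events, events[1:]):
            if origin != destination:
                trips[(origin, destination)] += 1
    return trips


# Made-up sample records, not real data.
sample = [
    ("phone_a", 1, "tower_1"),
    ("phone_a", 2, "tower_1"),
    ("phone_a", 3, "tower_2"),
    ("phone_a", 4, "tower_1"),
    ("phone_b", 1, "tower_2"),
    ("phone_b", 2, "tower_1"),
]
print(tower_to_tower_trips(sample))
```

Counts like these only say where phones appeared, not why their owners moved, so any real analysis would still need the kind of pattern-finding Pentland describes.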
A powerful demonstration of how useful data from cheap phones can be came after the January 2010 earthquake in Haiti, which killed more than 200,000 people. Researchers at Sweden’s Karolinska Institute obtained data from Digicel, Haiti’s largest mobile carrier. They mined the daily movement data from two million phones—from 42 days before the earthquake to 158 days after—and concluded that 630,000 people who had been in Port-au-Prince on the day of the earthquake had left the city within three weeks. They also demonstrated that they could do such calculations in close to real time. They showed—within 12 hours of receiving the data—how many people had fled an area affected by a cholera outbreak, and where they went.
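A calculation of this kind can be sketched in a few lines of Python. The sketch below is not the Karolinska group's method; it assumes each phone has already been assigned one region per day, and it simply flags phones that were in a given city on the day of an event and whose last sighting within a chosen window was somewhere else. The function name, data layout, and sample values are all hypothetical, and turning a count of phones into an estimate of people would need a further scaling step that is not shown.

```python
def phones_that_left(daily_region, city, event_day, window_days):
    """Return the phones seen in `city` on `event_day` that were last
    seen elsewhere within `window_days` after the event.

    `daily_region` maps phone_id -> {day_number: region_name}.
    This layout is an assumption made for illustration only.
    """
    left = set()
    for phone_id, days in daily_region.items():
        # Only consider phones that were in the city on the event day.
        if days.get(event_day) != city:
            continue
        # Days on which the phone was observed inside the window.
        later_days = [d for d in days
                      if event_day < d <= event_day + window_days]
        # A phone counts as having left if its latest sighting in the
        # window is outside the city.
        if later_days and days[max(later_days)] != city:
            left.add(phone_id)
    return left


# Made-up sample data, not real records.
sample_days = {
    "p1": {0: "capital", 5: "province", 20: "province"},
    "p2": {0: "capital", 10: "capital"},
    "p3": {0: "province", 3: "capital"},
    "p4": {0: "capital", 30: "province"},  # only seen after the window
}
print(phones_that_left(sample_days, "capital", event_day=0, window_days=21))
```

In the sample, only p1 qualifies: p2 stayed, p3 was not in the city on the event day, and p4 was not observed again until after the three-week window.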
Most important, their work led to a model that could guide responses to future disasters. After analyzing data on pre-earthquake travel habits, the Swedish group found that Haitians generally fled the city for the same places where they’d spent Christmas and New Year’s Day. Such findings make it possible to predict where people will go when disaster hits.
Until recently, these studies were done by researchers who made some special arrangement with carriers to get the data (Eagle obtained it through his academic connections). But last year Orange, the France-based global telecom giant, released to the world’s research community—subject to certain conditions and restrictions—data based on 2.5 billion anonymized records from five months’ worth of calls made by five million people in Ivory Coast. The first phase of this grand experiment involves seeing just what it’s possible to do with the data.
Nearly a hundred research groups worldwide leaped at the opportunity to analyze the records. The resulting papers were scheduled to be presented in May at a conference at MIT under the name Data for Development, part of a larger conference of data-mining projects in both the poor and rich worlds. “It’s the first time a large-scale mobile-phone data set has been released at that scale,” says Blondel, who is chairing the conference. The papers had not been formally released at the time of this writing. But one charts social and travel interactions across a traditional north-south ethnic divide, providing insights into how conflict might be averted; another proposes tools for mapping the spread of malaria and detecting disease outbreaks. One corporate lab built a transportation model using cell-phone data to track ridership on 539 buses, 5,000 mini-buses, and 11,000 shared taxis.
Even if the Ivory Coast experiment succeeds, replicating it in other countries may not be easy. Last year the World Economic Forum—the group of leading industry, academic, and political figures who converge annually at Davos, Switzerland—issued a call for governments, development organizations, and companies to develop data analysis tools to improve the lives of people in the poor world. “I shouldn’t have to go to operators and say ‘I’ll do free consulting for you—and in exchange I want to use your data to improve lives,’” Eagle says. “The operators should want to be affiliated with this. Right now many of them don’t see the upside, but if we can get world leaders knocking on their doors saying ‘Let’s do this!’ maybe we can get a lot more done.”
Releasing such records more widely will take careful work to protect privacy and to prevent the data from being used in the service of oppression. Orange says it took pains to anonymize its data, but the field needs clear and widely agreed-upon ways to bring the information to market. “There are risks and benefits of having a data-driven society,” Pentland says. “There is a question of who owns the data and who controls it. You can imagine what Muammar Qaddafi would have done with this sort of data. Orange is taking the steps to figure out how to create a data commons that induces greater transparency, accountability, and efficiency—to tell where there are unusual events, extreme events, to tell us where the infrastructure is breaking down. There are all sorts of things we can do with it—but it has to be available.”
As these larger questions play out, Buckee and Eagle are working on refining and augmenting the data-mining tools in Kenya. Eagle aims to use surveys to sharpen and confirm the picture created by mining cell-phone data on a large scale. Call records alone are often not enough, he says; surveying even a few people could allow researchers to weed out erroneous assumptions about what those records show. Once, while analyzing phone data in Rwanda, Eagle noted that people had not moved around very much after a flood. At first, he theorized that many of them were bedridden with cholera. But it turned out that the flood had washed out the roads.
Buckee hopes to mine phone data to target drug-resistant strains of the malaria parasite. These strains, emerging in Cambodia and elsewhere, could reverse progress against the disease if allowed to proliferate, she warns. So she wants to begin merging data on the parasites’ spread into mobility models to help produce targeted disease-fighting strategies. “This is the future of epidemiology,” she says. “If we are to eradicate malaria, this is how we will do it.”