For now, Cáceres’s group is looking to collaborate rather than commercialize. But cell phone networks are thinking about monetizing their data, says Jean Bolot, a researcher at network operator Sprint. This means a “two-sided” business model where they not only serve end users but also make money through relationships with other businesses. “This is new in the telco space but not in other areas; look at Google, for example,” he says.
Since almost everyone has a cell phone, the scale of the data is immense compared to other sources. Mobility patterns might, for example, be used to adjust property or billboard advertising prices. “Just about every operator on the planet is probably thinking about this right now,” says Bolot.
Another study, presented by Francesco Calabrese, a research scientist at MIT, and his colleagues, correlated location traces from roughly a million cell phones in greater Boston with listings of public events such as baseball games and plays, showing how people traveled to attend these events. “We could partly predict where people will come from for future events,” the team wrote in a report on their work, suggesting it could be possible to provide accurate traffic forecasts for special events.
The surge of research in this area has been enabled by the development of algorithms that can efficiently handle large networks consisting of millions of links, says Vincent Blondel, a professor of applied mathematics at Université Catholique de Louvain, near Brussels, who organized the Cambridge workshop.
Blondel’s research includes an analysis of connections between two million cell phone users in Belgium. It revealed that the French-speaking and Dutch-speaking populations of the country are barely connected by calls and texts. “This is interesting, since there are already discussions within Belgium about splitting the country in two,” says Blondel.
Research in this area is typically focused on aggregate information and not individuals, but questions remain about how to protect user privacy, Blondel says. It is standard to remove the names and numbers from a CDR, but correlating locations and call timings with other databases could help identify individuals, he says. In the MIT study, for example, the team could infer the approximate home location of users by assuming it to be where a handset was most often located between 10 p.m. and 7 a.m., although they also lumped people together into groups by zip code.
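To make the home-location heuristic concrete, here is a minimal sketch in Python. It is not the MIT team's code: the record layout (user ID, timestamp, cell ID) and the function names `is_night` and `infer_home_cells` are assumptions made for illustration. It simply picks, for each user, the cell seen most often between 10 p.m. and 7 a.m.

```python
from collections import Counter
from datetime import datetime


def is_night(ts, start_hour=22, end_hour=7):
    """Return True if ts falls in the 10 p.m. to 7 a.m. window (7:00 itself excluded)."""
    return ts.hour >= start_hour or ts.hour < end_hour


def infer_home_cells(records):
    """Guess each user's home as the cell they appear in most often at night.

    records: iterable of (user_id, timestamp, cell_id) tuples.
    This layout is hypothetical, not a real CDR schema.
    Users with no night-time records are left out of the result.
    """
    night_counts = {}
    for user_id, ts, cell_id in records:
        if is_night(ts):
            night_counts.setdefault(user_id, Counter())[cell_id] += 1
    # most_common(1) returns a one-element list of (cell_id, count) pairs.
    return {user: counts.most_common(1)[0][0]
            for user, counts in night_counts.items()}
```

A real analysis would presumably then group the inferred homes into larger areas such as zip codes, as the study did, rather than report them per user.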
“I feel the scientific community should take responsibility for finding out how to trade off having useful data and protecting privacy,” says Blondel. He is investigating the effect of techniques like using approximate rather than exact location information, or blurring the exact time stamps of calls in a data set.
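The article does not say how Blondel implements these techniques, so the following Python sketch is only one plausible reading: coordinates are rounded to fewer decimal places, and time stamps are floored to the start of a fixed-size bucket. The function names, the two-decimal default, and the 60-minute default are all assumptions.

```python
from datetime import datetime


def coarsen_location(lat, lon, decimals=2):
    """Round coordinates; two decimal places is roughly a 1 km grid in latitude."""
    return (round(lat, decimals), round(lon, decimals))


def blur_timestamp(ts, bucket_minutes=60):
    """Floor a timestamp to the start of its bucket within the same day.

    bucket_minutes is assumed to divide 1440 evenly (e.g. 15, 30, 60).
    """
    minutes_into_day = ts.hour * 60 + ts.minute
    floored = minutes_into_day - minutes_into_day % bucket_minutes
    return ts.replace(hour=floored // 60, minute=floored % 60,
                      second=0, microsecond=0)
```

Larger buckets and fewer decimals make records harder to match against other databases, at the cost of less precise mobility data, which is the trade-off Blondel describes.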