Pitfalls Emerge In The Analysis of Mobile Phone Datasets

Mobile phone data is revolutionising the way researchers study human mobility. But these analyses are worryingly susceptible to hidden bias, say researchers.

Emerging Technology from the arXiv

September 2, 2014

Mobile phones have become one of the most important tools for anthropologists hoping to uncover the secrets of modern human behaviour. Every mobile phone call or text triggers the creation of a log that records the time and day, duration and type of communication of the message as well as the cellular tower that handled the call.

This data from millions of people over months and years from all over the world provides the most detailed account of human behaviour ever assembled. In particular, researchers are using this data to study the way entire populations of humans move on a scale ranging from a few kilometres to thousands of kilometres. It shows when people work, how they commute and migrate, how infectious diseases spread, and even how populations respond to armed conflict and natural disasters.

But there is a problem. The huge interest has created a kind of data gold rush with researchers analysing the data in different ways. That makes it hard to compare studies in different countries. Even worse, these analyses have failed to take into account some important limitations of mobile phone data that can severely bias the outcomes.

Today, Nathalie Williams at the University of Washington in Seattle and a few pals outline the problems associated with mobile phone data and say that many of these problems can be solved by using a new definition of mobility. They demonstrate the advantages of their new approach using a dataset of mobile phone calls from the African country of Rwanda made between 1 June 2005 and 31 January 2009.

A mobile phone network is made up of a number of cell towers that can cover an area of a few hundred metres to up to 10 kilometres. When a mobile phone makes a call, it connects to its nearest cell tower from where the data about the call is logged. Should a person then move into the catchment area of another tower, then any call from there will be logged at the new location. Studying the different locations that people make calls from gives an indication of how mobile they are.

One important factor in this debate is how mobility is defined. Anthropologists think of it as having two dimensions. The first is how frequently people move, in other words the number of times a person goes anywhere. The second dimension is distance or how far a person moves. Both of these dimensions must be captured by any meaningful measure of mobility.

Anthropologists studying mobile phone data have used a number of different measures of mobility, such as the number of towers used by an individual or the straight line distance between them. The most common is the radius of gyration, which is determined by finding the geographical centre of all the towers used and then taking the root mean square of the distances from this centre to the towers.
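As a rough illustration, the radius of gyration can be computed in a few lines. This is a minimal sketch rather than the method used in any particular study: it assumes tower positions are already given as planar (x, y) coordinates in kilometres, with one entry per logged call, and the function name is hypothetical.

```python
import math

def radius_of_gyration(points):
    """Root-mean-square distance of points from their centroid.

    `points` is a list of (x, y) tower positions in kilometres on a flat
    plane, one entry per logged call (an assumption made for this sketch).
    """
    n = len(points)
    # Geographical centre of all the towers used.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Mean of the squared distances from the centre, then the square root.
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)

# Four calls logged at the corners of a 6 km by 8 km rectangle.
calls = [(0.0, 0.0), (6.0, 0.0), (0.0, 8.0), (6.0, 8.0)]
print(radius_of_gyration(calls))  # 5.0
```

Whether each tower is counted once or once per call is a choice that varies between analyses; the sketch counts every entry it is given.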

The problems this raises are sometimes obvious and sometimes subtle. One obvious problem is the uneven distribution of towers around a country. Typically, most networks have a high density of towers in cities and a much lower density in rural locations.

That immediately causes a problem. Williams and co consider the example of Rwanda and its capital Kigali, which has 50 towers within a five kilometre radius. An individual living in Kigali could regularly move only within this five kilometre disc and use numerous cell towers in the process. This person’s mobility would then be deemed high, given the number of cell towers they use.

But consider another individual living in a rural area with only one tower covering a five kilometre radius. That person may move just as much as the city dweller but only ever use a single tower. By the same measure, their mobility would be low.

The problem gets worse. “This issue is further exacerbated by the fact that cellular towers are placed more often in urban areas with high population density, politically important areas, such as capital cities, or wealthy areas with higher mobile phone penetration,” say Williams and co. That introduces additional bias.

Another problem is that the number of towers changes with time. As a mobile network expands, the operator adds new towers, sometimes in regions that have no coverage but also to divide up areas with existing coverage. “Because existing measures use towers as their spatial reference points, this causes a situation of spatial and temporal bias in these measures,” they say.

Other difficulties come about because of the way people use phones. The more often a person calls, the more towers at which he or she will be registered. “A person who uses their phone frequently will likely have a different mobility rating, compared to a person with the same spatiotemporal trajectory but lower calling frequency,” say Williams and co.
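The effect is easy to reproduce with a toy example. The sketch below assumes a made-up trajectory in which a person's nearest tower is known for each hour, and compares how many distinct towers are logged when the same person calls every hour or every fourth hour. The tower labels and the function name are hypothetical.

```python
def towers_seen(trajectory, call_every):
    """Count distinct towers logged when a call is made every `call_every` hours.

    `trajectory` lists the nearest tower for each hour; only the hours in
    which a call is made leave a record.
    """
    return len(set(trajectory[::call_every]))

# The same hour-by-hour trajectory for both callers (illustrative data).
hourly_towers = ["A", "A", "B", "C", "C", "D", "A", "A"]

frequent = towers_seen(hourly_towers, 1)    # calls every hour
infrequent = towers_seen(hourly_towers, 4)  # calls every fourth hour
print(frequent, infrequent)  # 4 2
```

Both callers follow an identical path, yet a tower-count measure rates the frequent caller as twice as mobile.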

The team say that many of these problems can be solved or at least mitigated by using an entirely different measure of mobility. Their idea is to do away with the location of the towers as the main reference points and instead use a system of grid cells placed across the country.

For Rwanda, the team chose to work with 2040 grid cells, each measuring five kilometres by five kilometres. Of course, some of these grid cells will have one active tower, some will have several and some will have none (in which case they can be treated like any other form of missing data).

If someone makes a call from any of the towers within a grid square, they are registered as being located at the centre of that cell. If they make another call from another tower within the same square, the location remains the same. But if their next call is handled by a tower in the next square, they are deemed to have moved.
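The grid-cell rule can be sketched as follows. This is an illustration under stated assumptions, not the team's implementation: tower positions are taken to be planar coordinates in kilometres measured from one corner of the grid, and the function names are hypothetical. Only the five kilometre cell size comes from the description above.

```python
import math

CELL_KM = 5.0  # grid cells measure five kilometres by five kilometres

def grid_cell(x_km, y_km, cell_km=CELL_KM):
    """Return the (column, row) index of the grid cell containing a tower."""
    return (math.floor(x_km / cell_km), math.floor(y_km / cell_km))

def cell_centre(cell, cell_km=CELL_KM):
    """Return the centre of a cell, where every caller in it is placed."""
    col, row = cell
    return ((col + 0.5) * cell_km, (row + 0.5) * cell_km)

def count_moves(call_positions, cell_km=CELL_KM):
    """Count how often consecutive calls fall in different grid cells."""
    cells = [grid_cell(x, y, cell_km) for x, y in call_positions]
    return sum(1 for prev, cur in zip(cells, cells[1:]) if prev != cur)

# Tower positions for four consecutive calls (illustrative data).
calls = [(1.0, 1.0), (4.0, 3.5), (6.0, 2.0), (6.5, 7.0)]
print(cell_centre(grid_cell(*calls[0])))  # (2.5, 2.5)
print(count_moves(calls))                 # 2
```

The first two calls use different towers in the same cell, so no move is recorded; the third and fourth calls each land in a new cell.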

That immediately solves one of the most significant problems. “The problem of spatial variation in tower density is eliminated because grid cells are of even size and non-overlapping,” say Williams and co.

Williams and co make a number of other suggestions to improve the way this data is handled, such as assuming people travel along established road systems between one location and another rather than in straight lines.

The end result is that the team have a much clearer handle on both the spatial range of people’s mobility and also the frequency of the journeys they make. What is more, these measures are applicable to a wide range of mobile phone databases from countries all over the world. All of a sudden, the analyses made in one place can be immediately and easily compared to analyses made at another.

That’s certainly a worthy goal and Williams and co go further. “These new measures open up entirely new avenues of research,” they say. Being able to compare data from different places should allow researchers to investigate questions such as how population mobility influences individual migration, tuberculosis infection or the participation of women in the workforce. “Population level mobility can also be related to population-level characteristics, such as HIV prevalence rates, birth rates, social norms, economic well-being, or political participation,” they add.

More ambitious is the idea that, because people’s patterns of mobility change during emergency events, it may even be possible to pinpoint an earthquake or a bomb blast in real-time using this kind of data.

It certainly makes sense that these kinds of analyses should be comparable. Williams and co certainly think so, and the only question now is whether everybody else involved in analysing mobile phone datasets agrees.

Ref: arxiv.org/abs/1408.5420 : Measures Of Human Mobility Using Mobile Phone Records Enhanced With GIS Data
