The way information percolates through networks is of ongoing fascination to increasingly diverse groups of people.
That’s why physicists have developed a new science of networks to better understand what’s going on. The models they have developed can accurately describe how disease spreads through society, how gossip spreads through social networks and how malware spreads over the internet.
In other words, given the location of the source and the structure of the network, they can accurately predict how information will spread.
However, the reverse task of determining the source of information after it has already spread is much harder. That’s because most interesting networks are so big that it’s impossible to measure the state of every node.
Today, Pedro Pinto and pals at the École Polytechnique Fédérale de Lausanne in Switzerland show that it can be done, even if you have information from only a few nodes. “We show that it is fundamentally possible to estimate the location of the source from measurements collected by sparsely-placed observers,” they say.
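To make the idea concrete, here is a toy sketch of locating a source from a few observers. It is not the estimator from the paper. It assumes, purely for illustration, that information takes exactly one time unit per hop and that each observer records only when the information arrived. Under that assumption, subtracting an observer's hop distance to the true source from its arrival time gives the same unknown start time for every observer, so the candidate with the smallest spread in those values is the best guess. The graph and the names `hop_distances` and `estimate_source` are hypothetical.

```python
from collections import deque


def hop_distances(graph, start):
    """Breadth-first search: hop count from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def estimate_source(graph, arrival_times):
    """Pick the node whose hop distances best explain the observed arrival times.

    arrival_times maps observer node -> time the information reached it.
    Assumption (ours, for illustration): one time unit per hop.
    """
    best, best_score = None, float("inf")
    for candidate in graph:
        dist = hop_distances(graph, candidate)
        if any(obs not in dist for obs in arrival_times):
            continue  # candidate cannot reach every observer
        # Implied start time according to each observer.
        offsets = [t - dist[obs] for obs, t in arrival_times.items()]
        mean = sum(offsets) / len(offsets)
        score = sum((x - mean) ** 2 for x in offsets)  # spread of implied start times
        if score < best_score:
            best, best_score = candidate, score
    return best


# Toy network of 8 nodes (adjacency lists); hypothetical data.
graph = {
    0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4, 5],
    4: [3], 5: [3, 6, 7], 6: [5], 7: [5],
}

# Forward direction: spread from node 3 starting at time 5.
true_source, start_time = 3, 5
arrival = {n: start_time + d for n, d in hop_distances(graph, true_source).items()}

# Reverse direction: only three of the eight nodes are observed.
observed = {n: arrival[n] for n in (0, 4, 6)}
estimate = estimate_source(graph, observed)
```

On this toy network the three observers are enough to single out node 3. Real spreading processes have random delays, which is why a statistical estimator is needed in practice.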
They start with a theoretical description of the problem and how it can be solved. They go on to demonstrate the effectiveness of their method using data about a cholera outbreak in the KwaZulu-Natal province in South Africa in 2000. This data includes a detailed map of the network of waterways in this area through which the disease would have spread, and the number of cholera victims in various communities in the network.
“By monitoring only 20% of the communities, we achieve an average error of less than 4 hops between the estimated source and the first infected community”, they say.
That’s an impressive result. However, it comes with a number of caveats. One problem is that the method assumes a good understanding of the structure of the network, something that is not always easy to get for large, real-world networks.
In the case of cholera, for example, the disease spreads downstream through rivers, which can be mapped reasonably accurately. But it also spreads when infected victims move from one geographical location to another, and this is much harder to take into account.
Another problem is that nodes can have different levels of importance in a network, so the choice of which ones to sample matters. However, nobody knows what the optimal choice should be.
Nevertheless, the new approach should have wide application. The same technique that can spot the first victim in a cholera outbreak should also work with other network-based phenomena such as the spread of a computer virus or news or malicious gossip.
That’s something that more than a few people ought to be interested in.
Ref: arxiv.org/abs/1208.2534: Locating the Source of Diffusion in Large-Scale Networks