In 1938, the physicist Frank Benford made an extraordinary discovery about numbers. He found that in many lists of numbers drawn from real data, the leading digit is far more likely to be a 1 than a 9. In fact, the distribution of first digits follows a logarithmic law. So the first digit is a 1 about 30 per cent of the time, while a 9 leads less than five per cent of the time.
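For readers who want the numbers, the logarithmic law is usually written as P(d) = log10(1 + 1/d) for a leading digit d between 1 and 9. The short Python sketch below tabulates it; the function and variable names are illustrative choices, not anything taken from Benford's paper.

```python
import math

def benford_probability(d: int) -> float:
    """Probability that the leading digit equals d under Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

# Table of probabilities for the nine possible leading digits.
benford_table = {d: benford_probability(d) for d in range(1, 10)}

for d, p in benford_table.items():
    print(f"{d}: {p:.3f}")
```

The nine probabilities sum to one, with 1 leading roughly 30 per cent of the time and 9 a little under five per cent.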
That’s an unsettling and counterintuitive discovery. Why aren’t numbers evenly distributed in such lists? One answer is that if numbers have this type of distribution then it must be scale invariant. So switching a data set measured in inches to one measured in centimetres should not change the distribution. If that’s the case, then the only form such a distribution can take is logarithmic.
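The inches-to-centimetres argument can be checked numerically. The sketch below is only an illustration under stated assumptions: the "lengths" are synthetic values spread evenly on a logarithmic scale over six whole decades (a convenient stand-in for Benford-like data, not a real data set), and the helper names are hypothetical.

```python
import random
from collections import Counter

def leading_digit(x: float) -> int:
    """Return the first significant digit of a non-zero number."""
    x = abs(x)
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def digit_frequencies(values):
    """Fraction of values whose leading digit is 1, 2, ..., 9."""
    counts = Counter(leading_digit(v) for v in values)
    n = len(values)
    return {d: counts[d] / n for d in range(1, 10)}

rng = random.Random(0)  # fixed seed so the run is repeatable
# Assumption: synthetic lengths, uniform in log10 over six decades.
lengths_in_inches = [10 ** rng.uniform(0, 6) for _ in range(100_000)]
lengths_in_cm = [2.54 * v for v in lengths_in_inches]

freq_inches = digit_frequencies(lengths_in_inches)
freq_cm = digit_frequencies(lengths_in_cm)

for d in range(1, 10):
    print(f"{d}: inches {freq_inches[d]:.3f}   cm {freq_cm[d]:.3f}")
```

The two columns should agree to within sampling noise, which is what scale invariance means in practice. Repeating the test on evenly distributed numbers, such as integers drawn uniformly from 1 to 999, would show the first-digit frequencies shifting when the units change.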
But while this is a powerful argument, it does nothing to explain the existence of the distribution in the first place.
Then there is the fact that Benford’s law seems to apply only to certain types of data. Physicists have found that it crops up in an amazing variety of data sets. Here are just a few: the areas of lakes, the lengths of rivers, the physical constants, stock market indices, file sizes on a personal computer and so on.
However, there are many data sets that do not follow Benford’s law, such as lottery and telephone numbers.
What’s the difference between these data sets that makes Benford’s law apply or not? It’s hard to escape the feeling that something deeper must be going on.
Today, Lijing Shao and Bo-Qiang Ma at Peking University in China provide a new insight into the nature of Benford’s law. They examine how Benford’s law applies to three kinds of statistical distributions widely used in physics.
These are: the Boltzmann-Gibbs distribution which is a probability measure used to describe the distribution of the states of a system; the Fermi-Dirac distribution which is a measure of the energies of single particles that obey the Pauli exclusion principle (ie fermions); and finally the Bose-Einstein distribution, a measure of the energies of single particles that do not obey the Pauli exclusion principle (ie bosons).
Lijing and Bo-Qiang say that the Boltzmann-Gibbs and Fermi-Dirac distributions both fluctuate in a periodic manner around the Benford distribution as the temperature of the system changes. The Bose-Einstein distribution, on the other hand, conforms to Benford’s law exactly, whatever the temperature.
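The periodic fluctuation is easy to reproduce for the simplest case. The sketch below is not the authors' calculation: it assumes the Boltzmann-Gibbs case can be modelled as an energy E with probability density exp(-E/kT)/kT for E ≥ 0, in arbitrary units, and sums the probability that E falls in each interval whose numbers begin with a given digit. The function name and the `decades` cutoff are hypothetical.

```python
import math

def first_digit_probability(d: int, kT: float, decades: int = 20) -> float:
    """P(leading digit of E is d) when E has density exp(-E/kT)/kT on E >= 0."""
    total = 0.0
    for k in range(-decades, decades + 1):
        lo = d * 10.0 ** k          # interval [d*10^k, (d+1)*10^k) holds
        hi = (d + 1) * 10.0 ** k    # every number in that decade starting with d
        total += math.exp(-lo / kT) - math.exp(-hi / kT)
    return total

benford_p1 = math.log10(2)  # Benford's value for a leading 1

# Assumption: kT swept over one decade in arbitrary energy units.
temperatures = [10 ** (i / 10) for i in range(11)]
p1_by_temperature = [first_digit_probability(1, kT) for kT in temperatures]

for kT, p1 in zip(temperatures, p1_by_temperature):
    print(f"kT = {kT:6.3f}  P(first digit = 1) = {p1:.4f}  (Benford: {benford_p1:.4f})")
```

Under these assumptions the probability of a leading 1 stays within a few percentage points of Benford's value and repeats each time kT grows by a factor of ten, so the fluctuation is periodic in the logarithm of the temperature.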
What to make of this discovery? Lijing and Bo-Qiang say that logarithmic distributions are a general feature of statistical physics and so “might be a more fundamental principle behind the complexity of the nature”.
That’s an intriguing idea. Could it be that Benford’s law hints at some kind of underlying theory that governs the nature of many physical systems? Perhaps.
But what then of data sets that do not conform to Benford’s law? Any decent explanation will need to show why some data sets follow the law and others don’t, and it seems that Lijing and Bo-Qiang are as far as ever from this.
Ref: arxiv.org/abs/1005.0660: The Significant Digit Law In Statistical Physics