Are Chinese data on new coronavirus cases dodgy?


In brief:

  • Every day, China reports new coronavirus cases, though many doubt the veracity of its reporting.
  • Chinese data do, however, pass one check for dodgy data: Benford’s Law.
  • This doesn’t *prove* China’s data are accurate. Measuring an epidemic is complicated. But we doubted the data would pass this test, and it’s good practice to report results which contradict your starting theory.

Each day, China reports new coronavirus cases, though many doubt the veracity of its reporting. So in this article we investigate whether its numbers pass one check for dodgy data: Benford’s Law.

Benford’s Law states that, in many genuine datasets (particularly those spanning several orders of magnitude), the numeral 1 will be the leading digit about 30.1% of the time. The numeral 2 will lead about 17.6% of the time. Numerals 3 through 9 each lead a progressively smaller share of the time.
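For readers who want to see where those percentages come from, here is a minimal sketch. It uses the standard statement of Benford’s Law, in which the expected share for leading digit d is log10(1 + 1/d). The function name `benford_expected` is our own, chosen for illustration.

```python
import math


def benford_expected(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1 to 9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Print the expected share for each possible leading digit.
for d in range(1, 10):
    print(f"{d}: {benford_expected(d):.1%}")
```

The first two lines printed correspond to the 30.1% and 17.6% figures quoted above, and the nine shares sum to 100%.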

China’s data on new daily coronavirus cases (by region) almost perfectly match the Benford pattern, as do river lengths, country sizes and many other things. It’s curious that data so often follow this pattern, and it is helpful for detecting fraud: Benford’s Law has highlighted discrepancies in markets as diverse as LIBOR interest rates and fish prices.
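A check of this kind can be sketched in a few lines. The code below is an illustration rather than the exact procedure behind our result: it tallies leading digits and reports the largest gap between the observed shares and the Benford shares. The function names are hypothetical, and the sample is a stand-in (the first 200 powers of two, which are known to follow the pattern closely), not the Chinese case counts.

```python
import math
from collections import Counter


def leading_digit_shares(values):
    """Share of positive integers in `values` starting with each digit 1 to 9."""
    # Zero counts have no leading digit, so they are skipped.
    counts = Counter(int(str(v)[0]) for v in values if v > 0)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one positive value")
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


def max_gap_from_benford(values):
    """Largest absolute gap between observed and Benford shares."""
    observed = leading_digit_shares(values)
    return max(
        abs(observed[d] - math.log10(1 + 1 / d)) for d in range(1, 10)
    )


# Stand-in data (an assumption for illustration): powers of two.
sample = [2 ** k for k in range(1, 201)]
print(f"largest gap: {max_gap_from_benford(sample):.3f}")
```

A small gap means the data are consistent with Benford’s Law; a formal analysis would typically go further and apply a statistical test, such as a chi-squared test, to judge whether any gap is larger than chance would explain.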


We’re not saying this *proves* China’s data are accurate. Measuring an epidemic is, no doubt, complicated (and political). But we doubted the data would pass this test, and it’s good practice to report results which contradict your starting theory.