Statistics telling Stories

Sunday 11 October 2020

This is a title I have used for a number of workshops I have run, and it is the way I approach much of the applications course. Statistics and the way we choose to present them tell stories, and the story they tell depends on so many of the decisions we make in doing so. The much quoted 'Lies, damned lies and statistics' is a bit tired for me. There is so much more subtlety to consider, and whilst I know there is an element of wilful misrepresentation to consider, I prefer to focus on the honest and the easily overlooked. There must be countless examples I could draw on but, in the current climate, I find myself drawn to the virus statistics that are dominating our news. I hesitated before posting this blog, since I know that the virus has caused a truly horrific number of deaths as well as adversely affecting the lives of many more. It will be personal for many. It can be insensitive to use it as a case study, especially since it is ongoing. That said, I suspect it is the stats people are paying attention to more than anything now. So let me just disclaim that I do not take any of this lightly.

Of course, information sources are the number one hot debate here. How do we know that any one source has a monopoly on correctness? I suspect we don't, and that we can only do our best to be read up on such matters and be aware of possible limitations. With things on such a scale, it is hard to imagine that daily figures are accurate and entirely possible that some sources have some big holes and problems. I don't profess to have found the answer. I have been following data journalist John Burn-Murdoch from the Financial Times throughout. Although this is certainly related to my being drawn to the British press (I am British after all), I have appreciated the attempts he has made to clarify, explain, cite and cover the pandemic from a global perspective. He has made videos, written blogs and tweeted prolifically about the work he has done to find sources, collect and present the data. This has resulted in a lot of writing (for him) and reading (for me), but mostly I return to this tracker that he has built. It is updated daily and has lots of options. I wanted to share a few example screenshots from graphs I have generated, but you will surely want to create your own. The search I find myself doing most regularly is France (where we live), UK (where I am from and have friends and family), Netherlands (where my wife is from, my eldest daughter is studying, and we have friends and family) and Sweden (where the approach has been quite different from other European countries). Again though, this is just an example and you can change the countries yourselves.

Daily Deaths

This is where I tried hard to remind myself that these are people and loved ones before they are statistics. It is easy to overlook this when consumed by looking for patterns in statistics. Have a look at these 2 graphs about the daily death tolls in these countries.

These two graphs show exactly the same data. With a quick look, can you tell what the difference is? Do you think they tell the same story?

What did you find?

It seems to me that they tell quite different stories and neither of them is wrong. Since you are an audience of maths teachers, I will imagine that you have all spotted 1) that the top graph employs a logarithmic scale whilst the second one uses a linear scale and 2) that these are raw numbers. The log scale is a useful and completely legitimate tool that allows us to get more data on the same graph. For example, see what happens when I include the US in the linear graph (left) as opposed to the logarithmic one (right).

The impact on the linear graph is to squash the others to the point where it is difficult to see the variation. In this case the logarithmic scale offers us more, but in the first case, I would argue that the linear scale is better. This is a pretty big subtlety to expect people who are watching the news or reading the paper to pay attention to. As such, the choice is a key one. For me, this puts a weight both on journalists to explain and on consumers to inquire. Helping students to do that inquiring would be one of the key goals of teaching stats on this course.
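To make the 'squashing' concrete, here is a minimal sketch of how far up the axis each value is drawn on a linear scale compared with a logarithmic one. The figures and country labels are invented purely for illustration and are not taken from the tracker; the helper names `linear_position` and `log_position` are my own assumptions too.

```python
import math

# Hypothetical daily figures, invented for illustration only (not real data).
daily = {"Country A": 2000, "Country B": 100, "Country C": 20, "Country D": 5}

def linear_position(value, axis_max):
    """Fraction of the axis height at which a value sits on a linear scale from 0."""
    return value / axis_max

def log_position(value, axis_min, axis_max):
    """Fraction of the axis height on a log scale running from axis_min to axis_max."""
    span = math.log10(axis_max) - math.log10(axis_min)
    return (math.log10(value) - math.log10(axis_min)) / span

# The linear axis has to stretch to the largest value; the log axis runs from 1 to 10,000.
linear_max = max(daily.values())
log_min, log_max = 1, 10_000

for name, value in daily.items():
    lin = linear_position(value, linear_max)
    log = log_position(value, log_min, log_max)
    print(f"{name}: {lin:.1%} of the way up (linear), {log:.1%} of the way up (log)")
```

On the linear axis the three smaller countries all land in the bottom 5% of the graph, whereas the log axis spreads them out. The price is that equal distances on the log axis now mean equal ratios rather than equal differences, which is exactly the subtlety a casual viewer may miss.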

The second point is also key. The tracker also allows you to change the data to 'per million' and therefore make it proportional to the population of the country. Here they are: the top one is the log scale, with the linear scale below.

Here the differences in the story are more subtle, but still important. Really though, they need comparing to the ones above. Here it is on a linear scale with the US included too...

It is fairly clear that this tells a very different story of the US. I imagine the viewer is first drawn to the much lower peak for the US and is possibly less aware of the significance of the fact that it remains higher, like Sweden's, for longer periods. This is harder to deal with or quantify. How many will think about the 'area under the curve', I wonder? To understand that, we might want to look at cumulative deaths. Here are 2 that show this...

Linear, per million

Logarithmic, per million
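As a rough sketch of the two adjustments discussed above, the code below converts raw daily counts to 'per million' and then takes a running total, which is the discrete version of the 'area under the curve'. The two countries, their populations and their daily counts are all hypothetical, chosen only so that one has the higher peak while the other stays higher for longer; `per_million` and `cumulative` are names I have made up for the illustration.

```python
from itertools import accumulate

def per_million(counts, population):
    """Scale raw daily counts to a rate per million inhabitants."""
    return [c * 1_000_000 / population for c in counts]

def cumulative(values):
    """Running total of a daily series (the discrete 'area under the curve')."""
    return list(accumulate(values))

# Hypothetical countries and figures, invented for illustration only.
large_population, large_daily = 300_000_000, [300, 900, 900, 900, 900, 900]
small_population, small_daily = 10_000_000, [10, 80, 40, 10, 5, 5]

large_rate = per_million(large_daily, large_population)
small_rate = per_million(small_daily, small_population)

print("Raw totals:          ", sum(large_daily), "vs", sum(small_daily))
print("Peak per million:    ", max(large_rate), "vs", max(small_rate))
print("Cumulative per million:", cumulative(large_rate)[-1], "vs", cumulative(small_rate)[-1])
```

With these made-up numbers the raw totals, the per million peaks and the cumulative per million totals each rank the two countries differently or by very different margins, which is the sense in which each graph tells its own story.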

And so these tell us yet another different story... and this is without considering so many other issues. The tracker also allows you to look at daily cases and, well, this is even trickier because of the need to factor in testing rates and so on. Have a look.

I suppose the point I was keen to make is that this particular issue is charged with so many important subtleties that warrant all these options, but that, invariably, when a story is being told, we will only see one of the pictures, when we really need to see them all to gain a greater, more meaningful understanding. In the busy, minute media lifestyles that we have, who knows how, and who takes the time, to focus on the important subtle details that help us tell whole stories?

I must now add huge thanks to John Burn-Murdoch for his extraordinary efforts and the images included in this blog. I would encourage you to visit the tracker and do some of your own research. There is an incredible amount of information here and lots of rich material to help educate our students. I am only sad that it has had to be so tragic for so many people. At least we have the opportunity to do our best to understand and help others understand.