Coronavirus charts - your questions answered

Each day I’ve been updating a chart showing the UK’s trajectory through COVID-19. Each time I do so I get the same questions (why are you choosing this measure? why are you using that odd axis? and so on), so rather than repeating the answers daily on Twitter it makes more sense to put them in a blog, which I’ll try to keep updated for as long as the charts are going out.

Why are you looking at death numbers?

Initially the best way of comparing those trajectories was by looking at case data – how many people were testing positive for COVID-19. However soon enough it became clear that these numbers were as much a reflection of how much testing was going on as how widespread the disease was. From the vantage point of case numbers, for instance, the UK would look to have far less severe an outbreak than most other European countries – yet we know from other metrics that this is not the case.

So the best way of gauging the spread of the disease, and the extent to which various countries have a hold on it, is the number of deaths. Which is why I have tended, when looking at the data each day, to spend most time focusing on this chart:

While there are plenty of problems with the data (which we’ll get to), there is also a lot we can learn from the numbers. In the early days, for instance, the UK followed Italy’s path very closely (look at the red line vs the light blue one). A lot of people assumed it was silly to compare the two countries: surely, they said, there was no way Britain could follow in the same direction! The Italian demographics were different; the nature of the health system was different and so on. But look at what happened since: it took precisely the same time for the UK and Italy to move from having 100 deaths to having 10,000. The UK has followed the Italian, and for that matter the French line, very closely throughout the first few weeks of the disease. There may be a divergence in the coming days but if anything the surprising thing has been the lack of divergence in recent weeks – at least for the UK. Germany, on the other hand, began with deaths rising in line with Spain, Italy and the UK but has subsequently flattened out.
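For anyone who wants to check this kind of comparison themselves, here is a minimal Python sketch of how you might count the days a country takes to go from 100 deaths to 10,000. The function name and both series are invented for illustration; they are not the ECDC figures behind the chart.

```python
def days_between(cumulative, start=100, end=10_000):
    """Days taken for a running total to go from `start` to `end`.

    Returns None if either threshold is never reached.
    """
    first_start = next((i for i, v in enumerate(cumulative) if v >= start), None)
    first_end = next((i for i, v in enumerate(cumulative) if v >= end), None)
    if first_start is None or first_end is None:
        return None
    return first_end - first_start


# Invented series (not real data): a toll doubling every three days...
series_a = [round(50 * 2 ** (day / 3)) for day in range(40)]
# ...and the same outbreak starting five days later.
series_b = [0] * 5 + series_a

print(days_between(series_a), days_between(series_b))
```

Because the count starts from the day each series passes 100 deaths, the five-day head start makes no difference: both series take the same number of days. That is the reason for lining countries up from a common threshold rather than by calendar date.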

Here’s an alternative way of looking at the same charts, this time not looking at the total cumulative numbers but daily average numbers from the past week:
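For anyone curious how a smoothed series like this is calculated, here is a minimal Python sketch. It assumes you start from a running total and want the average daily increase over the latest seven days; the function names and the numbers are invented for illustration, and this is not necessarily the exact method behind the chart.

```python
def daily_from_cumulative(cumulative):
    """Turn a running total into day-by-day increases."""
    previous = [0] + cumulative[:-1]
    return [today - before for before, today in zip(previous, cumulative)]


def trailing_average(daily, window=7):
    """Average of the latest `window` days; None until a full window exists."""
    averages = []
    for i in range(len(daily)):
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(sum(daily[i + 1 - window : i + 1]) / window)
    return averages


# Invented running total, purely for illustration (not real data).
example_cumulative = [1, 3, 6, 10, 15, 21, 28, 36]
example_daily = daily_from_cumulative(example_cumulative)
example_smoothed = trailing_average(example_daily)
print(example_smoothed)
```

Averaging over a week irons out the day-of-the-week wobbles in reporting, at the cost of reacting a little more slowly to genuine turning points.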

Do these charts include all UK deaths?

No. They are the death numbers announced by the Department of Health and Social Care each day and they are, by definition, incomplete. They only include deaths in hospitals, for one thing. Nor are they a reflection of the number of deaths in any given 24-hour period, or for that matter the hospital death toll as of 5pm the day before they are announced, because they are only the deaths that could be processed and announced in that period.

We know as much because the NHS produces its own dataset of COVID-19 deaths based on when those deaths occurred and it is a slightly different line to the one we get each day. As you can see from this chart, it began earlier than those deaths were announced (the first COVID-19 death happened in late February but wasn’t announced until a good few days into March, for instance).

The upshot is that we are looking through the rear-view mirror: in the early stages when the death toll is rising, the announcements will understate the total. In much the same way, they may overstate the increases in deaths when the daily totals are falling. We will not know the peak until it has passed.

Then there’s the other way in which the numbers do not tell the full story: they do not include deaths in care homes or indeed in people’s homes. That implies they are understating the total. By how much is a question we won’t have the answer to for some time because the most comprehensive measure of UK death numbers gets published by the Office for National Statistics some weeks later.

So as you can see from the chart here there’s a gap between the numbers produced by the ONS and the lines produced by the NHS. At the moment that gap is not very big. In France the official numbers suggest a major chunk of COVID-19 deaths are happening outside hospitals whereas the ONS data we have thus far suggests it is less than one in ten of those in England and Wales. But this is a fast moving event and given there is a nationwide lockdown it is quite possible many of those deaths simply haven’t been registered yet. So it will take some time before we have a more accurate count of the death toll.

And this is before you get to the question of whether what we are currently calling a “COVID-19 death” is really due to COVID-19 or was really down to other causes.

All of which is why I’ll be keeping a close eye on the numbers produced by the ONS which reflect all UK deaths. When I first wrote this blog they suggested England and Wales had slightly more deaths than normal for the time of year. But since then there’s been a significant leap in the numbers, as you can see below. It is now pretty clear that Britain is in the midst of a major mortality crisis – even when you ignore the stated cause of death.

Do any countries include non-hospital deaths in their totals on the big chart?

Yes, at least two of the major countries in the data series, France and Belgium, also include non-hospital deaths so their lines are, on a truly comparable basis, lower than they look in these charts. Some have chosen to remove those lines and replace them with adjusted ones. That is a reasonable editorial judgement, but it also crosses a kind of dataviz Rubicon. For these inconsistencies about categorisations of death are not the only problems with the dataset: there are also inconsistencies in the way those deaths are apportioned to dates and in what different countries count as a COVID-19 death. The minute you start making one judgement call and fiddling around, where do you end?

Perhaps that’s a cop-out, but it does at least have the benefit of consistency. I stick with the data as it comes to us from the ECDC and, when producing up-to-the-minute charts for Sky News, I then add the latest UK number from the DHSC (which then gets added to the ECDC database itself the following day).

Can we believe the Chinese data? Why are you including it?

There are big question marks over the Chinese data. Especially in the early stages of the disease, the Chinese authorities suppressed information about the scale of the spread. Many also believe the data continues to understate the scale of the outbreak in China. There are similar suspicions about Iran’s data. While some think we should simply ignore the data altogether, my philosophy is to include those data series but to make very clear the provisos about them. Let’s see what further information arises about them.

Why are you using weird logarithmic axes? Don’t they understate the severity of the problem?

This partly goes back to the first point. These diseases spread exponentially. And with exponential trends, very small differences in growth rates can cause serious differences in outcomes. A death toll doubling once every three days, as Italy’s and the UK’s were for some time, means within a few weeks you have a horrendous death toll. Whereas a death toll doubling every week is, while horrible, far less terrifying.
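To put rough numbers on that, here is a small Python sketch of the arithmetic. The starting figure of 100 deaths and the three-week window are assumptions chosen purely for illustration, and the function name is my own.

```python
def toll_after(start, doubling_days, elapsed_days):
    """Projected toll if it keeps doubling every `doubling_days` days."""
    return start * 2 ** (elapsed_days / doubling_days)


# Assumed starting point: 100 deaths, followed for 21 days.
fast = toll_after(100, doubling_days=3, elapsed_days=21)  # seven doublings
slow = toll_after(100, doubling_days=7, elapsed_days=21)  # three doublings

print(f"doubling every 3 days: {fast:,.0f}")
print(f"doubling every 7 days: {slow:,.0f}")
```

Seven doublings take 100 to 12,800, while three doublings take it only to 800. Put another way, doubling every three days is growth of roughly 26% a day, against roughly 10% a day when the doubling takes a week.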

In other words, we need to keep a close eye on the growth rate in the early stages of the disease. A normal axis doesn’t really show that. Small numbers look small; big ones look big. It is not altogether clear from the lines here how or why the German and Italian lines diverged.

A log chart, which scrunches up the y axis, adjusting for exponential increases, allows you to see those growth rates in the early stages as well as later on. Here’s a video that explains more:
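For those who prefer numbers to video, here is a short Python sketch of what the log axis is doing. The series is an invented one that doubles at every step; it is not real data.

```python
import math

# Invented series that doubles at every step (not real data).
series = [100 * 2 ** step for step in range(6)]

# On a normal axis each step is bigger than the last...
linear_steps = [b - a for a, b in zip(series, series[1:])]

# ...but on a log axis every doubling is the same distance up the chart.
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(series, series[1:])]

print(linear_steps)
print([round(step, 3) for step in log_steps])
```

The linear steps balloon from 100 to 1,600, while the log steps are all the same size. So on a log chart a constant growth rate shows up as a straight line, and a line that bends downwards means the growth rate is slowing.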

Why doesn’t your chart compare countries by population size?

The short answer is that, for the most part, population size isn’t a very big determinant of the growth of the disease in its early stages. That changes as the disease becomes more widespread, but remember, though this story might feel old, it is still very much in its early stages.

However since this is one of the things I get asked most frequently, I’ve put together a population-adjusted chart here:

As you can see, the broad shape of this chart is much the same as the other one. Spain, Italy, the UK and France are heading in much the same direction. Some countries, for instance South Korea and Germany, are managing to break free of the pattern. But there are also some important differences.

The US, for instance, is no longer at the very top of the chart. Because of the size of its population it is considerably lower down, though it’s worth noting it’s still heading in a similar direction to many of the European countries with severe outbreaks.
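The adjustment itself is simple division. Here is a minimal Python sketch using two hypothetical countries; the names, death counts and populations are all invented for illustration and are not the figures behind the chart.

```python
def per_million(deaths, population):
    """Deaths per million people."""
    return deaths / population * 1_000_000


# Hypothetical figures, invented for illustration only.
big_country = per_million(deaths=20_000, population=300_000_000)
small_country = per_million(deaths=10_000, population=60_000_000)

print(f"big country:   {big_country:.0f} per million")
print(f"small country: {small_country:.0f} per million")
```

The bigger country has twice as many deaths in absolute terms, yet it comes out well below the smaller one once you divide by population. That is how a country can drop down the rankings when the chart is adjusted.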

Moreover, some countries which look on the unadjusted chart like they are faring well with the outbreak, for instance Sweden and the Netherlands, seem on this measure to be doing as badly as many of the worst-affected countries in Europe.

As I say, this is intended as a post I’ll constantly return to so if you have any questions please do tweet me and I’ll try to answer them in the coming days/weeks.