Interpreting the Covid-19 Data: Challenges and Issues
Governments and organisations around the world are using data about Covid-19 testing and cases to guide strategic responses to the crisis and to keep the public informed about the spread of the disease and the impact of government interventions, public health measures and medical treatment. But to what extent is the data being misunderstood, misinterpreted or misreported? Here are some common issues and challenges faced by anyone working with the available data.

We have all seen charts comparing the number of cases and deaths in each country, often used to explain and interpret the severity of the outbreak in specific countries and the success or failure of their interventions. However, comparing data across countries is a huge challenge.

Take Covid-19 testing as an example. Every country has different criteria for who gets tested. In some countries only the most serious cases are tested, while others also test milder cases or people considered “at risk”. Test accuracy also differs, as countries source their testing kits from different suppliers.

Different testing regimes are likely to be a key contributor to the hugely differing mortality rates (deaths/cases) we see between countries. Countries that test a larger proportion of the population naturally detect more mild cases, which leads to lower official Covid-19 mortality rates. We now know that some carriers are asymptomatic, so the only way to get an accurate mortality rate is to test a random sample of the population and then track those cases until all have recovered (or, sadly, died). Random sampling also shows how widespread the virus is at any one time, which can be useful in determining the “exit strategy” for countries under lockdown.

But if we can’t rely on the number of cases, surely the number of deaths reported in each country can be compared? Unfortunately, the way in which deaths are recorded also differs. Some countries record only deaths in hospitals, while others include deaths in care homes or other facilities. Some countries record a death in their official Covid-19 statistics only if the virus was tested for and found to be present at death, while others may include deaths where Covid-19 was not tested for but was assumed because of the symptoms. We also don’t know whether Covid-19 was the main cause of death. This is becoming an increasingly vital factor when assessing the effectiveness of social distancing, which means comparing the lives saved by reducing virus transmission with the lives lost through the economic and social hardship caused by the interventions.

So how can we use the available virus transmission and fatality data? If we accept that comparing countries is problematic, we can still use country-level data to understand the overall trend. The absolute numbers may be inaccurate (it is widely acknowledged that the total number of Covid-19 cases in all countries is under-reported).
But assuming the testing criteria stay the same in a particular country, the overall trend is still very useful for seeing whether we are “flattening the curve”. It is also still useful to compare the shape of the curves in each country, as long as we accept that the absolute numbers carry varying degrees of inaccuracy.

A key question asked everywhere is: when will we know why some countries have been harder hit than others and, more importantly, which interventions have been more effective? Some of the world’s best data scientists and modelers are working on this very question. It is too early to reach firm conclusions about what has worked well in specific countries, as there are potentially hundreds of factors that must be considered. That hasn’t stopped various theories being published, from the more plausible (for example, how widely masks are worn) to the more spurious (for example, that the level and control of the outbreak is determined by the gender of a country’s leader).

We’ve also seen some broader conspiracy theories emerge, from 5G technologies being responsible for coronavirus to the idea that the virus was created in a lab. Closer examination of the origin of such ideas can reveal the (un)truth of these claims, and the media can play a useful role in dispelling these myths, particularly when they fuel racism and xenophobia.

In the absence of clear scientific proof, it is too early to judge the various theories. There may be a clear correlation between a high rate of mask-wearing and a low incidence of Covid-19, but other characteristics of these countries could better explain the differences in the speed and severity of their outbreaks (e.g. a previous history of similar respiratory illnesses in the region, or cultural differences).
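The point about correlation and other explanatory characteristics can be illustrated with a small sketch. The figures below are made up purely for illustration and are not real country data; the tuple layout and the `prior_outbreak_experience` flag are assumptions introduced for this example. Across all six hypothetical countries, mask-wearing and incidence are strongly negatively correlated, yet within each group (with or without prior experience of a similar respiratory outbreak) the correlation vanishes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, NOT real data.
# Each row: (share of people wearing masks, cases per 100k,
#            prior_outbreak_experience)
countries = [
    (0.7, 10, True), (0.8, 12, True), (0.9, 10, True),
    (0.1, 50, False), (0.2, 52, False), (0.3, 50, False),
]


def correlation_for(rows):
    """Correlation between mask-wearing share and cases per 100k."""
    return pearson([r[0] for r in rows], [r[1] for r in rows])


overall = correlation_for(countries)
with_experience = correlation_for([r for r in countries if r[2]])
without_experience = correlation_for([r for r in countries if not r[2]])

print(f"all countries:            {overall:.2f}")
print(f"with prior experience:    {with_experience:.2f}")
print(f"without prior experience: {without_experience:.2f}")
```

In this toy data set the pooled correlation is produced entirely by the difference between the two groups, which is why a strong headline correlation on its own cannot settle which factor is doing the work.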
As a data and insights professional, I am naturally pleased to see the role that data and evidence-based decision-making are playing in minimizing the impact of Covid-19 worldwide. Such data is vital for decision-makers and will play a huge role in creating “exit strategies” that help governments determine when to lift their lockdowns and other measures. However, it is also open to misuse, sadly often to support political or personal agendas or biases. For those analyzing and reporting the data, I would suggest:

1. As always, examine closely the source of the analysis and opinions we are reading. Bring a healthy skepticism to anything that claims to give clear answers. Find out more about how the data is collected within each country in order to make useful comparisons. For example, what criteria need to be satisfied for an individual to be tested, and what qualifies as a “Covid-19 death”?

2. When analyzing daily cases, it is useful to put them in the context of testing. Any change in testing levels is likely to produce a corresponding change in the number of cases. And of course, when comparing countries we need to take population into account.

3. While the absolute numbers (especially deaths) are still important, it is essential to pay as much attention to trends in the data. Unchecked, the virus spreads exponentially, so the shape of the curve and a country’s current position on it should be a key aspect of any reporting.

With some of the world’s best data scientists and universities modelling the data, it is likely that many of the answers we need to move forward will eventually be found within the data.
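The three suggestions above can be made concrete with a few simple calculations. The following is a minimal sketch, not a definitive method: the function names and all figures in the usage lines are hypothetical, and the doubling-time formula assumes steady exponential growth between the two counts.

```python
from math import log


def cases_per_100k(cases, population):
    """Scale a case count by population so countries can be compared."""
    return cases / population * 100_000


def positivity_rate(positive_tests, total_tests):
    """Share of tests that came back positive (context for case counts)."""
    return positive_tests / total_tests


def case_fatality_ratio(deaths, confirmed_cases):
    """Deaths divided by confirmed cases, the 'deaths/cases' rate."""
    return deaths / confirmed_cases


def doubling_time_days(earlier_count, later_count, days_between):
    """Days for the count to double, assuming steady exponential growth."""
    if earlier_count <= 0 or later_count <= earlier_count:
        raise ValueError("counts must be positive and increasing")
    return days_between * log(2) / log(later_count / earlier_count)


# Hypothetical figures, for illustration only.
# Same number of deaths, but wider testing finds five times as many cases:
print(case_fatality_ratio(100, 1_000))   # narrow testing
print(case_fatality_ratio(100, 5_000))   # wide testing

# If only a constant tenth of cases is ever detected, the absolute counts
# are ten times too low, but the doubling time is unchanged:
print(doubling_time_days(100, 400, 10))  # true counts
print(doubling_time_days(10, 40, 10))    # detected counts
```

The last two calls both give a doubling time of about five days, which is why the trend can remain informative even when the absolute numbers are under-reported, provided the testing criteria do not change over the period being compared.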