Data Models and Data Reality

Let us start with a quote which, in conjunction with the cartoon on the right, might tell us something about data models when compared with actual real-world data, and which we possibly need to keep in mind.

All models are wrong. Some models are useful.

In part, the previous discussion of All-Cause Mortality was an initial attempt to consider some of the wider statistical data surrounding the Covid-19 pandemic, which earlier discussions had struggled to explain in terms of a basic SIR model. Therefore, in this discussion, a further attempt will be made to reconcile actual data with the SIR data model, where the population is divided into 3 basic groups, i.e. Susceptible [S], Infected [I] and Recovered [R], where the recovered group also includes deaths. For the purposes of this discussion, the SIR data model will be described in terms of equation [1], which was used in an incremental spreadsheet model to produce the graphs below. As such, we might immediately recognise that these graphs represent data models rather than a data reality.


The equation in [1] defines how the numbers in each group, i.e. susceptible, infected and recovered, change as a function of time [t]. However, it does not explain how the model variables [β] or [ϒ] can depend on the pathology of the virus nor how the initial basic reproduction number [R0] also changes, as a function of time [t], such that it ‘evolves’ to become the effective reproduction number [Re], as outlined in [2].
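For reference, the standard textbook form of the SIR model is sketched below. It is assumed, rather than confirmed, that equations [1] and [2] follow this form, with [N] denoting the total population and γ standing for the recovery rate written as [ϒ] in the text.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I, \\
R_0 &= \frac{\beta}{\gamma}, \qquad R_e(t) = R_0 \, \frac{S(t)}{N}.
\end{aligned}
```

In this form, the number of infected stops growing when dI/dt = 0, which corresponds to [Re=1].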


Again, for the purposes of this general discussion, it will simply be assumed that the infection rate [β] and recovery rate [ϒ] are constants for a specific virus, e.g. Covid-19. If so, the value of [Re] has to be directly proportional to the remaining number of susceptibles [SN] in a given population, as per [1]. In order to visualise how the number of susceptibles [S], infected [I] and recovered [R] might change as a function of time [t], the graphs below start with two different values of [R0], i.e. 5 and 2.5.

Let us simply assume that the value of [R0=5] is generally reflective of the pathology of the Covid-19 virus, where an initially infected person could infect 5 others. As such, the virus could quickly spread through a population and, as modelled, reach an infection peak of 49.4% after 32 days, when [Re=1]. While the use of percentages makes no reference to the actual size of the population, if we were to try to map the [R0=5] chart onto the UK population, where the first infection was reported on 31-Jan-2020, the peak of infections would have occurred around 03-Mar-2020. However, this model date is not supported by actual data, such that it might immediately be questioned. Of course, we should also consider whether the various suppression policies adopted in the UK might be better approximated using a value of [R0=2.5], which might have slowed the spread of the virus, such that it reached a reduced peak of 25% after 72 days, i.e. 12-Apr-2020. Whether this second curve is a better match to any Covid-19 population may still be questionable, as current statistics only suggest that a very small percentage of any population has been infected with the Covid-19 virus, i.e. less than 1%, although this figure also needs to be questioned.
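The mapping of model days onto calendar dates can be checked with a short sketch. It uses only the first-case date and the days-to-peak values quoted above; the variable names are illustrative.

```python
from datetime import date, timedelta

# First reported UK infection, as quoted in the text.
first_case = date(2020, 1, 31)

# Days to the modelled infection peak for each assumed R0, as quoted in the text.
days_to_peak = {5.0: 32, 2.5: 72}

# Calendar date of each modelled peak.
peak_dates = {r0: first_case + timedelta(days=d) for r0, d in days_to_peak.items()}

for r0, peak in peak_dates.items():
    print(f"R0={r0}: modelled peak on {peak.isoformat()}")
```

Note that 2020 was a leap year, which the date arithmetic handles automatically.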

Note: It should be highlighted that the progression of the SIR model is dependent on the number of initial infections assumed, which was set to 0.1% of the population in both graphs. This figure could be much higher or lower depending on the size of the population being modelled. As indicated, these models only quantify the number of susceptible [S], infected [I] and recovered [R] as a percentage, such that it might be considered as a compartmental SIR model. However, this discussion will use the term ‘cluster SIR model’ in order to differentiate some of its simplifications.
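An incremental model of the kind described can be sketched in a few lines, stepping the three groups forward one day at a time. This is a minimal sketch and not the original spreadsheet: the recovery rate `gamma=0.1` per day is a hypothetical value, the standard SIR update rule is assumed, and the function names are illustrative. Because the peak height and especially its timing depend on the assumed rate and step size, the output is not expected to reproduce the quoted figures exactly.

```python
def simulate_sir(r0, gamma=0.1, initial_infected=0.001, days=200):
    """Step a fractional SIR model forward one day at a time (forward Euler).

    gamma is a hypothetical recovery rate; beta is derived as r0 * gamma.
    Returns a list of (day, S, I, R) tuples, where S + I + R = 1.
    """
    beta = r0 * gamma
    s, i, r = 1.0 - initial_infected, initial_infected, 0.0
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        new_infections = beta * s * i   # fraction moving S -> I today
        new_recoveries = gamma * i      # fraction moving I -> R today
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((day, s, i, r))
    return history


def infection_peak(history):
    """Return (day, infected_fraction) at the maximum of the I curve."""
    day, _, infected, _ = max(history, key=lambda row: row[2])
    return day, infected


for r0 in (5.0, 2.5):
    day, peak = infection_peak(simulate_sir(r0))
    print(f"R0={r0}: peak of {peak:.1%} infected on day {day}")
```

The sketch reproduces the qualitative behaviour of the two graphs: the lower value of [R0] gives a later and lower infection peak.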

In the original Covid-19 Pandemic discussion, similar charts attempted to use actual population numbers for different countries. However, it was clear from the outset that the SIR curve could not be matched to the UK population of 67 million, as it required a large percentage to become infected. While the number of recorded Covid-19 infections appeared to contradict the percentage required by a SIR model, it is unclear whether an accurate assessment of Covid-19 infections exists, even today, for any given population. However, let us proceed on the assumption that the model only represents a small ‘cluster’ within a population, e.g. 100 to 1000, such that it may be possible to fit the SIR Covid-19 curve to known statistics. For, in practice, the spread of the virus in a national population may only be explained by the sum total of hundreds of separate cluster SIR models, where each only represents a small community or region within a larger town or city. However, it is unclear whether any SIR model has actually adopted this approach or has used an alternative approach that better fits the data to be discussed.
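The idea of summing many small cluster models can be illustrated with a rough sketch. Everything numeric here is hypothetical: 200 clusters of 500 people, a new cluster seeded every 2 days, and a recovery rate of 0.1 per day. It is not fitted to any data; it only shows how staggered clusters can keep the combined infected percentage far below the peak of any single cluster.

```python
def cluster_infected_curve(r0=5.0, gamma=0.1, initial_infected=0.001, days=120):
    """Infected fraction per day for one small cluster (forward-Euler SIR)."""
    beta = r0 * gamma
    s, i = 1.0 - initial_infected, initial_infected
    curve = [i]
    for _ in range(days):
        new_infections = beta * s * i
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        curve.append(i)
    return curve


def national_infected_curve(start_days, cluster_size, total_days):
    """Sum identical cluster curves, each shifted to its own start day.

    Returns the number of infected people per day across all clusters.
    """
    single = cluster_infected_curve()
    total = [0.0] * total_days
    for start in start_days:
        for offset, fraction in enumerate(single):
            day = start + offset
            if day < total_days:
                total[day] += fraction * cluster_size
    return total


# Hypothetical set-up: 200 clusters of 500 people, one seeded every 2 days.
start_days = [2 * k for k in range(200)]
cluster_size = 500
national = national_infected_curve(start_days, cluster_size, total_days=600)

population = cluster_size * len(start_days)
national_peak_fraction = max(national) / population
single_peak_fraction = max(cluster_infected_curve())

print(f"single-cluster peak: {single_peak_fraction:.1%}")
print(f"combined peak:       {national_peak_fraction:.1%}")
```

The spacing between cluster start days is the key assumption: the wider the spacing, the lower the combined peak relative to the total population.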

Note: As repeatedly highlighted, the data reported on various websites has possibly under-estimated infections and over-estimated deaths – see All-Cause Mortality for more details. This issue will now be reviewed in terms of UK statistical data.

In part, the chart above is an attempt to verify the claims made by the UK Column news channel, see report dated 7-May-2020. As stated in the news report, the statistical data used to construct the chart above is based on an Office for National Statistics (ONS) weekly XLS report, dated 15-May-2020. It might also be useful for the reader to reference another video entitled The Latest Data and Evidence, dated 28-May-2020. However, it is recognised that the chart above possibly needs some explanation, where the ‘total’ number of deaths, as recorded by the ONS, is shown by the red curve. We might immediately compare this red curve with the solid black curve below, which represents the average weekly deaths for 2015-2019. Near the bottom is another red curve shown as a dashed line, which represents the number of recorded Covid-19 deaths, which, as of 15-May-2020, amounted to 35,168. While this is an official figure, the accuracy of this number might also be questioned. For the moment, however, the weekly figures will simply be accepted, such that they might be added to the average black curve to create the curve shown in blue, which then raises a question.
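The construction of the blue curve, and the gap between it and the red curve, can be expressed as a short sketch. The function name is illustrative and the weekly values are placeholders chosen only to show the arithmetic; they are not ONS figures.

```python
def expected_and_unexplained(total_deaths, average_deaths, covid_deaths):
    """Build the 'average + Covid-19' series and the remaining gap.

    All three inputs are weekly series of equal length. Returns
    (expected, unexplained), where
      expected[w]    = average_deaths[w] + covid_deaths[w]   (blue curve)
      unexplained[w] = total_deaths[w] - expected[w]         (red minus blue)
    """
    if not (len(total_deaths) == len(average_deaths) == len(covid_deaths)):
        raise ValueError("weekly series must have the same length")
    expected = [a + c for a, c in zip(average_deaths, covid_deaths)]
    unexplained = [t - e for t, e in zip(total_deaths, expected)]
    return expected, unexplained


# Placeholder values for illustration only; these are NOT ONS figures.
total = [11000, 16000, 22000]
average = [10500, 10400, 10300]
covid = [500, 3500, 8000]

expected, unexplained = expected_and_unexplained(total, average, covid)
print(expected)     # [11000, 13900, 18300]
print(unexplained)  # [0, 2100, 3700]
```

A positive value in the second list corresponds to a week in which the red curve sits above the blue curve.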

Why is the total red curve greater than the blue curve representing average deaths plus Covid-19 deaths?

One suggestion is that the difference between these curves, i.e. red and blue, represents excess deaths caused by the lockdown policy, after which other health conditions were considered secondary to the Covid-19 pandemic. So, while we might accept the recorded number of UK Covid-19 deaths as accurate, the UK Column news has highlighted a statement by Professor Walter Ricciardi, scientific adviser to Italy’s minister of health, suggesting that only 12% of Italy’s Covid-19 deaths were directly caused by the virus. For in many cases, it is possible that the presence of the virus was not the primary cause of death, such that it might have been better explained by age and multiple other health conditions – see quote below.

“The way in which we code deaths in our country is very generous in the sense that all the people who die in hospitals with the coronavirus are deemed to be dying of the coronavirus. On re-evaluation by the National Institute of Health, only 12% of death certificates have shown a direct causality from coronavirus, while 88% of patients who have died have at least one pre-morbidity, many had two or three.”

Of course, any questioning of the official number of UK Covid-19 deaths would require evidence, which might be presented in the form of the following charts, also based on official ONS data, showing the corresponding weekly deaths by age groups and the percentage of deaths by age groups.

Broadly, these charts align to the earlier analysis of all-cause mortality, which while underlining the vulnerability of older age groups also highlighted the fact that most of the elderly, whose deaths might have been attributed to the Covid-19 virus, undoubtedly had other pre-existing health conditions.

So, what might be inferred from this data?

From the charts, we might estimate that the peak number of UK deaths occurred around week-15, i.e. 6-12 Apr-2020, but with a possible lag between infections and deaths in the order of 3 weeks. If so, the peak of infections may have occurred in week-12, i.e. 16-22 Mar-2020, the week before the UK lockdown was announced on 23-Mar-2020. In addition, there is now growing evidence, as previously cited in the video The Latest Data and Evidence, which questions the causal evidence linking any fall in Covid-19 infections to the start of the lockdown policy. The following two examples may be reviewed in detail by following the links to the pdf files, but where the extracts below might be seen as a general summary of the conclusions.
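The week arithmetic can be checked with a short sketch. It assumes ISO week numbering (Monday to Sunday), which matches the date ranges quoted above but may not be the exact convention used in the ONS report; the helper name is illustrative.

```python
from datetime import date, timedelta


def iso_week_range(year, week):
    """Return (Monday, Sunday) of the given ISO week."""
    monday = date.fromisocalendar(year, week, 1)
    return monday, monday + timedelta(days=6)


peak_deaths_week = 15   # estimated from the charts, as stated in the text
lag_weeks = 3           # assumed infection-to-death lag, as stated in the text
peak_infections_week = peak_deaths_week - lag_weeks

deaths_range = iso_week_range(2020, peak_deaths_week)
infections_range = iso_week_range(2020, peak_infections_week)
lockdown_start = date(2020, 3, 23)

print("peak deaths week:    ", deaths_range[0], "to", deaths_range[1])
print("peak infections week:", infections_range[0], "to", infections_range[1])
print("ends before lockdown:", infections_range[1] < lockdown_start)
```

The conclusion is sensitive to the assumed lag: a lag of 2 weeks rather than 3 would move the estimated infection peak into the week in which the lockdown started.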

A report entitled ‘Full lockdown policies in Western Europe’, dated 23-Apr-2020, came to the following conclusion: “Comparing the trajectory of the epidemic before and after the lockdown, we find no evidence of any discontinuity in the growth rate, doubling time, and reproduction number trends. Extrapolating pre-lockdown growth rate trends, we provide estimates of the death toll in the absence of any lockdown policies, and show that these strategies might not have saved any life in western Europe. We also show that neighbouring countries applying less restrictive social distancing experience a very similar time evolution of the epidemic”.

Reference might also be made to another report entitled ‘The End of Exponential Growth’, dated 16-Apr-2020, which is an assessment of the coronavirus in Israel since its inception and states: “It turns out that the peak of the virus’ spread has been behind us for about two weeks now, and will probably fade within two more weeks. Our analysis shows that this is a constant pattern across countries. Surprisingly, this pattern is common to countries that have taken a severe lockdown, including an economy paralysis, as well as to countries that implemented a far more lenient policy and have continued in ordinary life. The data indicates that the lockdown policy can be stopped within a few days and replaced by a policy of moderate social distancing.”

So, might we ask why so many politicians decided to adopt the lockdown strategy?

Based on the evidence outlined, it might be argued that questioning the Efficacy of Lockdown is not an unreasonable concern given the statistical data outlined along with the references to scientific papers. In the UK, it might be assumed that politicians wanted to defer the responsibility of the lockdown decision to medical science, such that we should question the basis of this science. Broadly, it is assumed that the UK decision to adopt the lockdown strategy was based on the initial authoritative weight given to a report by a team at Imperial College London, led by Professor Neil Ferguson. This report was questioned at the time by other experts prior to the UK decision to implement lockdown on 23-Mar-2020, see Covid-19 Pandemic and Propaganda and the Covid-19 Pandemic for more details. However, this report has since been more widely criticised, where the following quote, taken from the Mail Online, dated 17-May-2020, is only intended to reflect a growing sentiment:

“Computer code for Professor Ferguson’s model which predicted 500,000 would die from Covid-19 and inspired Britain's 'Stay Home' plan is a 'mess which would get you fired in private industry' say data experts. Scientists have levelled a flurry of criticism against Professor Neil Ferguson's modelling which warned 500,000 people could die from coronavirus and prompted Britain to go into lockdown. Modelling from Imperial College London epidemiologist Professor Ferguson, who stepped down from the government's Sage group at the start of May, has been described as 'totally unreliable' by other experts. The coding that produced the sobering death figures was impossible to read, and therefore cast doubts on its strength. When other scientists have tried to replicate the findings using the same model, they have repeatedly failed to do so. Professor Ferguson's model is understood to have single-handedly triggered a dramatic change in the UK government's handling of the outbreak, as they moved away from herd immunity to a lockdown. Competing research, where models produced vastly different results, was largely discarded.”

Despite this criticism of the ‘medical science’ advocating a lockdown policy, the UK government still appears to be deferring decisions to various sources of medical science expertise, such that the lockdown policy is being maintained, possibly for months to come. However, given the apparent discrediting of the Ferguson model, does the government, let alone the general public, really understand the ‘medical science’ on which political decisions are being based? For example, is this medical science still predicated on mathematical models, which appear to have consistently failed to predict the reality of the statistical data as it has slowly emerged?

Note: Some clarification may be needed at this point, as any criticism of the UK lockdown policy is clearly easier given hindsight. However, from the outset, medical science was aware that only older age groups with pre-existing health conditions were statistically at high risk. As such, lockdown in the form of self-isolation might have been a sensible strategy for those in these high-risk groups. However, as per Sweden, the idea of smart social distancing might have been a better policy for all age groups under 50.

Having made this clarification, it might be highlighted that while it is generally assumed that the UK lockdown strategy might have saved lives, the statistics surrounding excess deaths, shown in an earlier chart, appear to suggest that it may also have caused more loss of life than it saved. Likewise, as outlined in the All-Cause Mortality discussion, the UK general public has now been bombarded by 24/7 mainstream news and daily government updates for the last 3 months, but despite this saturation coverage, the risk of death associated with the Covid-19 virus has been generally ignored.

But how might the public have been better informed?

By and large, mainstream media and the government appear to have focused on the ‘reported’ number of increasing daily and accumulating infections and deaths being associated with the Covid-19 virus, while knowing that these numbers were questionable to say the least. Likewise, simply presenting these numbers without the perspective of wider all-cause mortality risks may only lead to a distorted perception of the Covid-19 risk. However, might we attempt to gain some better perspective of the Covid-19 risks based on the two charts below? A form of the chart, top, first appeared in early Feb-2020 based on data collected by the Chinese CDC and published in the Chinese Journal of Epidemiology. It has since been replicated in hundreds of other sources and might be assumed to be a reasonably accurate and authoritative source of data.

The chart, top, is apparently based on 72,314 confirmed cases in China, which has been translated into the actual number of deaths, on the left axis, and percentage of deaths, on the right axis, against each age group and as a total of all age groups. As such, we might reasonably assume that the number of Covid-19 infections was an accurate figure as were the recorded deaths in each age group. Of course, as has been highlighted against other death statistics, it is not always clear whether the Covid-19 virus was simply present at death, one of a number of causal factors or the primary factor. However, while it is not really possible to be definitive on this issue, we might ask whether we see any parallels in the distributions of deaths by age in the lower chart associated with all-cause mortality. While speculative, is it possible that the distribution of Covid-19 deaths might actually correspond to the normal distributions of all-cause mortality? If so, we might consider a simplification of an earlier chart, which removes excess deaths from the general assessment, such that we might perceive a more realistic assessment of the impact of the Covid-19 virus, i.e. the dashed red curve, on the averaged all-cause mortality rate, i.e. the black curve, which results in the blue curve.

What might this chart tell us about the Covid-19 virus?

Let us assume that the average black curve is generally representative of seasonal variations of all-cause mortality in the UK, where the year-to-date figure, i.e. to 15-May-2020, would correspond to 226,754 deaths. We might then recognise that any increase over and above this figure might reasonably be associated with the Covid-19 virus, while ignoring the complication of excess deaths. If so, all-cause deaths would rise to 261,922 year-to-date, such that the estimate of 35,168 Covid-19 related deaths would not be unreasonable. While this would correspond to a 15.5% increase over the year-to-date figure, it might be highlighted that over 97% of the deaths, based on the normal all-cause mortality distribution, would be in the over 60 age groups, where most may have other pre-existing health conditions that made them more vulnerable to the virus. It might also be highlighted that if Covid-19 deaths have peaked, then the total for 2020 might be estimated to be in the region of 50,000, which if compared with last year’s all-cause mortality figure of 637,057 deaths would correspond to a 7.85% increase over the year.
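The percentages in this paragraph follow directly from the quoted totals, as the following arithmetic sketch shows. All input values are taken from the text; the variable names are illustrative.

```python
ytd_average_deaths = 226_754    # average all-cause deaths, year to date
covid_deaths = 35_168           # recorded Covid-19 deaths
prior_year_deaths = 637_057     # annual all-cause figure quoted in the text
projected_covid_2020 = 50_000   # the text's estimate for the whole of 2020

ytd_total = ytd_average_deaths + covid_deaths
ytd_increase_pct = 100 * covid_deaths / ytd_average_deaths
annual_increase_pct = 100 * projected_covid_2020 / prior_year_deaths

print(ytd_total)                       # 261922
print(round(ytd_increase_pct, 1))      # 15.5
print(round(annual_increase_pct, 2))   # 7.85
```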

Note: The use of statistics is not intended to diminish the tragedy of each death, but simply to keep this tragedy in perspective to what happens every single year without need for the crippling implication of lockdown on the lives of the general public, both socially and economically. In this statistical perspective, we need to recognise that the cited 637,057 all-cause mortalities would translate to about 0.95% of the UK population of 67 million. If we then extrapolate the UK figure of 35,168 Covid-19 deaths to a total 2020 figure of 50,000 deaths, this would equate to an increase from about 0.95% to 1.03% of the UK population, i.e. an increase of less than 0.1 percentage points.
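The population shares in this note can be checked with a few lines of arithmetic. The sketch assumes a population of exactly 67 million, so the results may differ in the second decimal place from figures derived with a more precise population estimate.

```python
uk_population = 67_000_000      # rounded population figure used in the text
annual_deaths = 637_057         # all-cause figure quoted in the text
projected_covid_2020 = 50_000   # the text's estimate for the whole of 2020

baseline_pct = 100 * annual_deaths / uk_population
with_covid_pct = 100 * (annual_deaths + projected_covid_2020) / uk_population
increase_points = with_covid_pct - baseline_pct

print(round(baseline_pct, 2))      # 0.95
print(round(with_covid_pct, 2))    # 1.03
print(round(increase_points, 2))   # 0.07
```

The last value is a difference in percentage points of the whole population, not a relative increase in the number of deaths.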

So, what, if anything, might be concluded at this point?

While there are undoubtedly some very sophisticated SIR models, which this discussion has not considered, the fact that the model used by Professor Ferguson to advise the UK government has been broadly discredited brings into question the accuracy of any of them. While the idea of a ‘cluster SIR model’ as outlined at the start of this discussion has no weight of authority, it may at least highlight some of the problems with all SIR models. Basically, any national population is not homogeneous in its demographic makeup or population density, such that the spread of any infection may vary enormously for all sorts of reasons, which we might assume is very difficult to model accurately. However, if we were to assume that the Covid-19 virus has a natural value of [R0=5], then this might apply to any smaller cluster population with a high percentage of susceptibles, as per equation [2]. As such, we might recognise that the number of susceptibles may only be reduced by exposure to the virus, via herd immunity, which a vaccine might help accelerate. However, the wider issues surrounding vaccinations will be deferred to another discussion. Of course, the natural value of [R0] might also be reduced to a lower effective value of [Re], if the spread of infections is minimised by some form of self-isolation combined with social distancing.

Note: In the context of this discussion, the idea of ‘lockdown’ has never been considered against a ‘do-nothing’ policy. Rather it was highlighted that the risk of death due to the Covid-19 virus has been proved to be correlated with age, especially when combined with other pre-existing health conditions. If so, those with the highest risk, e.g. over 60, might be best advised to self-isolate, while those below 60 with no obvious health conditions might simply observe more sensible social distancing advice, especially in high-density urban populations. However, as highlighted, while there is statistical risk to all age groups, it is something that needs to be put into perspective.

Today, 30-May-2020, ‘experts’ have warned the UK government not to relax the lockdown ‘too soon’ as they perceive the need for the effective reproductive number [Re] to fall even lower. Estimates, published 11-May-2020, have put [Re] somewhere between [0.5] and [0.9]. On 14-May-2020, analysis by Public Health England and Cambridge University calculated that [Re] had fallen to [0.4] in London, with the number of new cases halving every 3.5 days. However, a day later, the Government Office for Science revised the [Re] number to be in the range [0.7] to [1.0]. Of course, what is not clear is how any of these estimates were determined, e.g. what assumptions were made about the percentage of susceptibles in any of these populations. While it might be assumed that the SAGE experts still advising the government may remain convinced of the efficacy of the lockdown policy, there is mounting statistical evidence that now questions many of its assumptions.
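To give a sense of how sensitive the rate of decline is to these [Re] estimates, the sketch below relates [Re] to a halving time. It assumes the basic SIR relationship in which infections change at a rate of γ(Re − 1) per day with [Re] held constant; this is an assumption for illustration and not necessarily how the published estimates were derived. The function names are hypothetical.

```python
import math


def halving_time_days(re, gamma):
    """Days for infections to halve when Re < 1, assuming dI/dt = gamma * (Re - 1) * I."""
    if re >= 1:
        raise ValueError("infections only decline when Re < 1")
    return math.log(2) / (gamma * (1.0 - re))


def implied_recovery_rate(re, halving_days):
    """Recovery rate gamma implied by a given Re and an observed halving time."""
    return math.log(2) / (halving_days * (1.0 - re))


# Recovery rate implied by the London figures quoted in the text.
gamma_london = implied_recovery_rate(re=0.4, halving_days=3.5)
print(f"implied gamma: {gamma_london:.2f} per day")

# Halving times for other quoted Re values under the same assumed gamma.
for re in (0.5, 0.7, 0.9):
    print(f"Re={re}: halving time {halving_time_days(re, gamma_london):.1f} days")
```

Under these assumptions, the difference between [Re=0.4] and [Re=0.9] is the difference between cases halving in days and cases halving in weeks, which illustrates why the basis of each estimate matters.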

But how might the general public judge these apparent contradictions?

First, it has to be recognised that the general public may have gained a very distorted perspective of the risks associated with the Covid-19 virus, which is now reflected in the fact that many are fearful of relaxing the lockdown too soon. While some may also argue, on moral grounds, that it was the duty of government to save lives, irrespective of the cost, the public possibly understands that this may also be used as political rhetoric, which cannot be the basis of sensible public policy. Of course, this does not mean that it cannot be a goal of a civilised society to save lives, whenever and wherever possible. If so, we finally come to the thorny question, which may only be answered when viewed from a future perspective:

How many lives did lockdown save in the UK?

While, at this time, we cannot put a figure to this question, we might still make some assessment of the lives not saved. For example, we know from official ONS statistics that the 36,000 people reported to have died from the Covid-19 virus were not saved by the lockdown strategy. Unfortunately, we now know that a great many of the lives lost, possibly unnecessarily, were in care homes for the elderly. There is also a suggestion of 36,000 excess deaths that were possibly caused by the lockdown strategy in an attempt to protect the health services from being overwhelmed. In fact, none of the 146,220 people who have died, according to ONS statistics between 20-March and 15-May, which includes normal all-cause mortalities, Covid-19 and excess deaths, were saved by the lockdown, which started 23-March. Finally, we also have to consider the efficacy of the lockdown policy, when applied to all age groups, if the under 50 age groups account for only 1% of all Covid-19 deaths, such that 99% are associated with the over 50 age groups, especially those with other health issues. However, to repeat, this is not a discussion about ‘lockdown’ versus ‘do-nothing’, but rather a rationalisation of the risks, which not only includes the risk of death, but the risk of social and economic hardship, which were potentially unnecessary for the younger age groups.