Can we do something as a Community to fight COVID-19 (aka Coronavirus)?

Prediction of criticality in patients with severe Covid-19 infection using three clinical features: a machine learning-based prognostic model with clinical data in Wuhan

link to article: https://www.medrxiv.org/content/10.1101/2020.02.27.20028027v2

Dear fellow data scientist, if you are interested in contributing to the research of COVID-19, I would like to invite you to our non-profit working group, based in Milan, in the epicenter of the European outbreak of the coronavirus. We’ll publish datasets and open a discussion platform to help each other and the scientific community find novel ways of defeating this epidemic. Join us at defeatcovid19.org!

Been tracking and charting numbers on my own website: http://covid19.rumments.com/

The big problem is not so much Covid itself, given its numbers, but the panic and reaction.
If you take away the three large countries, one of which has plateaued (China), the numbers are actually quite small, even more so when you compare them to prior epidemics and outbreaks (1918, 1957, 2009, etc.).

If you place these numbers in terms of population, things get more interesting…
China - 0.005626% infection rate (that’s 1 in 19,011) - 3.94% mortality rate
Italy - 0.029209% infection rate (1 in every 3,424) - 7.18% mortality rate
Iran - 0.01353% infection rate (1 in 7,390) - 4.52% mortality rate

After those three, which represent the lion's share of the cases:
USA - 0.000657% infection rate (1 in 152,255) - 2.16% mortality rate
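As a rough cross-check on the figures above, a "1 in N" rate converts to a percentage as 100 / N. This is only a minimal sketch that redoes the arithmetic on the 1-in-N values quoted in the post. The function name is made up for illustration, and the output may not match the quoted percentages exactly.

```python
def one_in_n_to_percent(n: float) -> float:
    """Convert a '1 in n' rate into a percentage of the population."""
    return 100.0 / n


# 1-in-N values as quoted in the post above.
quoted = [("China", 19_011), ("Italy", 3_424), ("Iran", 7_390), ("USA", 152_255)]

for country, n in quoted:
    print(f"{country}: 1 in {n:,} = {one_in_n_to_percent(n):.6f}%")
```

Running the conversion both ways is a cheap way to catch a misplaced decimal point or a missing percent sign.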

The asymptomatic cases are not in the numbers, and including them would bring the mortality rates of all countries down…
They won't be included at all until there is way more testing…

Right now the normal yearly flu in the US has claimed many more lives, near 20,000.

So it's very hard to get a grasp of the numbers…
H1N1 struck in 1918 and in 2009…
We can write off the numbers of 1918, as the current numbers would have to suddenly jump 1000-fold to even start being like that…

In 2009, without the kind of response that is going on today (at least one politician calling for nationalization), the total deaths in the US came to 3,433.

One of the problems we have with modeling and AI prediction is the need for past data that is associated with what we want to predict.
In this case, how could we do that, given that the world's response is disproportionate compared to prior events? What would we model or predict? Certainly we could not have known about toilet paper sales? Stock collapses? City leaders shutting down businesses by order?

Cures? there are several about to be tested… and not just Celgene’s

As of this moment there are 7,079 dead worldwide

H1N1 in 2009 hit a confirmed 18,000 deaths worldwide, with estimates now being thrown around as high as 203,000…

To match that, Covid's current death toll would have to grow to about 2,868% of today's figure, roughly 28.7-fold (by December).
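For anyone who wants to redo that comparison as the death toll changes, here is a minimal sketch. It uses only the figures quoted above (7,079 current deaths; 18,000 confirmed and 203,000 estimated H1N1 deaths), and the function name is just an illustrative choice.

```python
def required_growth(current: float, target: float) -> tuple[float, float]:
    """Return (multiplier, percent increase) needed to go from current to target."""
    factor = target / current
    pct_increase = (factor - 1.0) * 100.0
    return factor, pct_increase


current_deaths = 7_079  # worldwide Covid deaths quoted in the post
for label, target in [("H1N1 confirmed", 18_000), ("H1N1 high estimate", 203_000)]:
    factor, pct = required_growth(current_deaths, target)
    print(f"{label}: x{factor:.2f} (an increase of {pct:.0f}%)")
```

Note the difference between "grows to N% of today's value" and "increases by N%": the two differ by exactly 100 percentage points, which is why the sketch reports the multiplier as well.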

With this in mind, what would we try to predict that would have some basis?

That’s a very tall order for systems that have to learn from the past in some way.


The numbers are interesting, but more interesting are the modelling conditions. Most models make a few assumptions that may not hold, and therefore we are not able to get an accurate prediction.

While the current mortality rate is low, the reported number is not representative of reality. The mortality rate is computed over ‘discovered’ cases only, that is, cases where the symptoms are bad enough that it actually makes sense to take the plunge and go to a hospital during a systemic collapse, where you risk getting infected if you were not already. So we are probably over-representing the mortality but under-representing the actual spread of the infection.

Moreover, there is another assumption we are making, which is also a ‘best case scenario’. It concerns the risk of repeated infection with multiple strains: in epidemiological terms, you develop antibodies for single strains, so over time you can actually get multiple infections with strains from different genetic clusters. To date there are already a few very well-defined clusters, even within the S strain (which is the milder one and is found outside of Wuhan). You can explore the clusters here: https://nextstrain.org/ncov

This is assuming that the risk of multiple infections is ‘independent’, BUT there are many diseases where that assumption doesn’t hold, and we don't know where Covid-19 stands right now. For example, under the dengue model, you become immune to the first strain, but a second infection can develop into a far deadlier variant known as hemorrhagic dengue. So the body's reaction to the second strain is far worse on average. Assuming independence = best case scenario.

By the time the WHO decided to stop counting infected cases for H1N1, the total was around 1250 deaths per 100000 infections. As of today, at a much earlier stage, the Covid-19 number is still climbing and is at around 3700 per 100000. So if the number stabilized today (which doesn't look likely, given the number of deaths reported in Italy and Spain as of today), it would be about 3 times more deadly than H1N1. I touched on that here: https://twitter.com/federicolois/status/1238816432114327552
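A minimal sketch of that comparison, assuming both figures are expressed per 100,000 known infections (the function name is illustrative). With the two numbers quoted above, the ratio comes out just under 3.

```python
def relative_lethality(deaths_per_100k_a: float, deaths_per_100k_b: float) -> float:
    """How many times more deadly A is than B, given deaths per 100,000 infections."""
    return deaths_per_100k_a / deaths_per_100k_b


covid_per_100k = 3_700  # figure quoted above, still climbing
h1n1_per_100k = 1_250   # figure quoted above, at the point counting stopped

ratio = relative_lethality(covid_per_100k, h1n1_per_100k)
print(f"Covid-19 vs H1N1: {ratio:.2f}x")
```

Both inputs only cover detected cases, so the ratio inherits every bias discussed in this thread.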

Granted, it is not 1000-fold, but again, we don't know many important variables and we are still in the exponential phase with systemic health collapse. If you take averages under those conditions, they will also average in the part where there is no system collapse. Using a ‘stricter’ methodology may not give those conservative numbers. If you are getting, say, 1000s of daily cases and 120 of those are dying, and you are able to keep those numbers in check (as has been the case in Iran for a few days), you are pointing toward a 10% death rate for severe cases under systemic collapse. Italy today shows a similar ratio, with cases in the 3000s vs deaths in the 300s. Sadly we don’t have data at that level of detail available (afaik). It would make a ton of difference for doing proper analysis and extracting proper models.

What I am saying here is, we have to be very, very careful about stating numbers without also stating the assumptions they rest on. Assumptions must be peer reviewed, and a model should either be proven assumption-free or have its assumptions explicitly defined for it to be considered valid.

Epidemiological modelling is a big deal, and should be approached in a very careful and methodical way.

Just wanted to add something that seems relevant here. There is a new research challenge on Kaggle with many covid related datasets from Allen Institute: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

Using your own data along with given data is okay in the challenge, so if anyone’s interested, you can check it out. So far some of the notebooks look interesting.

Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset

Requested by The White House Office of Science and Technology Policy, the dataset represents the most extensive machine-readable Coronavirus literature collection available for data and text mining to date, with over 29,000 articles, more than 13,000 of which have full text.

Your 3700 deaths per 100000 does not include the asymptomatic.
If it's akin to the Diamond Princess, then there could be 300,000 to 600,000 more in the numbers.
That would then mean your 3700 per 100k could be as low as 616 per 100k.
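To make that adjustment explicit, here is a minimal sketch: the deaths stay fixed while the undetected cases are added to the denominator. The function name and the specific scenarios are illustrative. Note that the 616 figure corresponds to roughly 500,000 undetected cases per 100,000 confirmed (600,000 total), which is my reading of the numbers above rather than something stated in the post.

```python
def adjusted_deaths_per_100k(deaths: float, confirmed: float, undetected: float) -> float:
    """Deaths per 100,000 infections once undetected cases join the denominator."""
    return deaths / (confirmed + undetected) * 100_000


deaths, confirmed = 3_700, 100_000  # the 3700 per 100k figure under discussion

# Hypothetical numbers of undetected (e.g. asymptomatic) cases per 100k confirmed.
for undetected in (0, 300_000, 500_000, 600_000):
    rate = adjusted_deaths_per_100k(deaths, confirmed, undetected)
    print(f"{undetected:>7,} undetected -> {rate:,.0f} deaths per 100k")
```

The spread between the scenarios shows how sensitive the headline rate is to the one quantity nobody has measured yet.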

We can easily be that far off…
by the time we have enough information, it will be way past the inflection point of the curve.

from ECDC:

Clinical presentations of COVID-19 range from no symptoms (asymptomatic) to severe pneumonia; severe disease can lead to death. While the majority of cases (80%) are milder respiratory infections and pneumonias, severe illness and death is more common among the elderly with other chronic underlying conditions, with these risk groups accounting for the majority of severe disease and fatalities to date.

With the potential numbers being that high vs what we are now tracking, which is based on the sickest only, there isn't data good enough yet.

Let's say 80% is a valid number… that means the 200k confirmed cases (actually a bit less) imply there are potentially 800,000 not on the radar… and that 80% is based on a sample that was lacking younger cohorts, which the disease does not affect as much, which skews the numbers even more.

It also doesn't help that people are misusing terms…

When testing is out, what will the false positive or false negative rate be, compared with the different kinds of testing used in the lab when numbers were low?
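To illustrate why those error rates matter so much when prevalence is low, here is a minimal sketch using Bayes' rule. The sensitivity, specificity and prevalence values below are purely hypothetical and are not taken from any real COVID-19 test. The function name is also just for illustration.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Probability that a positive test result is a true infection (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical test: 95% sensitivity, 95% specificity (assumed values).
for prevalence in (0.001, 0.01, 0.10):
    ppv = positive_predictive_value(prevalence, sensitivity=0.95, specificity=0.95)
    print(f"prevalence {prevalence:.1%}: {ppv:.1%} of positives are real")
```

With a rare infection, even a test that looks accurate on paper produces mostly false positives, so mass-testing counts cannot be read at face value without knowing these rates.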

Errors propagate even in good data!!

TLDR: No, I don’t have COVID-19, it is just an example…

Exactly, you hit the nail on the head with that comment. We only have data on the ‘discovered’ cases; because we are not doing systematic testing, the deaths per 100000 figure is biased, which makes most of the models we can cook up wrong.

When the Diamond Princess happened, there was no systemic health collapse. My country, which is still far from one, seems to corroborate those numbers too. They do not come from places where systemic collapse has happened, though.

Errors happen even on good data, as you say. We are assuming an R0 of 2 to 3 (based on symptomatic Covid-19); the real R0 may be even higher. I don’t have a fever, but on Friday my son brought a dry cough home from school. Now both my kids and I are coughing. Is it COVID-19? We will never know, unless I develop a fever and therefore meet the testing criteria (my bet, though, is that it is a normal rhinovirus or something like that).

But let’s assume, for the exercise only, that it is. It opens up questions like: how do you explain that, while practicing extreme distancing, just sending my kid to school puts you at risk at a time when there were just 15 cases in the whole country? That would point toward a faulty spreading model.

A collaboration between colleagues across CZI, AI2, Microsoft and others put this together. This corpus could be a good place to start trying to build some tools to challenge COVID-19.

[CORD-19 (COVID-19 Open Research Dataset) is a free resource of over 29,000 scholarly articles about COVID-19 and the coronavirus family of viruses.]