Can we do something as a Community to fight COVID-19 (aka Coronavirus)?

Sure we can, we just need to figure out how.

Those that follow me on Twitter know that I have been tracking the numbers since early January (when the numbers didn’t look good, as we all know now). As I outlined before, the core problem is that we cannot quickly figure out when people are infected or about to develop the infection. While protocols have been designed, most of the secondary outbreaks look like they belong to the class of blind testing protocol failures.

If you are only looking for phase 1 and phase 2 cases (aka the imported and direct contact classes), you may be missing sleepers: those that were discarded because they are not what you are looking for. Those people are not required to isolate, and they therefore push you toward an Italy situation.

We are data guys, so let’s collaborate on what we know how to do. Protein folding and the like is useful, but you really have to be a domain expert to be able to contribute. We are experts in the classification and image domain, and there are many different ways images are taken of our bodies: CTs, X-rays, etc. CTs have shown good results for screening, but they are not widely available in third world countries and are in short supply elsewhere. If other imaging technology like X-rays or ultrasound can do the job, we can shorten the diagnosis time and help the effort in the way we know how.

Call your hospitals and convince them to share their anonymized data dumps of respiratory illness images, including SARS, MERS, COVID and similar, in annotated form. We don’t need much data, we need to know true or false… we don’t care about anything else. Other forms of support, like computing power and media support, can also make a difference.

If you know someone there, bother them until you are able to get to the person who must sign the release. Desperate times call for desperate measures.

We have lots of computing power and brains in this community alone. What we don’t have is the data we are good at working with. We need data to explore!


Jeremy and Rachel wrote a very thorough blog post discussing the virus, which was extensively researched and analyzed:

As far as data goes, here is what I’ve seen (via Jeremy’s Twitter):

Individual Case Details from Hong Kong, Singapore, and South Korea, and Johns Hopkins data in SQL:

Exponential Growth Modeling of the virus:

Daily Reports via Johns Hopkins:


Yes, I read the article; that’s how I got the idea of tackling early diagnosis, which appears to be the most pressing problem right now. I have been hunting for images, and there is almost no published dataset. That’s why I mentioned we need to procure the data; the datasets probably don’t even exist yet, and we need to convince hospitals that are not yet over capacity to collect them if possible.

IIRC I saw something on twitter discussing possible options. Give me a few minutes and I’ll edit this post. (I believe it was looking at the flu but

Got it! (Or at least one.)

Chest X-rays (of coronavirus):

Also on Kaggle:


The best thing most of us can do is work remotely and urge our respective employers to both allow and encourage remote working where possible.

Talk is cheap. For so many organizations to talk about how much they care about their employees while requiring them to come in each day when they don’t need to, or not canceling events because it may drop their share price by half a penny, is not backing up that message.

At this point, the only two people saying this isn’t an issue are Donald Trump and Elon Musk … the first has never been wrong about anything and is the best at everything, the second is way outside the purview of their expertise. Let’s be better than that.


Yeah, I stumbled upon that one, and to me that would be the proper place to direct the samples… right now the problem is that the amount of data is not enough to build something reasonable that could realistically be deployed through a web interface for doctors to use.

As for politics, I would stay away from it. My government is not doing the right thing either, but that is something I cannot control. What we can control is tech, and that’s where we should focus our efforts.


Maybe this is less appropriate for this forum, but because I specifically do politics consulting + tech… call your local governments and demand they lock stuff down. (USA advice, fwiw.)

I promise, they are in communication with the higher-ups, and they are the ones responsible for reporting on-the-ground effects. I can promise this because I’m writing the responses.

In addition, get your parents/elders/whatevers off the streets.

It’s more helpful than computing power at this point.


I hope we can get a larger chest X-ray dataset online that would allow us to train models.

I’m interested in working on COVID-19 projects, whether ML related or not. Maybe we can compile a list of projects people have started and keep them in the OP.

I’d like to do something on this at some point. I already have some ideas (there’s a specific journal paper and a dataset that allows the modeling; the topic is estimating case fatality rates where some aspects of the data are unknown). I’d also be open to other ideas.

Another data source that might be of interest, from Wolfram: epidemic data (World, China, US), genetic data, and patient data

Let me be a bit more specific:

The idea is to use either the 2nd approach (parametric) or the 3rd approach (Kaplan-Meier, non-parametric) described in the following paper:
Methods for Estimating the Case Fatality Ratio for a Novel, Emerging Infectious Disease

According to the paper, the 2nd approach is better at the very earliest stages of the disease, and the 3rd approach is better later on.

Relevant packages (that I have experience with):

  1. the Python lifelines package
  2. the R survival package
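To make the Kaplan-Meier idea concrete, here is a minimal pure-Python sketch of the plain estimator. It is not the paper's exact method: it treats every patient who has not died as censored, which is a simplification, and the toy numbers are made up for illustration only.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each time with at least one death.

    durations: days from onset to death or to last follow-up
    observed:  True if the patient died at that time, False if censored
    """
    deaths = Counter(t for t, d in zip(durations, observed) if d)
    exits = Counter(durations)  # everyone leaves the risk set at their own duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        if deaths[t]:
            # Multiply by the fraction of at-risk patients who survive time t
            survival *= 1.0 - deaths[t] / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Hypothetical toy data, NOT real patients
days = [3, 5, 5, 8, 10, 12, 12, 15]
died = [True, False, True, True, False, True, False, False]

for t, s in kaplan_meier(days, died):
    print(f"day {t:2d}: S(t) = {s:.3f}")
```

In practice I would use the lifelines `KaplanMeierFitter` (fit with the durations and `event_observed`) rather than hand-rolled code; the sketch is only meant to show how censored cases stay in the denominator until they drop out.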

Dataset here:


Can we get data that labels the imaging with the strain of COVID-19 each patient had, and whether they died or recovered? As I understand it, the S-type is the strain with the high mortality rate. Maybe we can figure out how to pinpoint the S-type from an image, to get results before the test can be done. You know, since we have a massive shortage of tests, but we can do imaging today!


Couple of questions:

  • I have not seen any data related to the false positive or false negative rates of any of the tests currently being administered. Is this taken into account in any of the projections?

  • Are there any datasets that have demographic information on the cases (age, gender, etc.)? I have not seen this in the JHU or Kaggle datasets. I have seen some mortality estimates based on diabetic and cardiovascular history, but none based on past smoking history, which seems relevant given that China and Italy have high rates of smoking.
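On the first question, here is a small sketch of how test error rates could be folded into a projection, assuming you know (or guess) the sensitivity and specificity. It uses the standard Rogan-Gladen correction for prevalence and Bayes' rule for the positive predictive value. The sensitivity and specificity values below are placeholders I made up, not measured figures for any real COVID-19 test.

```python
def corrected_prevalence(apparent, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence from the observed positive rate."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must perform better than chance")
    estimate = (apparent + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))  # clamp to a valid proportion

def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a positive result is a true positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical values for illustration only
SENS, SPEC = 0.70, 0.98
print(corrected_prevalence(0.10, SENS, SPEC))       # observed 10% positive
print(positive_predictive_value(0.05, SENS, SPEC))  # assumed 5% true prevalence
```

With a low-sensitivity test, the observed positive rate understates the true rate, so any projection built on raw confirmed counts inherits that bias.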

My wife works in a lab, and no. At least here there are more negatives than positives, for now. But in some countries, like Iran, there have been anecdotal reports that it goes as high as 50/50.

I am asking myself the same. I know that here they are using PCR-style tests, and apart from negative cases hardly any imaging technique is used. It is reported that in China they use portable CT scanners for mass screening, but I have no clue whether it is viable to get that data, or from where.

Hi, I just found out about Folding@Home, which is a “distributed computing project for disease research.” You can help them out by lending your GPU. Additionally, they have a repository containing “all input files and generated datasets for the Folding@home efforts to better understand how the SARS-CoV-2 virus that causes COVID-19 can be targeted with small molecule and antibody therapeutics.” You can read more about it on their blog


@jeremy @rachel
EDIT: Remade this into a Medium post.

I have read your piece on coronavirus and it is to the point.

I believe I have some good ideas for how to help:

We’re currently seeing a lot of stringent measures to stop the spread of COVID-19. Maximizing policy impact is important so I’ve been thinking about the cost to impact ratio of various actions being taken by governments around the world.

As I understand it, the key to stopping the spread of a viral infection is reducing its R0. This can be accomplished in two major ways:

1. Restrict freedom of movement (very expensive)

Strong and extremely expensive measures (shutdowns, state-wide quarantines etc.) are being taken to that effect (or will be soon) by many European countries.

2. Minimize transmission probability even in case of physical contact (very cheap, and can be implemented a long time before quarantine)

Suggested steps that I haven’t seen discussed anywhere but are free to extremely cheap in comparison with 1:

  • Aggressive public awareness campaign about individual behavior recommendations (handwashing importance, proper way to cough, social distancing, etc.) via, for example, the following media:

    • regular internet popups for all users, enforced through internet service providers
    • regular TV and radio “commercials”, especially in prime time slots
    • leaflets to every household
    • public advertisement spaces (public transport, billboards, etc.)
    • daily text messages
  • Government-paid mass manufacture of personal protection medical supplies (by subcontracting or through state-owned companies) to prevent the worldwide shortage predicted by the WHO

    • if there was a surplus, face masks in public could be made mandatory which would drastically reduce R0 due to infected individuals not transmitting so often. This also leads to much less face touching

    • sanitizing gels are not available. It is a very confusing message for the public if the number one step for protection, keeping their hands clean, is not possible to accomplish in public. The WHO has made a leaflet on how to make homemade sanitizing gels, but these should be mass-produced by the government

    • face masks and gels should ideally be delivered for free in sufficient numbers to each household

  • Create government infrastructure or support private companies to facilitate online orders and home delivery of groceries. There is no need for millions of people to go shopping each day and unnecessarily keep infecting each other.
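A rough way to see why the two levers above are comparable is the textbook decomposition of the reproduction number into contacts per day, transmission probability per contact, and days infectious. This is a back-of-the-envelope sketch, not an epidemiological model, and every number in it is a hypothetical placeholder.

```python
def reproduction_number(contacts_per_day, p_transmit, infectious_days):
    """Expected secondary infections per case under a simple product model."""
    return contacts_per_day * p_transmit * infectious_days

# Hypothetical baseline: 10 contacts/day, 2.5% transmission chance, 10 days infectious
baseline = reproduction_number(10, 0.025, 10)

# Lever 1: restrict movement, halving the number of contacts
fewer_contacts = reproduction_number(5, 0.025, 10)

# Lever 2: masks and handwashing, halving the per-contact transmission probability
safer_contacts = reproduction_number(10, 0.0125, 10)

print(baseline, fewer_contacts, safer_contacts)
```

Under this simple model, halving either factor has the same effect on the reproduction number, which is the whole argument: the second lever is far cheaper to pull.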

I am writing a recommendation letter that I intend to send to as many authorities as possible (governments, non-profits, European Union organizations, etc.). Ideally this would be a policy strategy created by a professional team with experience in this area, but I will do the best I can.

Besides government action, the most effective public action I can think of is to pressure websites with large user counts (Facebook, Google, YouTube, Twitter, Wikipedia, etc.) to show the personal health recommendations as well (using a popup or however they wish to). This can be done by contacting them or by creating online petitions, viral tweets (#stopcovid19, #washhands, etc.)

Another out-of-the-box idea is to hire publicity companies to create a marketing campaign intended to go viral (Vietnam’s ministry of health made a music video about washing hands to stop COVID-19).

I’ve made a little Google Sheets document with the cost/impact ratio of various policy measures (the numbers are just an educated guess; accurate numbers would help a lot to lend credibility to the strategy, but I believe it’s commonsensical enough for most people).

Hi guys - I built a model yesterday, and so far I have it at 83%. But the dataset is tiny, so any help with data would be much appreciated. Please read about it here:

Let me know your thoughts!
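One thought on reading a score like 83% from a tiny dataset: it helps to put an interval around it. Below is a sketch using the Wilson score interval for a proportion. I am assuming the 83% is accuracy on a held-out set, and the set size of 30 images is a number I made up, since the post does not state it.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half

# Hypothetical: 25 of 30 validation images classified correctly (about 83%)
low, high = wilson_interval(25, 30)
print(f"accuracy 25/30, 95% interval: {low:.2f} to {high:.2f}")
```

With so few images the interval is wide, so the same model could plausibly be noticeably better or worse than the headline number. That is another argument for getting more data.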

Here’s the dataset I am using:

Hey guys - I built a model on this data! Check it out: