Kaggle COVID-19 Open Research Dataset Challenge (CORD-19)

In response to the COVID-19 pandemic, the White House and a coalition of leading research groups (AI2, CZI, MSR, Georgetown, NIH) have prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 29,000 scholarly articles, including over 13,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community to apply recent advances in natural language processing and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease. There is a growing urgency for these approaches because of the rapid acceleration in new coronavirus literature, making it difficult for the medical research community to keep up.


Maybe we need to gather data from the public themselves. Perhaps each individual should make a diary entry every day, whether or not they have the virus, and a standard form for those entries could be proposed so that they readily produce a consistent dataset. Possible entries could be:

  • GPS Location
  • Age
  • Sex
  • Number of Persons Close By
  • Number of Contacts per Day
  • What Symptoms
  • Washed Hands After What Activity
  • Self Isolating
  • Exercising in Isolation

Perhaps some of the above, and I am sure others could be suggested.
Not sure if this makes any sense but it’s just my 3 penny worth.
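
To make the "standard form" idea a bit more concrete, here is a rough sketch of what one diary entry could look like as a record, and how a batch of them might be written out as CSV. Everything in it is my own assumption for illustration: the DiaryEntry class, the field names and types, the entries_to_csv helper, and the choice to join list fields with semicolons. None of it is part of CORD-19 or any existing tool.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from datetime import date
from typing import List


@dataclass
class DiaryEntry:
    """One daily diary entry (hypothetical schema based on the list above)."""
    entry_date: date
    latitude: float              # GPS location
    longitude: float
    age: int
    sex: str
    persons_close: int           # number of persons close
    contacts_per_day: int
    symptoms: List[str]          # empty list if no symptoms
    handwashing_after: List[str] # activities after which hands were washed
    self_isolating: bool
    exercising_in_isolation: bool


def entries_to_csv(entries: List[DiaryEntry]) -> str:
    """Serialize entries to CSV text with one row per diary entry."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(DiaryEntry)])
    writer.writeheader()
    for entry in entries:
        row = asdict(entry)
        # Flatten non-scalar fields so every cell is a plain string.
        row["entry_date"] = entry.entry_date.isoformat()
        row["symptoms"] = ";".join(entry.symptoms)
        row["handwashing_after"] = ";".join(entry.handwashing_after)
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up example values, purely for illustration.
    example = DiaryEntry(
        entry_date=date(2020, 3, 20),
        latitude=51.5,
        longitude=-0.1,
        age=40,
        sex="F",
        persons_close=2,
        contacts_per_day=5,
        symptoms=["cough"],
        handwashing_after=["shopping"],
        self_isolating=True,
        exercising_in_isolation=False,
    )
    print(entries_to_csv([example]))
```

A fixed set of typed fields like this is what would keep entries from different people comparable. Precise GPS location together with age and sex could identify an individual, so a real form would probably need to coarsen or anonymise those fields.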

Might be useful …

COVID-19 Data Tools