Question Answering system for COVID-19

This is a summary of a paper on deploying a neural search engine to answer questions using the COVID-19 Open Research Dataset (CORD-19).

“Neural Covidex applies state-of-the-art neural network models and artificial intelligence (AI) techniques to answer questions using the COVID-19 Open Research Dataset (CORD-19) provided by the Allen Institute for AI (data release of April 3, 2020).”

“This project is led by Jimmy Lin from the University of Waterloo and Kyunghyun Cho from NYU, with a small team of wonderful students: Edwin Zhang and Nikhil Gupta. Special thanks to Colin Raffel for his help in pretraining T5 models for the biomedical domain.”


  • COVID-19 dataset
  • Neural Search Architecture for Question Answering
  • Research Wisdom
  • Call for Action



  • [primary] domain experts during a global pandemic
    • public health officials
    • clinicians
    • virologists
  • the research community, who can build on top of the system


  • access the latest information as the crisis evolves rapidly
    • CORD-19 dataset, refreshed weekly, published by the Allen Institute for AI
  • make decisions based on evidence
  • generate insights


  • end-to-end applications
    • keyword-based search engine with faceted browsing
    • Neural Covidex: search engine with a neural ranking architecture for question answering
    • highlighting of passage words relevant to the query, for presentation
  • component technologies


  • multi-stage search architecture (e.g., as used by Bing and Alibaba)
    • retrieval stage: bag-of-words (BOW) queries against an inverted index
    • reranking stage: rescore and refine the candidate list
  • modular & reusable keyword search
    • Anserini & Pyserini
    • challenge: length normalization
      • paragraph-level indexing vs. a) indexing only the title & abstract, b) indexing the full text
    • documents ranked using BM25
    • prebuilt Anserini indexes for CORD-19
    • demonstrated in a notebook
    • unsupervised sentence highlighting using pretrained BioBERT from HuggingFace (sentences from the retrieved candidates and the query are converted into hidden vectors)
  • keyword search with faceted browsing
    • Anserini integration with Solr
    • Blacklight search interface, built on Ruby on Rails, providing faceted browsing and highlighting
  • Neural Covidex for reranking
    • prior work: BERT for passage ranking, BERTserini for retrieval-based question answering, Birch for document ranking
    • given a query q and a list of documents d1, d2, d3, d4 => return a list [relevant, not-relevant, relevant, relevant], each with a confidence score
    • fine-tuned on the MS MARCO passage dataset
    • the model trained on MS MARCO is applied directly to CORD-19, without training on CORD-19
    • T5-based model
  • Evaluation
    • batch retrieval evaluation (MAP, NDCG) vs. end-to-end, human-in-the-loop evaluation
    • Is this system actively contributing to the pandemic response?
    • users are in firefighting mode, with no time to provide guidance
    • “hallway usability testing” is valuable
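
The retrieval stage above (BOW queries against an inverted index, documents ranked with BM25) can be illustrated with a toy in-memory version. This is a sketch, not Anserini: the tokenizer is naive, and k1=0.9, b=0.4 are assumed values (commonly used as Anserini defaults). The b parameter is what controls the length normalization mentioned as a challenge.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive lowercase/whitespace tokenizer; Anserini uses a Lucene analyzer
    # (stemming, stopword removal), which this sketch does not replicate.
    return text.lower().split()


class InvertedIndex:
    """Toy in-memory inverted index scored with BM25."""

    def __init__(self, docs, k1=0.9, b=0.4):
        self.k1, self.b = k1, b
        self.doc_len = {}
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        for doc_id, text in docs.items():
            tokens = tokenize(text)
            self.doc_len[doc_id] = len(tokens)
            for term, tf in Counter(tokens).items():
                self.postings[term][doc_id] = tf
        self.n_docs = len(docs)
        self.avg_len = sum(self.doc_len.values()) / self.n_docs

    def idf(self, term):
        df = len(self.postings.get(term, {}))
        # Lucene-style BM25 IDF; the "1 +" keeps it non-negative.
        return math.log(1 + (self.n_docs - df + 0.5) / (df + 0.5))

    def search(self, query, k=10):
        scores = defaultdict(float)
        # Only documents in the query terms' postings lists are touched.
        for term in tokenize(query):
            idf = self.idf(term)
            for doc_id, tf in self.postings.get(term, {}).items():
                # b controls document length normalization (b=0 disables it).
                norm = self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / self.avg_len)
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + norm)
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


docs = {
    "d1": "coronavirus incubation period in humans",
    "d2": "incubation of chicken eggs",
    "d3": "vaccine trials for coronavirus",
}
index = InvertedIndex(docs)
# d1 ranks first: it is the only document matching both query terms.
print(index.search("coronavirus incubation"))
```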
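
The multi-stage architecture listed above (a cheap retriever followed by a more expensive reranker) reduces to "score everything cheaply, keep the top k, rescore only those". A minimal sketch follows; `term_overlap` and `toy_reranker` are hypothetical stand-ins for BM25 and a neural reranker, not the models Covidex uses.

```python
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[str, str], float]


def multi_stage_search(query: str, corpus: Dict[str, str], first_stage: Scorer,
                       reranker: Scorer, k_candidates: int = 100,
                       k_final: int = 10) -> List[Tuple[str, float]]:
    # Stage 1: cheap scorer, keep only the top k_candidates. (A real retriever
    # would use an inverted index instead of scoring every document.)
    candidates = sorted(corpus, key=lambda d: first_stage(query, corpus[d]),
                        reverse=True)[:k_candidates]
    # Stage 2: expensive scorer, applied only to the short candidate list.
    reranked = [(d, reranker(query, corpus[d])) for d in candidates]
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked[:k_final]


def term_overlap(query: str, doc: str) -> float:
    # Hypothetical stand-in for BM25: number of distinct shared terms.
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def toy_reranker(query: str, doc: str) -> float:
    # Hypothetical stand-in for a neural reranker: overlap per document word.
    return term_overlap(query, doc) / max(len(doc.split()), 1)


corpus = {
    "a": "masks reduce transmission",
    "b": "masks and gloves and gowns reduce transmission in hospitals",
    "c": "vaccine news",
}
print(multi_stage_search("masks reduce transmission", corpus,
                         term_overlap, toy_reranker, k_candidates=2))
```

The design point is cost: the reranker, which is the expensive part in a neural system, only ever sees `k_candidates` documents per query.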
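
For the paragraph-level indexing option above, one way to work around length normalization on long full-text articles is to index each paragraph as its own unit and, at query time, collapse paragraph hits back to articles by keeping each article's best-scoring paragraph. The sketch below assumes a hypothetical `doc_id#i` id scheme and assumes the title is prepended to each paragraph for context; it is not necessarily how the prebuilt Anserini indexes are built.

```python
from typing import Dict, List, Tuple


def paragraph_units(doc_id: str, title: str, abstract: str,
                    paragraphs: List[str]) -> Dict[str, str]:
    """Split one article into separately indexed units (hypothetical id scheme)."""
    units = {f"{doc_id}#0": f"{title} {abstract}"}
    for i, para in enumerate(paragraphs, start=1):
        # Assumption: prepend the title so short paragraphs keep some context.
        units[f"{doc_id}#{i}"] = f"{title} {para}"
    return units


def collapse_to_documents(hits: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Keep only the best-scoring unit per article (max-passage aggregation)."""
    best: Dict[str, float] = {}
    for unit_id, score in hits:
        doc_id = unit_id.rsplit("#", 1)[0]
        if score > best.get(doc_id, float("-inf")):
            best[doc_id] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


hits = [("p1#2", 7.5), ("p2#0", 6.0), ("p1#0", 5.0), ("p2#3", 6.5)]
print(collapse_to_documents(hits))
```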
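
The unsupervised highlighting step above compares hidden vectors of the query with hidden vectors of each candidate sentence. The sketch below shows one plausible scoring rule (a sentence's score is its best cosine similarity between any query token vector and any sentence token vector), not necessarily the exact rule Covidex uses. The short hand-written vectors are stand-ins for BioBERT hidden states; no model is loaded here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def highlight(query_vecs: List[Vector], sentence_vecs: List[List[Vector]],
              top_k: int = 1) -> List[int]:
    """Return indices (in document order) of the top_k best-matching sentences."""
    scored = []
    for idx, sent in enumerate(sentence_vecs):
        # Best match over every (query token, sentence token) pair.
        best = max((cosine(q, s) for q in query_vecs for s in sent), default=0.0)
        scored.append((best, idx))
    scored.sort(reverse=True)
    return sorted(idx for _, idx in scored[:top_k])


# Toy 3-dimensional "hidden vectors" standing in for BioBERT outputs.
query = [[1.0, 0.0, 0.0]]
sentences = [
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # unrelated sentence
    [[0.9, 0.1, 0.0]],                   # close match
    [[0.5, 0.5, 0.0]],                   # partial match
]
print(highlight(query, sentences, top_k=2))
```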
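
The reranker contract above (query plus documents in, a relevant / not-relevant label with a confidence score out) can be produced from a T5-style model if, as is typical for such rerankers, the model is trained to emit one of two target tokens (e.g. "true" / "false"). Assuming we already have the model's logits for those two tokens, a softmax over just the pair gives P(relevant). The logits below are made-up numbers, and no model call is shown.

```python
import math
from typing import List, Tuple


def relevance_from_logits(true_logit: float, false_logit: float) -> float:
    """Softmax restricted to the two target tokens -> P(relevant)."""
    m = max(true_logit, false_logit)  # subtract the max for numerical stability
    e_true = math.exp(true_logit - m)
    e_false = math.exp(false_logit - m)
    return e_true / (e_true + e_false)


def label_documents(logits: List[Tuple[float, float]],
                    threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Map each (true_logit, false_logit) pair to (label, confidence)."""
    out = []
    for t, f in logits:
        p = relevance_from_logits(t, f)
        out.append(("relevant" if p >= threshold else "not-relevant", p))
    return out


def rerank(doc_ids: List[str], logits: List[Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Order documents by P(relevant), highest first."""
    probs = [relevance_from_logits(t, f) for t, f in logits]
    return sorted(zip(doc_ids, probs), key=lambda pair: pair[1], reverse=True)


hypothetical_logits = [(2.0, -1.0), (-3.0, 1.0), (0.5, 0.0), (1.5, -2.0)]
print(label_documents(hypothetical_logits))
print(rerank(["d1", "d2", "d3", "d4"], hypothetical_logits))
```

Ranking by the probability rather than the hard label keeps the full ordering even when several documents are all judged "relevant".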
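
The batch retrieval metrics named in the evaluation bullets can be computed from a ranked list and relevance judgments. This is a minimal sketch of average precision (averaged over queries to give MAP) and NDCG@k using the relevance grade as a linear gain; official tools such as trec_eval handle more edge cases.

```python
import math
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    # Mean of precision@rank taken at each rank where a relevant doc appears.
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    # Queries missing from the run count as AP = 0.
    return sum(average_precision(runs.get(q, []), qrels[q]) for q in qrels) / len(qrels)


def ndcg_at_k(ranked: List[str], gains: Dict[str, int], k: int = 10) -> float:
    def dcg(gs: List[int]) -> float:
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(gs))

    actual = dcg([gains.get(d, 0) for d in ranked[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


print(average_precision(["a", "b", "c"], {"a", "c"}))
print(ndcg_at_k(["b", "a"], {"a": 2, "b": 0}, k=2))
```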

So What

  • Lessons learned
    • the power of open source, open science, and an open culture of sharing data & pretrained models (e.g., the MS MARCO dataset)
    • good software engineering practices pay dividends in the long run and make it possible to explore new ideas rapidly
    • reproducible open-source artifacts as the end goal of research culture
    • a large gap between research code written to produce results and code that runs in a real, live, deployed system
    • concerns: latency (users' patience is limited), throughput (number of concurrent users), operational, and presentational issues

Call for Action

  • Share this with front-line workers such as clinicians, virologists & researchers
  • [Immediate] Help with Usability Testing
  • [Feedback] Collect user feedback (LIKE = accurate & relevant, DISLIKE = inaccurate & not relevant) to improve the ranking.
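
The LIKE / DISLIKE feedback above could be turned into relevance labels for evaluation or further training. The sketch below is a hypothetical aggregation scheme (majority vote per query/document pair, ties and pairs with too few votes dropped); the event format and the `min_votes` parameter are assumptions, not part of the deployed system.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Feedback = Tuple[str, str, str]  # hypothetical event: (query, doc_id, "LIKE" | "DISLIKE")


def feedback_to_qrels(events: Iterable[Feedback],
                      min_votes: int = 1) -> Dict[str, Dict[str, int]]:
    """Majority vote per (query, doc): 1 = relevant, 0 = not relevant."""
    tally = defaultdict(lambda: [0, 0])  # (query, doc_id) -> [likes, dislikes]
    for query, doc_id, vote in events:
        if vote == "LIKE":
            tally[(query, doc_id)][0] += 1
        elif vote == "DISLIKE":
            tally[(query, doc_id)][1] += 1
        # Any other vote value is ignored.
    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (query, doc_id), (likes, dislikes) in tally.items():
        # Skip ties and pairs without enough votes to trust.
        if likes + dislikes < min_votes or likes == dislikes:
            continue
        qrels[query][doc_id] = 1 if likes > dislikes else 0
    return dict(qrels)


events = [
    ("incubation period", "d1", "LIKE"),
    ("incubation period", "d1", "LIKE"),
    ("incubation period", "d1", "DISLIKE"),
    ("incubation period", "d2", "DISLIKE"),
    ("incubation period", "d3", "LIKE"),
    ("incubation period", "d3", "DISLIKE"),
]
print(feedback_to_qrels(events))
```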