Question Answering system for COVID-19

msivanes · April 11, 2020, 3:32pm

This is a summary of a paper on deploying a neural search engine to answer questions over the COVID-19 Open Research Dataset (CORD-19).

“Neural Covidex applies state-of-the-art neural network models and artificial intelligence (AI) techniques to answer questions using the COVID-19 Open Research Dataset (CORD-19) provided by the Allen Institute for AI (data release of April 3, 2020).”

“This project is led by Jimmy Lin from the University of Waterloo and Kyunghyun Cho from NYU, with a small team of wonderful students: Edwin Zhang and Nikhil Gupta. Special thanks to Colin Raffel for his help in pretraining T5 models for the biomedical domain.”

Topics

Covid-19 dataset
Neural Search Architecture for Question Answering
Research Wisdom
Call for Action

Outline

Who

[primary] domain experts during a global pandemic
- public health officials
- clinicians
- virologists
research community to build on top of their system

Why

access the latest information as the crisis progresses rapidly
- COVID-19 dataset (CORD-19) published by Allen AI, refreshed weekly
make evidence-based decisions
generate insights

What

covidex.ai
- keyword-based search engine with faceted browsing
- Neural Covidex: search engine with a neural architecture for ranking and question answering
- highlighting of passage words relevant to the query, for presentation
end to end application
component technologies

How

multi-stage search architecture (e.g., Bing, Alibaba)
- retriever stage: bag-of-words (BOW) query against an inverted index
- ranking stage: rerank and refine the candidates
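
To make the two stages concrete, here is a minimal sketch in plain Python. It is not the Covidex implementation: the function names, the whitespace tokenizer, and the toy scorer are all assumptions made for illustration. Stage 1 does a cheap bag-of-words lookup against an inverted index; stage 2 applies a (presumably more expensive) scoring function to the candidates only.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenizer (assumption)
            index[term].add(doc_id)
    return index

def retrieve(query, index, k=100):
    """Stage 1: candidate generation, ranked by number of matching query terms."""
    hits = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    # Sort by match count (descending), then by id for a deterministic order.
    ranked = sorted(hits, key=lambda d: (-hits[d], d))
    return ranked[:k]

def rerank(query, candidates, docs, scorer):
    """Stage 2: rescore only the retrieved candidates with a costlier model."""
    return sorted(candidates, key=lambda d: scorer(query, docs[d]), reverse=True)

docs = {
    "d1": "coronavirus incubation period",
    "d2": "influenza vaccine trial",
    "d3": "incubation period of influenza",
}
index = build_inverted_index(docs)
candidates = retrieve("incubation period", index)

# Hypothetical stand-in for a neural reranker: prefers texts mentioning influenza.
toy_scorer = lambda q, text: 1.0 if "influenza" in text else 0.0
reranked = rerank("incubation period", candidates, docs, toy_scorer)
```

The point of the split is cost: the first stage touches the whole collection but is cheap, while the second stage is expensive but only sees a short candidate list.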
modular & reusable keyword search
- anserini & pyserini
- challenges: length normalization
  - paragraph level indexing vs [a) indexing only title & abstract b) full content]
- documents ranked using BM25
- prebuilt anserini indexes for CORD-19
- demonstrated via notebook
- unsupervised sentence highlighting using pretrained BioBERT from HuggingFace (convert the sentences from retrieved candidates and the query into hidden vectors)
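
The BM25 ranking and the length normalization challenge mentioned above can be illustrated with a small self-contained sketch. This is a textbook-style BM25 written from scratch, not Anserini's code; the `k1` and `b` defaults, the tokenizer, and the sample texts are assumptions. The `b` parameter controls how strongly long documents are penalized, which is why the choice between paragraph-level, title-and-abstract, and full-text indexing matters.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=0.9, b=0.4):
    """Score every document in `docs` against `query` with BM25.

    k1 and b are illustrative values (assumption), not tuned for CORD-19.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Length normalization: documents longer than average get a larger penalty.
        length_norm = 1 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores

short_doc = "incubation period of the virus"
long_doc = "incubation period of the virus " + "unrelated filler text " * 20
other_doc = "vaccine trial results"
scores = bm25_scores("incubation period", [short_doc, long_doc, other_doc])
```

Both of the first two texts contain the query terms exactly once, but the long one scores lower because the match is diluted by its length. Indexing at paragraph level keeps document lengths more uniform, which reduces this effect.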
keyword search with faceted browsing
- anserini integration with Solr
- blacklight search interface built on top of Ruby on Rails (faceted browsing) with highlighting
neural covidex for reranking
- existing: BERT for passage ranking, BERTserini for retrieval-based question answering, Birch for document ranking
- Given a query q and a list of documents d1, d2, ..., dn, return one label per document (e.g., [relevant, not-relevant, relevant, relevant]) along with confidence scores
- fine-tuned on the MS MARCO passage dataset
- A model trained on the MS MARCO dataset can be applied directly to the CORD-19 dataset
- T5-based model
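
The label-plus-confidence output described above can be sketched as follows. One common way to get a relevance score out of a sequence-to-sequence model is to compare the logits it assigns to a "true" token and a "false" token; whether Neural Covidex does exactly this is an assumption here, and the logit values below are made up rather than produced by a real model.

```python
import math

def relevance_probability(true_logit, false_logit):
    """Softmax over two logits, returning the probability of the 'true' label."""
    m = max(true_logit, false_logit)  # subtract the max for numerical stability
    e_true = math.exp(true_logit - m)
    e_false = math.exp(false_logit - m)
    return e_true / (e_true + e_false)

def rerank_with_labels(doc_ids, logit_pairs, threshold=0.5):
    """Return (doc_id, label, confidence) tuples sorted by confidence, best first."""
    scored = []
    for doc_id, (true_logit, false_logit) in zip(doc_ids, logit_pairs):
        p = relevance_probability(true_logit, false_logit)
        label = "relevant" if p >= threshold else "not-relevant"
        scored.append((doc_id, label, p))
    return sorted(scored, key=lambda item: item[2], reverse=True)

# Hypothetical (true, false) logits for three candidate documents.
results = rerank_with_labels(
    ["d1", "d2", "d3"],
    [(2.0, 0.0), (-1.0, 1.0), (0.5, 0.0)],
)
```

Because the model only needs a query and a passage as input, nothing in this step is specific to MS MARCO, which is what makes applying the fine-tuned model directly to CORD-19 possible.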
Evaluation
- batch retrieval evaluations (MAP, nDCG) vs. end-to-end, human-in-the-loop evaluation
- Is this system actively contributing towards the pandemic efforts?
- Users are in firefighting mode, with no time to provide guidance
- “hallway usability testing” - valuable
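
For reference, the batch metrics named above can be computed as in the sketch below. These are standard textbook definitions written for illustration, not the evaluation code used in the paper. Two simplifying assumptions: average precision divides by the number of relevant documents in the ranked list (not in the whole collection), and nDCG uses linear gains.

```python
import math

def average_precision(ranked_relevance):
    """AP for one query; `ranked_relevance` is a list of 0/1 flags in rank order."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0

def mean_average_precision(per_query_relevance):
    """MAP: mean of the per-query average precision values."""
    aps = [average_precision(r) for r in per_query_relevance]
    return sum(aps) / len(aps)

def ndcg(ranked_gains, k=None):
    """nDCG with linear gains; `ranked_gains` holds graded relevance in rank order."""
    gains = ranked_gains[:k] if k else ranked_gains
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(ranked_gains, reverse=True)[:len(gains)]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg else 0.0

# [relevant, not-relevant, relevant, relevant] encoded as 0/1 flags.
ap = average_precision([1, 0, 1, 1])
map_score = mean_average_precision([[1, 0], [0, 1]])
perfect = ndcg([3, 2, 1])
swapped = ndcg([0, 1])
```

Metrics like these need relevance judgments, which is exactly what is hard to obtain while users are in firefighting mode.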

So What

Lessons learned
- power of open source, open science, and an open culture of sharing data and pre-trained language models (e.g., the MS MARCO dataset)
- good software engineering practices pay dividends in the long run and make it possible to explore new ideas rapidly
- reproducible open-source artifacts as the end goal of research culture
- large gap between research code for producing results and code used in a real, live, deployed system
- concerns: latency (testing the user's patience), throughput (number of concurrent users), operational, presentational

Call for Action

Share this with front-line workers such as clinicians, virologists, and researchers
[Immediate] Help with Usability Testing
[Feedback] Collect user feedback (LIKE = accurate and relevant; DISLIKE = inaccurate and not relevant) to improve the ranking.