Kaggle Competition Launch: Recursion Cellular Image Classification

CellSignal: Disentangling biological signal from experimental noise in cellular images.

"The cost of some drugs and medical treatments has risen so high in recent years that many patients are having to go without. You can help with a classification project that could make researchers more efficient.

One of the more surprising reasons behind the cost is how long it takes to bring new treatments to market. Despite improvements in technology and science, research and development continues to lag. In fact, finding new treatments takes, on average, more than 10 years and costs hundreds of millions of dollars.

Recursion Pharmaceuticals, creators of the industry’s largest dataset of biological images, generated entirely in-house, believes AI has the potential to dramatically improve and expedite the drug discovery process. More specifically, your efforts could help them understand how drugs interact with human cells.

This competition will have you disentangling experimental noise from real biological signals. Your entry will classify images of cells under one of 1,108 different genetic perturbations. You can help eliminate the noise introduced by technical execution and environmental variation between experiments.

If successful, you could dramatically improve the industry’s ability to model cellular images according to their relevant biology. In turn, applying AI could greatly decrease the cost of treatments, and ensure these treatments get to patients faster.

This competition is a part of the NeurIPS 2019 competition track. Winners will be invited to contribute their solutions towards the workshop presentation."


The main obstacle with this competition is the size of the dataset: 46 GB. They do provide GCP credits, but if you don't use GCP, I'm not sure it's practical to compete.


I think it's an interesting challenge. You need a model that is robust to the high variance between plates. I've taken a few cracks at it, and I think the key is to somehow use the control wells on each plate to debias the data. I've experimented with a few methods, but I haven't found anything that generalizes.
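One common way to use control wells for debiasing is per-plate normalization: compute the mean and standard deviation of the control wells on each plate and z-score every well on that plate against them, so plate-level offsets and scale differences cancel out. This is a minimal sketch of that generic technique, not necessarily what was tried above. It assumes you already have one feature vector per well (for example pixel statistics or CNN embeddings) and can tell from the metadata which wells are controls; the function name `normalize_by_controls` and its arguments are hypothetical.

```python
import numpy as np


def normalize_by_controls(features, plate_ids, is_control, eps=1e-8):
    """Z-score each well's features against the control wells on its own plate.

    features:   (n_wells, n_features) array, one feature vector per well (assumed)
    plate_ids:  (n_wells,) array of plate identifiers
    is_control: (n_wells,) boolean mask marking control wells (assumed known)
    eps:        small constant to avoid division by zero for constant features
    """
    features = np.asarray(features, dtype=np.float64)
    plate_ids = np.asarray(plate_ids)
    is_control = np.asarray(is_control, dtype=bool)

    out = np.empty_like(features)
    for plate in np.unique(plate_ids):
        on_plate = plate_ids == plate
        controls = features[on_plate & is_control]
        if controls.shape[0] == 0:
            raise ValueError(f"plate {plate!r} has no control wells")
        # Per-feature statistics of this plate's controls capture the plate effect.
        mu = controls.mean(axis=0)
        sigma = controls.std(axis=0)
        # Every well on the plate (controls included) is rescaled the same way.
        out[on_plate] = (features[on_plate] - mu) / (sigma + eps)
    return out


if __name__ == "__main__":
    # Two toy plates: plate B is plate A shifted by +10 (a pure plate effect).
    feats = np.array([[1.0], [3.0], [4.0], [11.0], [13.0], [14.0]])
    plates = np.array(["A", "A", "A", "B", "B", "B"])
    ctrl = np.array([True, True, False, True, True, False])
    normed = normalize_by_controls(feats, plates, ctrl)
    print(normed.ravel())
```

In the toy example the two non-control wells end up with the same normalized value, because the +10 shift on plate B is removed. Normalizing per feature (or per image channel on raw pixels) is a design choice; a plain z-score only corrects shifts and scaling, so it may not remove nonlinear plate effects, which could be one reason simple approaches fail to generalize.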
