The Kaggle Data Science Bowl 2018 was announced today:
The goal is to identify and segment cell nuclei in microscopy images. Clinically, nucleus size and the nucleus-to-cytoplasm ratio are important features used by pathologists to classify many types of cancer. This is also an important microscopic feature in research, potentially for quantifying drug (chemotherapy) responses.
I hope this healthcare challenge will interest many fast.ai students!
Congrats on your score so far in this challenging competition!
I am also interested in implementing a Mask-RCNN model but don't have any experience with it. Do you have any suggestions (tutorials etc.) for how to get started?
Hopefully Jeremy has plans for adding something like this to the fastai library! Or perhaps it will be covered in Part 2 v2 (cc @jeremy)
Thx James. I initially started from scratch using the Keras-RCNN extension that the current leader recommended (https://github.com/broadinstitute/keras-rcnn). It helped me a lot to understand the entire architecture.
But after 2-3 days, I decided to switch to this Keras/TF implementation with decent documentation that is working very well: https://github.com/matterport/Mask_RCNN
That is the current backbone of my pipeline. Of course you need to adapt it for the current dataset, and as Allen, the current leader, said, there is a lot of post-processing needed to get decent results from Mask-RCNN. With the same trained model, different post-processing pipelines can probably give scores anywhere from 0.2 to 0.63. But Mask-RCNN, being an instance segmentation model, definitely allows better results than U-Net, which is a semantic segmentation model.
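To make "post-processing" concrete, here is a deliberately simple sketch of the kind of clean-up step people typically apply to predicted nucleus masks. The function name and `min_size` threshold are my own assumptions for illustration, not the actual pipeline described above:

```python
import numpy as np
from scipy import ndimage

def clean_mask(mask, min_size=10):
    """Light post-processing for one predicted nucleus mask:
    fill internal holes, then discard the mask if it is too small.
    (Illustrative sketch only, not the pipeline from this thread.)"""
    filled = ndimage.binary_fill_holes(mask)
    if filled.sum() < min_size:
        return np.zeros_like(filled)  # drop spurious tiny detections
    return filled

# toy predicted mask: a 5x5 nucleus with a one-pixel hole inside
pred = np.zeros((8, 8), dtype=bool)
pred[1:6, 1:6] = True
pred[3, 3] = False
cleaned = clean_mask(pred)
print(cleaned.sum())  # 25 -- the hole has been filled
```

Real pipelines chain many more steps on top of this (splitting touching masks, resolving overlaps between predicted instances, etc.), which is where most of the 0.2-to-0.63 spread comes from.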
FAIR officially released their Caffe2 version publicly about 2 weeks ago. I didn't really look into it, but technically this is probably the most efficient implementation (for both training and inference): https://github.com/facebookresearch/Detectron
Hi Alexandre, thx for the repo. I ran the demo notebook and encountered issues when creating the MaskRCNN model, i.e. "Failed to convert object of type <class 'theano.tensor.var.TensorVariable'> to Tensor". Have you had this problem?
I don't remember having this problem, but I am using TensorFlow as the backend for Keras. It looks like you are probably using Theano as your Keras backend. The readme says it is an implementation for Python 3, Keras, and TensorFlow, and parts of the code use native TensorFlow. I hope that helps.
I would recommend starting by looking at the shapes demo and adapting the code (mostly the Config and Dataset classes) for your nuclei data. Implementing the load_image and load_mask class functions is part of the initial job.
Thanks for your hint. I managed to get the following working, but when I start the training, the process just hangs. Can I ask you to have a quick look at the following code in case you find any obvious mistakes?
I am also using this Mask-RCNN model and have it working with Kaggle's nuclei dataset.
The thing that caught my eye in your code is the way you have implemented load_mask here. Note that the default for this method simply loads an empty mask, so you'll need to rewrite it to override the default so that it actually loads the masks from the nuclei dataset (including assigning the class_ids).
@fabsta, I agree with @jamesrequa. You need to load the masks yourself in the inherited load_mask function. The load_mask function is called dynamically by the training generator.
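For reference, the matterport implementation expects `load_mask` to return an array of shape (height, width, instance_count) with one boolean channel per nucleus, plus a `class_ids` array (all 1s here, since nucleus is the only class). A minimal numpy sketch of that stacking step, using toy in-memory masks instead of real file reads (the helper name is my own):

```python
import numpy as np

def stack_instance_masks(mask_list):
    """Stack per-nucleus binary masks into the (H, W, N) boolean array
    plus class_ids array that load_mask should return.
    (Sketch only; real code reads one mask PNG per nucleus from disk.)"""
    masks = np.stack(mask_list, axis=-1).astype(bool)
    class_ids = np.ones(masks.shape[-1], dtype=np.int32)  # single class: nucleus
    return masks, class_ids

# toy example: two 4x4 masks, one nucleus each
m1 = np.zeros((4, 4), dtype=np.uint8); m1[0:2, 0:2] = 1
m2 = np.zeros((4, 4), dtype=np.uint8); m2[2:4, 2:4] = 1
masks, class_ids = stack_instance_masks([m1, m2])
print(masks.shape, class_ids.tolist())  # (4, 4, 2) [1, 1]
```

If you return the default empty mask instead, training runs but the model has nothing to learn from, which can look a lot like a hang.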
Thanks for creating this thread. I am also participating in the competition, but still at the stage of exploring U-Net. To what extent are you using the fastai library? I want to use as much of it as I can, especially for the training loops (learner).
Also, I've observed (not a big surprise, of course) that having images of different modalities makes training more unstable. How did you tackle this problem? I am thinking of clustering the images and then running different models for each cluster. Would that be too much?
I've started the competition as well. Thanks for creating this thread.
I have one question though. Most of the Kaggle kernels for this competition use U-Net. I wanted to ask why that is. How is U-Net better than Mask-RCNN?
Another thing: are Mask-RCNN and other RCNN-based architectures better than alternatives like YOLO and SSD? I suspect the answer might be that they have separate use cases. In that case, when should you use which? Is it feasible to use them in this competition?
YOLO and SSD do "detection" using a bounding box: with the two corners of a rectangle, they are able to say approximately where an object is.
In this competition, you need to be more precise and find the exact areas (not just an approximate bbox), which is called segmentation.
U-Net traditionally does semantic segmentation, which means it will find the areas of interest but won't be able to separate them. People have been trying to predict centers and contours with U-Net in order to be able to separate the individual nuclei.
Mask-RCNN does instance segmentation, which means it is able to differentiate the individual nuclei without any tricks. Unfortunately, the architecture of Mask-RCNN is much more complex than U-Net's, and the part of Mask-RCNN that does the segmentation is not as powerful as U-Net, which makes U-Net a good contender in this competition.
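To illustrate why the semantic output needs those tricks, here is a toy example (my own numpy/scipy illustration, not code from this thread): connected-component labelling on a binary mask fuses two touching nuclei into a single object, and erasing a predicted contour between them recovers the two instances.

```python
import numpy as np
from scipy import ndimage

# binary "semantic segmentation" output: two nuclei that touch
semantic = np.zeros((6, 10), dtype=np.uint8)
semantic[1:5, 1:5] = 1  # nucleus A
semantic[1:5, 5:9] = 1  # nucleus B, sharing a border with A

labels, n_objects = ndimage.label(semantic)
print(n_objects)  # 1 -- the touching nuclei merge into one component

# the contour trick: erase the predicted boundary pixels, then label again
separated = semantic.copy()
separated[:, 5] = 0  # hypothetical contour predicted between the nuclei
_, n_separated = ndimage.label(separated)
print(n_separated)  # 2 -- the instances are now distinguishable
```

Mask-RCNN sidesteps all of this by predicting one mask per detected object in the first place.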
Averaging/ensembling results in this kind of competition isn't as easy as in more traditional competitions, because the output is a series of masks (images).
Unet is probably used by the majority of participants because it is a simpler model.
Mask-RCNN is probably used by the majority of the top 100 participants because it is a better model for the competition metric.
If 2 nuclei overlap, U-Net will count them as 1 large object (semantic segmentation). You'll basically receive a score of 0/2 objects detected, because neither of the 2 ground-truth objects will really be similar to the combined large predicted object.
Mask-RCNN will detect both objects (instance detection and segmentation), so you have the potential to get a 2/2 score.
And when a single object is detected with a high confidence and high sensitivity, then generating the mask/segmentation is pretty straightforward.
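To put a rough number on the 0/2 claim: the competition matches predicted objects to ground-truth objects by IoU (intersection over union) at a range of thresholds starting at 0.5. A merged semantic prediction of two equal-size touching nuclei has an IoU of only 0.5 with each ground-truth object, and a single blob can match at most one of them in any case. A toy computation (my own illustration, assuming this thresholded-IoU matching):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union

# two equal-size ground-truth nuclei, side by side
gt1 = np.zeros((6, 10), dtype=bool); gt1[1:5, 1:5] = True
gt2 = np.zeros((6, 10), dtype=bool); gt2[1:5, 5:9] = True

# semantic-style prediction: one merged blob covering both nuclei
merged = np.logical_or(gt1, gt2)

print(iou(merged, gt1))  # 0.5 -- half the blob belongs to the other nucleus
print(iou(merged, gt2))  # 0.5
```

A perfect per-instance prediction would score IoU 1.0 against each ground-truth object, which is why instance-aware models have the edge on this metric.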