I’m trying to tackle the 2018 Data Science Bowl to get some practice in medical image segmentation, and I’d like to do it with the fastai library. So far I’ve been able to work my way around it by building custom DataBunches from the constructor, using custom dataloaders and datasets. Now I’d like to use the data_block API to make it cleaner, but there are a lot of things I don’t understand, and I’m getting lost in the docs and the source code.
The competition consists of detecting nuclei in cell images. To do that, we are given a training set that contains the original images and, for each of them, mask images for every nucleus to be detected (one mask per nucleus). The data is organized such that the image with the id `id` can be found in `id/images/id.png`, and its masks are all in the folder `id/masks`. We also have a CSV with the run-length encoding of the masks along with the corresponding image id (one line per mask). However, I’d rather not use the RLE file, as I compute the metric directly from the combined mask (sum of all nuclei masks). The evaluation metric is a custom mean_iou that you can get more information on here, but as it is quite complicated, I will not detail it. The important thing to note is that finding a clear separation between nuclei matters. To begin with, my training target is the combined mask (by the way, I wonder if there is a way to train on targets of different sizes, for instance if I want to train on lists of nuclei centers and radii).
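To make the "combined mask" concrete, here is a minimal sketch. It assumes NumPy and that the per-nucleus masks of one image are already loaded as arrays of identical shape; `combine_masks` is an illustrative name, not part of fastai. It uses a logical OR, which matches the sum of the masks as long as nuclei masks do not overlap.

```python
import numpy as np

def combine_masks(masks):
    """Merge per-nucleus binary masks (all the same H x W shape) into one mask."""
    if len(masks) == 0:
        raise ValueError("need at least one mask")
    # Binarize each mask and stack them into an (N, H, W) boolean array.
    stacked = np.stack([np.asarray(m) > 0 for m in masks])
    # A pixel is foreground if any nucleus mask marks it.
    return stacked.any(axis=0).astype(np.uint8)

# Two toy 3x3 masks, one nucleus each (255 = nucleus, 0 = background).
a = np.array([[255, 0, 0], [0, 0, 0], [0, 0, 0]], dtype=np.uint8)
b = np.array([[0, 0, 0], [0, 255, 255], [0, 0, 0]], dtype=np.uint8)
combined = combine_masks([a, b])
```

Note that merging touching nuclei this way loses the boundary between them, which is exactly why the separation issue mentioned above matters.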
Now, here are my remarks/questions:
- The images are of varying sizes, so I decided to train on random crops. I therefore need to apply the exact same transformation to the mask image. How is this supposed to happen with the data_block API? I have a hard time understanding how and when in the pipeline the transforms are applied. As I understand it, I can pass a list of transforms to the ItemList, so if I give it a random cropping function, how can I make it apply the same transformation to an image and its mask while still drawing a new crop for every image?
- What does the `c` attribute correspond to in an
- I guess here my `ImageList` (or a custom
- To get labels, I need to call `label_from_func` with a custom function of my own, right (as I don’t have labels but masks)?
- If I want to add a test set, I have another problem, as I use overlapping crops for prediction. Do I need to create a custom learner that overrides `predict` to do that?
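To illustrate the behaviour asked about in the first bullet (same crop for an image and its mask, but a new crop each time), here is a library-independent sketch. It assumes NumPy arrays with matching height and width; `random_crop_pair` is a hypothetical helper and is not how fastai implements its transforms. The point is that the window is drawn once per call and then applied to both arrays.

```python
import numpy as np

def random_crop_pair(image, mask, size, rng=None):
    """Crop `image` and `mask` with one shared, randomly drawn window."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = mask.shape[:2]
    ch, cw = size
    if image.shape[:2] != (h, w):
        raise ValueError("image and mask must have the same height and width")
    if ch > h or cw > w:
        raise ValueError("crop size larger than the image")
    # Draw the window once, so both arrays see the same random parameters.
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    window = (slice(top, top + ch), slice(left, left + cw))
    return image[window], mask[window]

# Toy data: the mask is derived from the image, so alignment is easy to check.
image = np.arange(6 * 8 * 3).reshape(6, 8, 3)
mask = image[..., 0] % 2
img_crop, mask_crop = random_crop_pair(image, mask, size=(4, 4),
                                       rng=np.random.default_rng(0))
```

Each call draws a fresh window, so successive samples get different crops while every image stays aligned with its own mask.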
As I was writing this I answered some of the other questions I had myself, but in general I’m still quite lost on how to tackle this specific dataset using the data_block API (which is why this text is quite messy, I’m sorry). If you have any insights or ideas on how to do it, I would appreciate it a lot!
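For the test-set question, the overlapping crops used at prediction time can be described by a list of tile boxes. The sketch below is plain Python with hypothetical names (`tile_starts`, `tile_boxes`) and assumes square tiles no larger than the image; how the per-tile predictions are stitched back together (for example by averaging the overlaps) is left out.

```python
def tile_starts(length, tile, stride):
    """Start offsets of tiles of size `tile` covering [0, length) with step `stride`."""
    if tile > length:
        raise ValueError("tile larger than the image dimension")
    if not 0 < stride <= tile:
        raise ValueError("stride must be in (0, tile] so tiles leave no gaps")
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)  # extra tile flush with the far border
    return starts

def tile_boxes(height, width, tile, stride):
    """(top, left, bottom, right) boxes of overlapping square tiles over an image."""
    return [(t, l, t + tile, l + tile)
            for t in tile_starts(height, tile, stride)
            for l in tile_starts(width, tile, stride)]

# A 300 x 256 image covered by 256 x 256 tiles with a stride of 128.
boxes = tile_boxes(height=300, width=256, tile=256, stride=128)
```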