Kaggle Histopathologic Cancer Detection

Here is a great new Kaggle playground comp that just launched.

This looks like a very straightforward image classification competition and a nice testing ground for fastai students :slight_smile: See you on the leaderboard!

13 Likes

Looks great, thanks!

PCam, the CIFAR-10 of medical imaging!

Yeah, a nice dataset to try alongside the lesson3-camvid notebook—image segmentation with CamVid and the dynamic U-Net. Thanks for sharing.

2 Likes

@jamesrequa right behind you on the leaderboard :slight_smile:

1 Like

I’m so happy that these kinds of problems are starting to appear on Kaggle!
The biomedical field is the one that could benefit the most from deep learning… Luckily, someone is starting to build up the datasets we need!

4 Likes

Nice to participate in such a competition. Unfortunately, I didn’t manage to perform as well as you guys (I’m around place 100). I used the approach Jeremy demonstrated in lessons 3 and 6, and tried ResNet and DenseNet.

Below is one of the models I tried.

from fastai.vision import *   # fastai v1 imports (models, create_cnn, accuracy, metrics)
import gc

arch = models.resnet152
model_name = 'rn152'

f1 = Fbeta_binary(beta2=1)    # F1 score on the binary label

gc.collect()
# `data` is the ImageDataBunch built from the competition images earlier in the notebook
learn = create_cnn(data, arch, metrics=[accuracy, f1], ps=0.5)
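
For context, the fine-tuning loop from those lessons is roughly the sketch below (the epoch counts and learning rates are placeholders, not carefully tuned values):

learn.lr_find()                             # find a sensible learning rate
learn.recorder.plot()
learn.fit_one_cycle(4, 1e-3)                # train the new head first
learn.unfreeze()                            # then fine-tune the whole backbone
learn.fit_one_cycle(4, slice(1e-5, 1e-4))   # discriminative learning rates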

How many epochs did you train for? Did you use any special transformations? Can you give me some advice on what to try next?

Funny thing: I’ve submitted 28 different entries so far, varying dropout rates, learning rates, epochs, augmentations, and architectures, and the best architecture so far has been ResNet-34, even better than DenseNet-161. As for the leaderboard, I’m still in the 100s. I don’t think I’ll get much better than that.

1 Like

Here’s what I tried for this competition: https://jithinjk.github.io/blog/Histopathologic%20Cancer%20Detection.html

Lots of room for improvement. :slight_smile:

3 Likes

Has anyone made any progress on this?

My best is 0.9616 with a resnet34 trained on roughly 50k of the images, at 96x96 with no cropping. But it seems I’ve hit a wall: even training the same model on all of the images does not improve it, which is odd. And the long training time of roughly 2 hours, even on a small model like resnet34, kills me.

Any tips?

I used some starter code (a simple CNN trained from scratch, Keras only) and got it up to 90%. Next I will implement data generators.

It looks like data augmentation will be key.

One of the issues with H&E staining is that the coloring varies with the staining time, the thickness of the tissue slice, and maybe even the age of the staining solution. Anybody preparing such datasets has probably tried to standardize these factors, but you never know. One augmentation would be to adjust the hue.
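
For illustration, a minimal hue-jitter sketch using torchvision’s ColorJitter (just an idea, not part of anyone’s pipeline here, and the file name is made up):

from PIL import Image
from torchvision import transforms

hue_jitter = transforms.ColorJitter(hue=0.05)   # random hue shift in [-0.05, 0.05]
img = Image.open('some_patch.tif')              # hypothetical 96x96 patch from the train folder
augmented = hue_jitter(img)                     # returns a new PIL image with shifted colors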

The heatmap looks plausible. The question for the response variable is “Is there cancer tissue inside the middle square?”

The cancer tissue is composed of nuclei and cytoplasm. The nuclei are the dark violet stuff, the rest of the cell is cytoplasm, and where it is white there is air, water, or some other reason why nothing is blocking the view.

The difference human pathologists look for is the heterogeneity of the nuclei. Malignant cells proliferate out of control, so they have to copy their DNA (inside the nuclei) like crazy, while the DNA is normally packed up neatly when the cell is not dividing. It is also more common to see multiple nuclei in one cell in cancer than in benign tissue.

So basically the thing to look for is the shape and texture of the nuclei.

Also 96% would be close to the point where you get pathologists arguing with each other!

5 Likes

I beefed up my resnet34; it’s now up to 0.9638 on the public leaderboard. I think there might be some room for further improvement with more training.

What is really odd is that the public leaderboard score does not really correlate with my local validation set. There has to be something special about the test set. On several occasions I improved the score on the validation set, but the score on the public leaderboard was actually worse. Very odd.

1 Like

If you are tweaking hyperparameters, or keeping the best of a couple of runs influenced by some residual random factor, based on the validation set, then you are in a way learning on the validation set. So yes, the test set is special compared to your validation set, with respect to the model you trained.

Andreas, it’s a very good point that you can indirectly train on the validation set by adjusting hyperparameters or choosing runs. But I don’t think that’s the specific problem observed here.

I also see a large gap between the AUC on my validation set and the AUC calculated by Kaggle, as do several commentators in the competition’s discussions. My guess is that the test set is not drawn from the same distribution as the training set. For example, the actual cancer fraction in the training set, and the fraction predicted on my validation sample, is 0.40; the predicted cancer fraction on the test set is 0.33. Validation AUC = 0.99, while Kaggle’s AUC = 0.95. These numbers are all from a single session with a clean, pseudo-random start.
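
If you want to reproduce the comparison, a rough sketch (assuming Kaggle’s train_labels.csv and a preds array of test-set probabilities from your model):

import pandas as pd

train_df = pd.read_csv('train_labels.csv')                  # Kaggle label file with id, label columns
print('train cancer fraction:', train_df['label'].mean())   # ~0.40 here
# preds, _ = learn.get_preds(ds_type=DatasetType.Test)      # fastai v1 test-set predictions
# print('predicted test cancer fraction:', preds[:, 1].mean().item())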

Any further clues are appreciated!

1 Like

Another reason may be the way the original dataset (I think it was called something like Camelyon17) was composed. They created “virtual” patients by combining slides from multiple patients’ lymph nodes into one patient. If I were to design a train/test split for the “simplified” PCam set, I would make sure the training and test data don’t come from the same patients, and thus not from the same slides.

I suspect there is a lot of inter-slide variability between different patients and between the slicing and staining of individual lymph nodes.
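
If slide or patient identifiers were available for the Kaggle patches (they are not in the competition CSV, so the slide_id column below is hypothetical), a leakage-free split could be sketched like this:

from sklearn.model_selection import GroupShuffleSplit

# df: one row per patch, with a hypothetical slide_id column identifying the source slide
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, valid_idx = next(splitter.split(df, groups=df['slide_id']))
train_df, valid_df = df.iloc[train_idx], df.iloc[valid_idx]   # no slide appears in both sets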

1 Like

Started with DL1 2019 edition last week.
Because of a (not so helpful yet) background in health care (MD, epidemiology), I joined the Histopathologic Cancer Detection competition on Kaggle.

Using resnet50, 75% of the data, and only the basic augmentations (no extras), I have reached the top 7% of the competition. It’s probably not very fancy, but it made me happy already :slight_smile:

@sinsji congratulations!!
Can you please provide more details about your transforms?

I used the following transforms:
tfms = get_transforms(flip_vert=True, max_warp=0, max_rotate=10, max_zoom=0, max_lighting=0, p_affine=0.5, p_lighting=0)

I did not use zoom because I believe these images should all be at the same scale (I may be wrong).
I am trying to break a score of 0.97 but somehow I’m not able to; my scores are always between 0.96 and 0.97.
Thank you for your time and advice.

In my experiments I found the test score to fluctuate around that 0.97 as well. Higher validation accuracy has not improved my test scores yet, so maybe there is some overfitting…
In your current settings the p_affine seems a bit high compared to the 0.1 I copied from Jeremy.
Otherwise I just tested a couple of augmentations; I included brightness and contrast transformations.
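
For reference, in fastai v1 brightness and contrast come in through the lighting arguments of get_transforms; a sketch with placeholder values:

tfms = get_transforms(flip_vert=True, max_rotate=10, max_zoom=0, max_warp=0,
                      max_lighting=0.2, p_affine=0.1, p_lighting=0.5)   # values are placeholders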

Good luck.

Thank you @sinsji

Try progressive resizing, i.e. start with a small image size, train for a while, then train on the original image size. This technique is a game changer.
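
A minimal sketch of that idea in fastai v1, assuming src is an ImageList that has already been split and labelled, tfms comes from get_transforms, and all sizes, epochs, and learning rates below are placeholders:

# stage 1: train on small images first
data_small = src.transform(tfms, size=48).databunch(bs=128).normalize(imagenet_stats)
learn = create_cnn(data_small, models.resnet34, metrics=[accuracy])
learn.fit_one_cycle(4, 1e-3)

# stage 2: swap in the full-resolution data and keep training the same weights
data_full = src.transform(tfms, size=96).databunch(bs=64).normalize(imagenet_stats)
learn.data = data_full
learn.fit_one_cycle(4, slice(1e-5, 1e-4))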