Kaggle Histopathologic Cancer Detection

I’ve been playing around with this dataset, and my score has stayed around 0.97.
This post and its comments helped me a lot: https://www.kaggle.com/c/histopathologic-cancer-detection/discussion/81747
I tried resizing, but it did not change much; next I’ll try the suggestions from that post.

My short write-up of the current #1 solution, and a complete handcrafted pipeline in PyTorch (ResNet9)

thanks!

Hi everyone, big thanks to all those participating in this competition. Personally, I have learnt a lot since I started with it.

Currently I am facing a problem with loading a pre-trained model using create_cnn(). I keep getting this error:
ConnectionError: HTTPSConnectionPool(host='download.pytorch.org', port=443): Max retries exceeded with url: /models/resnet152-b121ed2d.pth (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fb4651f2ba8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

So sorry, I’m new to this; I just realised I hadn’t turned on internet access for the kernel.

Hi everyone, I also worked through the data in this competition. I used resnet34 as the base architecture and I’m getting 96.3% accuracy, which I was pretty happy about! But when I ran it with resnet50, the model actually performed worse, though only by 1–2%. I was expecting an improved result. Any idea why this could be the case, considering only the change in base model?

Assuming everything else is the same, the simplest explanation would be overfitting, since resnet50 is a deeper (more layers) architecture.

Ahh yes, thanks, OK that makes sense.

Also, I am noticing another thing with this data. The images in the dataset are quite small (96x96). However, when I increase the image size to 224x224, by specifying the size parameter when creating an ImageDataBunch from the data, I get significantly better results than when using the actual image size. Why would this be the case?

Because the pre-trained ImageNet model that you are using to train your model uses 224x224 images for its input.
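To make that concrete: all the size parameter really does is interpolate the 96x96 tiles up to the resolution the pretrained ImageNet weights were tuned for. Here is a minimal, dependency-free sketch of nearest-neighbour resizing — fastai’s transforms use fancier (bilinear) interpolation, so treat this purely as an illustration of the idea, not the library’s actual code:

```python
def resize_nearest(img, new_h, new_w):
    """Upscale a 2D grid of pixel values with nearest-neighbour interpolation.

    No new information is added: each output pixel just copies the closest
    input pixel. Upscaling therefore cannot hurt the content, while matching
    the 224x224 input resolution the pretrained weights expect can help.
    """
    old_h, old_w = len(img), len(img[0])
    return [
        [img[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]

# a tiny 2x2 "image" upscaled to 4x4: each pixel becomes a 2x2 block
small = [[1, 2],
         [3, 4]]
big = resize_nearest(small, 4, 4)
# big == [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

The same call with new_h = new_w = 224 on a 96x96 tile is, conceptually, what happens when you pass size=224.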

Ahh gotcha! Thanks a lot Mauro, that’s all a lot clearer to me now! :slight_smile:

Been working on this project over the last few weeks. I managed to get 98.6%, which I blogged about here: https://www.humanunsupervised.com/post/histopathological-cancer-detection. I missed the competition deadline, but it was a good learning experience finishing it up end to end :slight_smile:

Hi Antonio. I appreciate the clarity of your blog writeup. I also entered this competition and it proved to be a profound way to learn to use fastai in practice.

A couple of points, and my intention here is to help, not criticize.

  1. The metric published on the leaderboard is ROC AUC, which is correlated with, but different from, accuracy. There are several forum posts that explain how to calculate it on your validation set.

  2. My experience (like that of many others) was that the ROC AUC scored by Kaggle on the provided test set was consistently much lower than the one calculated locally on the validation set. No one was able to fully explain the difference. I think you can still submit test-set predictions to Kaggle and see how they score, even after the competition has closed.
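On point 1, for anyone who wants to check it locally: ROC AUC is just the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties counting half). In practice you would feed the positive-class probabilities from learn.get_preds() into sklearn.metrics.roc_auc_score, but here is a toy, dependency-free sketch of the definition — my own illustration, not competition code:

```python
def roc_auc(labels, scores):
    """Rank-based ROC AUC: the fraction of (positive, negative) pairs in
    which the positive example gets the higher score; ties count as 1/2.

    O(P*N), so fine for a sanity check; use sklearn.metrics.roc_auc_score
    on real prediction tensors.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# three of the four (positive, negative) pairs are ordered correctly -> 0.75
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Note it only looks at the ranking of the scores, never at a 0.5 threshold, which is why a model can have high accuracy and a noticeably different ROC AUC.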

Thanks for sharing your work!

Hi @Pomo, thanks, that’s good to know! :slight_smile: I didn’t know Kaggle uses ROC AUC to calculate the score (I actually don’t know what that is! :flushed:)

I calculated my score using

preds, y, losses = learn.get_preds(with_loss=True)  # defaults to the validation set
# get accuracy
acc = accuracy(preds, y)

which, I now realise, operates on the validation set by default rather than the test set. I should submit my solution to the competition, see what score I actually get there, and update my blog post accordingly.

Thanks for the feedback! :slight_smile: