Kaggle Histopathologic Cancer Detection

I’ve been playing around with this dataset, and my score has stayed around 0.97.
This post and its comments helped me a lot: https://www.kaggle.com/c/histopathologic-cancer-detection/discussion/81747
I tried resizing, but it did not change much; next I’ll try the suggestions from that post.

My short write-up of the current #1 solution, and a complete handcrafted pipeline in PyTorch (ResNet9)

thanks!

Hi everyone, big thanks to all those participating in this competition. Personally, I have learnt a lot since I started with it.

Currently I am facing a problem with loading a pre-trained model using create_cnn(). I keep getting this error:
ConnectionError: HTTPSConnectionPool(host='download.pytorch.org', port=443): Max retries exceeded with url: /models/resnet152-b121ed2d.pth (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fb4651f2ba8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

So sorry, I’m new to this; I just realised I hadn’t turned on internet access for the kernel.

Hi everyone, I also worked through the data in this competition. I used resnet34 as the base architecture and I’m getting 96.3% accuracy, which I was pretty happy about! But when I ran it with resnet50, the model actually performed worse, though only by 1–2%. I was expecting an improved result. Any idea why this could be the case, considering only the change in base model?

Assuming everything else is the same, the simplest explanation would be overfitting, since resnet50 is a deeper (more layers) architecture.

Ahh yes, thanks, OK that makes sense.

Also, I am noticing another thing with this data. The images in the dataset are quite small (96x96). However, when I increase the image size to 224x224, by specifying the size parameter when creating an ImageDataBunch from the data, I get significantly better results than when using the actual image size. Why would this be the case?

Because the pre-trained ImageNet model that you are using to train your model uses 224x224 images for its input.
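To make that concrete: all the size parameter really does is interpolate the 96x96 tiles up to the resolution the pretrained ImageNet weights were tuned for. Here is a minimal, dependency-free sketch of nearest-neighbour resizing — fastai’s transforms use fancier (bilinear) interpolation, so treat this purely as an illustration of the idea, not the library’s actual code:

```python
def resize_nearest(img, new_h, new_w):
    """Upscale a 2D grid of pixel values with nearest-neighbour interpolation.

    No new information is added: each output pixel just copies the closest
    input pixel. Upscaling therefore cannot hurt the content, while matching
    the 224x224 input resolution the pretrained weights expect can help.
    """
    old_h, old_w = len(img), len(img[0])
    return [
        [img[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]

# a tiny 2x2 "image" upscaled to 4x4: each pixel becomes a 2x2 block
small = [[1, 2],
         [3, 4]]
big = resize_nearest(small, 4, 4)
# big == [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

The same call with new_h = new_w = 224 on a 96x96 tile is, conceptually, what happens when you pass size=224.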

Ahh gotcha! Thanks a lot Mauro, that’s all a lot clearer to me now! :slight_smile:

Been working on this project over the last few weeks. I managed to get 98.6%, which I blogged about here: https://www.humanunsupervised.com/post/histopathological-cancer-detection. I missed the competition deadline, but it was a good learning experience finishing it up end to end :slight_smile:

Hi Antonio. I appreciate the clarity of your blog writeup. I also entered this competition and it proved to be a profound way to learn to use fastai in practice.

A couple of points, and my intention here is to help, not criticize.

  1. The metric published on the leaderboard is ROC AUC, which is correlated with, but different from, accuracy. There are several forum posts that explain how to calculate it on your validation set.

  2. My experience (like that of many others) was that the ROC AUC scored by Kaggle on the provided test set was consistently much lower than the one calculated locally on the validation set. No one was able to fully explain the difference. I think you can still submit test-set predictions to Kaggle and see how they score, even after the competition has closed.
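On point 1, for anyone who wants to check it locally: ROC AUC is just the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties counting half). In practice you would feed the positive-class probabilities from learn.get_preds() into sklearn.metrics.roc_auc_score, but here is a toy, dependency-free sketch of the definition — my own illustration, not competition code:

```python
def roc_auc(labels, scores):
    """Rank-based ROC AUC: the fraction of (positive, negative) pairs in
    which the positive example gets the higher score; ties count as 1/2.

    O(P*N), so fine for a sanity check; use sklearn.metrics.roc_auc_score
    on real prediction tensors.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# three of the four (positive, negative) pairs are ordered correctly -> 0.75
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Note it only looks at the ranking of the scores, never at a 0.5 threshold, which is why a model can have high accuracy and a noticeably different ROC AUC.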

Thanks for sharing your work!

Hi @Pomo, thanks, that’s good to know! :slight_smile: I didn’t know Kaggle uses ROC AUC to calculate the score (I actually don’t know what that is! :flushed:)

I calculated my score using

preds, y, losses = learn.get_preds(with_loss=True)  # defaults to the validation set
# get accuracy
acc = accuracy(preds, y)

which, I now realise, operates on the validation set by default rather than the test set. I should submit my solution to the competition, see what score I actually get there, and update my blog post accordingly.

Thanks for the feedback! :slight_smile: