Kaggle 'The Nature Conservancy Fisheries Monitoring' competition

jeremy · November 15, 2016, 7:11pm

Who’s working on the fish competition? Rachel and I have both started looking at it. Let’s talk about the competition here. I’ll start with a question: my test set result is quite a bit worse than my validation set result. I see on the kaggle forums that others have seen this too. Any ideas why? I haven’t had a chance to look into it yet - I’m guessing that the test set is somehow different to the training set…

BTW - I found this kaggle kernel helpful.

rachel · November 15, 2016, 8:39pm

I’m getting the same issue. My validation set loss is ~1.2, but my Kaggle test set loss is ~1.7.

Since the categories are unevenly distributed, I’m using stratified sampling. I’ve finetuned vgg, similar to the dogs-vs-cats redux example. I’m going to try using a non-pretrained model next.
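For anyone who wants to see what a stratified split does, here is a minimal pure-Python sketch. It is not the code used in the post above: the function name `stratified_split`, the class names, and the class counts are all made up for illustration. In practice a library routine such as scikit-learn's `StratifiedKFold` would be the usual choice.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) with every class split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Classes with a single sample stay entirely in the training set.
        n_val = max(1, round(len(indices) * val_fraction)) if len(indices) > 1 else 0
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)

# Hypothetical, deliberately imbalanced labels (names and counts are illustrative).
labels = ["class_a"] * 50 + ["class_b"] * 20 + ["class_c"] * 5
train_idx, val_idx = stratified_split(labels, val_fraction=0.2)
val_counts = Counter(labels[i] for i in val_idx)
print(val_counts)
```

The point of splitting per class is that a rare class cannot end up missing from the validation set by chance, which a plain random split does not guarantee.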

chris · November 16, 2016, 3:48am

I just placed 6th. Kaggle loss 1.12451, val_loss: 0.1787

Beginner’s luck?

jeremy · November 16, 2016, 5:21am

That’s so cool @chris! What approach did you take? Did you finetune vgg, or create your own net from scratch? Any preprocessing?

jeremy · November 16, 2016, 6:14am

I just tried clipping my maximum probabilities to 0.7 and that moved me from about 70th to 12th - so it seems that taking account of the very different test set is important! That’s with a simple dense model on top of the VGG convolutional layers, without even any data augmentation or pseudo-labeling yet.
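A rough sketch of why clipping can help, assuming the leaderboard score is multi-class log loss. The post only mentions capping the maximum at 0.7. The matching floor of (1 - max) / (n_classes - 1), the renormalization step, and the function names are assumptions added here for illustration.

```python
import math

def clip_and_renormalize(probs, max_prob=0.7):
    """Cap each class probability at max_prob, floor it, and rescale to sum to 1."""
    n_classes = len(probs)
    # Floor chosen so one class at max_prob plus the rest at the floor sums to 1.
    min_prob = (1.0 - max_prob) / (n_classes - 1)
    clipped = [min(max(p, min_prob), max_prob) for p in probs]
    total = sum(clipped)
    return [p / total for p in clipped]

def log_loss(probs, true_index):
    """Log loss for one sample: negative log of the probability of the true class."""
    return -math.log(probs[true_index])

# A made-up, over-confident prediction over three classes.
raw = [0.98, 0.01, 0.01]
clipped = clip_and_renormalize(raw, max_prob=0.7)

# If class 0 is correct, clipping costs a little; if class 1 is correct, it saves a lot.
print(log_loss(raw, 0), log_loss(clipped, 0))
print(log_loss(raw, 1), log_loss(clipped, 1))
```

Log loss punishes confident mistakes very heavily, so when the test set differs from the training data and the model is wrong more often than validation suggests, giving up a little on the correct predictions can be a good trade.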

chris · November 16, 2016, 3:32pm

Fine tuned VGG model with augmentation, pseudo-label generation and clipping. All the bells and whistles from your lesson 4 notebook.

jeremy · November 16, 2016, 4:59pm

Nice! Definitely not beginner’s luck then - that’s a great approach and I’m glad to hear it worked for you. Great job on getting it implemented for this competition so quickly!

chris · November 16, 2016, 6:16pm

I tried pseudo labelling using the test set and it worsened my Kaggle score from 1.12451 to 1.19297.

jeremy · November 16, 2016, 6:35pm

How did you do it? If you just concatenated the test set with the training set you would have too many pseudo labels each batch. You need a mix of around 2/3 training data to 1/3 pseudo-labeled data.
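One way to keep that ratio is to build every batch from two separate pools instead of concatenating them. The sketch below only produces index lists and leaves the actual data loading out. The function name `mixed_batches`, the batch sizes, and the dataset sizes are hypothetical, and this is not the lesson notebook's implementation.

```python
import random

def mixed_batches(n_train, n_pseudo, train_per_batch=32, pseudo_per_batch=16, seed=0):
    """Yield (train_indices, pseudo_indices) pairs covering the training set once.

    With the defaults each batch is 2/3 labeled and 1/3 pseudo-labeled samples.
    """
    rng = random.Random(seed)
    train_order = list(range(n_train))
    rng.shuffle(train_order)
    for start in range(0, n_train, train_per_batch):
        train_part = train_order[start:start + train_per_batch]
        # Scale the pseudo-label count so a short final batch keeps the same ratio.
        k = round(len(train_part) * pseudo_per_batch / train_per_batch)
        pseudo_part = rng.sample(range(n_pseudo), min(k, n_pseudo))
        yield train_part, pseudo_part

# Hypothetical sizes: 320 labeled training images, 1000 pseudo-labeled test images.
batches = list(mixed_batches(n_train=320, n_pseudo=1000))
first_train, first_pseudo = batches[0]
print(len(batches), len(first_train), len(first_pseudo))
```

Sampling the pseudo-labeled indices fresh for every batch means a large test set never outnumbers the real labels within a batch, which is the problem with plain concatenation.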

chris · November 16, 2016, 6:39pm

That’s exactly how I did it, and I realised why it didn’t work when I got to that bit in the lesson 4 video. I was there in person, but I only remembered once I tried it.

jeff · November 20, 2016, 10:10pm

My submission score (currently 1.15881) was also much worse than my val_loss of 0.1609. So far, I’ve only fine-tuned Keras’ VGG16 model with the Nadam optimizer and a sufficiently low learning rate. I haven’t yet applied dropout, ensembling, or pseudo-labels. Nor have I handled the class imbalance issue. Data augmentation, even with tiny increments, did not improve val_loss for me. A few other things I noticed:

I again tried Keras’ ResNet50 on this Kaggle competition but could not tame it enough to converge to a respectable validation score, despite trying lower and lower learning rates and other optimizers. VGG16 seems to give respectable results relatively quickly. I’ve not tried other architectures like Inception, however.
Despite training on various low learning rates with pre-calculated inputs into VGG16’s fully connected layers, I actually got better results training on the entire model (while still freezing the base convolutional layers). Does anyone know why that could happen?
Note: My software configuration includes CUDA 8.0 with cuDNN 5.1; I created my training / validation split with scikit-learn’s StratifiedKFold.

Since this competition has images of significantly higher resolution than the 224x224 inputs of the pre-trained ImageNet architectures we’re familiar with (e.g., VGG16, ResNet), I can’t help but wonder whether we should prepend a convolutional layer to our model that accepts a large image size (e.g. 2048x2048) and outputs 224x224 images to VGG16. Is that a worthwhile approach? I can’t find a definitive answer on how pre-trained ImageNet architectures can be used with higher-res images.

jeremy · November 21, 2016, 12:44am

Your last idea there is interesting - I haven’t seen that tried before. You may also find taking a few crops and averaging them could be a good way to handle it.
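A minimal sketch of the crop-and-average idea, kept to plain Python so it runs without any image library. The choice of four corners plus the centre, the 1280x720 frame size, and the per-crop probabilities are all assumptions for illustration. Running a model on each crop is left out.

```python
def five_crop_boxes(width, height, crop=224):
    """Return (left, top, right, bottom) boxes: the four corners plus the centre."""
    if width < crop or height < crop:
        raise ValueError("image is smaller than the crop size")
    max_x, max_y = width - crop, height - crop
    origins = [(0, 0), (max_x, 0), (0, max_y), (max_x, max_y),
               (max_x // 2, max_y // 2)]
    return [(x, y, x + crop, y + crop) for x, y in origins]

def average_predictions(per_crop_probs):
    """Average the class-probability lists predicted for each crop of one image."""
    n = len(per_crop_probs)
    return [sum(column) / n for column in zip(*per_crop_probs)]

# Hypothetical frame size; each box could be passed to an image library's crop call.
boxes = five_crop_boxes(1280, 720)

# Made-up probabilities for two crops over two classes, standing in for model output.
averaged = average_predictions([[0.6, 0.4], [0.8, 0.2]])
print(boxes)
print(averaged)
```

With frames this large, 224-pixel crops cover only a small part of the picture, so it would probably make sense to downscale the image first or to use more crops. How much is a judgment call that this sketch does not settle.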

vshets · November 21, 2016, 2:00am

Fortune article on this competition - http://fortune.com/2016/11/14/deep-learning-artificial-intelligence-tuna-industry/

jeff · November 22, 2016, 7:20am

@jeremy I heard your mention of attention models during tonight’s class and I think that’s the hint I needed to handle high resolution images. I’m trying to do some reading on it right now.

jeremy · November 22, 2016, 6:19pm

Looking forward to hearing about how you go!

jeff · November 28, 2016, 11:49pm

In my search for attention models, I came across Google DeepMind’s paper on a relatively new type of layer – the Spatial Transformer layer (https://arxiv.org/pdf/1506.02025v3.pdf). I found someone’s Keras implementation on GitHub and successfully ran their sample notebook using it on cluttered MNIST data. I was amazed to see it auto-focus on the correct part of the image! I don’t know why this type of layer hasn’t become standard yet. It’s been published for over a year.

jeremy · November 29, 2016, 12:58am

I know of lots of people that have tried to use it, but no-one that’s successfully used it for their own real world data. I’m still excited about the idea - so it would be great if you could try it for something like the fisheries competition (where I think that focusing on the fish is important).

sravya8 · November 29, 2016, 10:16pm

Hey @jeff, do you mind sharing which Keras implementation you used?

jeff · November 29, 2016, 11:27pm

Googling for “spatial transformer network keras” led me to https://github.com/EderSantana/seya

jeff · November 30, 2016, 5:42pm

I’ve tried the Keras implementation (https://github.com/EderSantana/seya) with both the fish image dataset and the mammography dataset, but it didn’t work out. I used a single spatial transformer layer as the first layer before the rest of the CNN, with an input image size of 4096x4096, and lowered my batch size due to the increased memory usage. Training was very slow (as might be expected), and I didn’t even get a validation accuracy above zero after 2 epochs on the mammography dataset. Is there a code sample for a more established attention model that I should be using instead? R-CNNs perhaps?