Kaggle 'The Nature Conservancy Fisheries Monitoring' competition

(Jeremy Howard) #1

Who’s working on the fish competition? Rachel and I have both started looking at it. Let’s talk about the competition here. I’ll start with a question: my test set result is quite a bit worse than my validation set result. I see on the Kaggle forums that others have seen this too. Any ideas why? I haven’t had a chance to look into it yet - I’m guessing that the test set is somehow different from the training set…

BTW - I found this Kaggle kernel helpful.

(Rachel Thomas) #2

I’m getting the same issue. My validation set loss is ~1.2, but my Kaggle test set loss is ~1.7.

Since the categories are unevenly distributed, I’m using stratified sampling. I’ve fine-tuned VGG, similar to the dogs-vs-cats redux example. I’m going to try using a non-pretrained model next.

(chris) #3

I just placed 6th. Kaggle loss 1.12451, val_loss: 0.1787

Beginner’s luck?

(Jeremy Howard) #4

That’s so cool @chris ! What approach did you take? Did you fine-tune VGG, or create your own net from scratch? Any preprocessing?

(Jeremy Howard) #5

I just tried clipping my maximum probabilities to 0.7 and that moved me from about 70th to 12th - so it seems that taking account of the very different test set is important! That’s with a simple dense model on top of the VGG convolutional layers, without even any data augmentation or pseudo-labeling yet.
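To illustrate why clipping can help when the metric is log loss, here is a minimal sketch. The `clip_probs` helper, the choice of 8 classes, the lower bound of `(1 - max_p) / (n - 1)` and the renormalisation step are all assumptions made for the example, not necessarily what was used for the submission above. With NumPy the clipping itself would be a single `np.clip` call.

```python
import math

def clip_probs(probs, max_p=0.7):
    """Clip one row of class probabilities, then renormalise so it sums to 1."""
    n = len(probs)
    # Assumed floor: spread the remaining probability mass evenly over the other classes.
    min_p = (1.0 - max_p) / (n - 1)
    clipped = [min(max(p, min_p), max_p) for p in probs]
    total = sum(clipped)
    return [p / total for p in clipped]

def log_loss(probs, true_idx):
    """Log loss for a single example whose true class is true_idx."""
    return -math.log(probs[true_idx])

# A very confident prediction for class 0 over 8 classes (made-up numbers).
confident = [0.99] + [0.01 / 7] * 7
clipped = clip_probs(confident)

# If the model is wrong (true class is 1), clipping cuts the penalty a lot.
print("wrong: ", log_loss(confident, 1), "->", log_loss(clipped, 1))
# If the model is right (true class is 0), clipping costs a little.
print("right: ", log_loss(confident, 0), "->", log_loss(clipped, 0))
```

The trade-off is that correct confident predictions are penalised slightly, while wrong confident predictions are penalised far less, which tends to pay off when the test set differs from the data the model was validated on.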

(chris) #6

Fine tuned VGG model with augmentation, pseudo-label generation and clipping. All the bells and whistles from your lesson 4 notebook.

(Jeremy Howard) #7

Nice! Definitely not beginner’s luck then - that’s a great approach and I’m glad to hear it worked for you. Great job on getting it implemented for this competition so quickly!

(chris) #8

I tried pseudo labelling using the test set and it worsened my Kaggle score from 1.12451 to 1.19297.

(Jeremy Howard) #9

How did you do it? If you just concatenated the test set with the training set you would have too many pseudo labels each batch. You need a mix of around 2/3 training data to 1/3 pseudo-labeled data.
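A rough sketch of that kind of mixing, assuming both datasets fit in plain Python lists and using a hypothetical `mixed_batches` generator (the name, batch size and sampling scheme are illustrative only). Each batch draws about two thirds of its items from the labelled training data and one third from the pseudo-labelled data, regardless of how large the pseudo-labelled set is.

```python
import random

def mixed_batches(train, pseudo, batch_size=48, pseudo_frac=1 / 3, seed=0):
    """Yield batches with roughly 2/3 labelled and 1/3 pseudo-labelled items."""
    rng = random.Random(seed)
    n_pseudo = round(batch_size * pseudo_frac)
    n_train = batch_size - n_pseudo
    while True:
        # Sample each source separately so the ratio holds in every batch.
        batch = rng.sample(train, n_train) + rng.sample(pseudo, n_pseudo)
        rng.shuffle(batch)
        yield batch

# Placeholder items: (source, index) pairs stand in for (image, label) pairs.
train = [("train", i) for i in range(100)]
pseudo = [("pseudo", i) for i in range(300)]

batch = next(mixed_batches(train, pseudo))
n_pseudo_in_batch = sum(1 for source, _ in batch if source == "pseudo")
print(len(batch), n_pseudo_in_batch)
```

Simple concatenation would instead give each batch pseudo-labels in proportion to the size of the test set, which is what the 2/3 to 1/3 ratio is meant to avoid.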

(chris) #10

That’s exactly how I did it, and I realised why it didn’t work when I got to that bit in the lesson 4 video. I was there in person, but I only remembered once I tried it.


My submission score (currently 1.15881) was also much worse than my val_loss of 0.1609. So far, I’ve only fine-tuned Keras’ VGG16 model with the Nadam optimizer and a sufficiently low learning rate. I haven’t yet applied dropout, ensembling, or pseudo-labels. Nor have I handled the class imbalance issue. Data augmentation, even with tiny increments, did not improve val_loss for me. A few other things I noticed:

  • I again tried Keras’ ResNet50 on this Kaggle competition but could not tame it enough to converge to a respectable validation score, despite trying lower and lower learning rates and other optimizers. VGG16 seems to give respectable results relatively quickly. I’ve not tried other architectures like Inception, however.
  • Despite training on various low learning rates with pre-calculated inputs into VGG16’s fully connected layers, I actually got better results training on the entire model (while still freezing the base convolutional layers). Does anyone know why that could happen?
  • Note: My software configuration includes CUDA 8.0 with CuDNN 5.1; I created my training / validation split with Scikit Learn’s StratifiedKFold
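For readers unfamiliar with stratified splitting, the idea can be sketched without scikit-learn. This is a hand-rolled, single hold-out stand-in for what `StratifiedKFold` does across several folds; the `stratified_split` helper, the class names and the class counts are made up for illustration.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, val_frac=0.2, seed=0):
    """Split indices so each class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, val_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Keep at least one validation example for even the rarest class.
        n_val = max(1, round(len(idxs) * val_frac))
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return sorted(train_idx), sorted(val_idx)

# Illustrative, imbalanced labels (not the real competition counts).
labels = ["ALB"] * 50 + ["YFT"] * 20 + ["SHARK"] * 5
train_idx, val_idx = stratified_split(labels)
val_counts = Counter(labels[i] for i in val_idx)
print(val_counts)
```

A purely random split could easily leave a rare class out of the validation set entirely, which is the problem stratification addresses.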

Since this competition has images that are significantly higher in resolution than the 224x224 inputs of the pre-trained ImageNet architectures we’re familiar with (e.g., VGG16, ResNet), I can’t help but wonder whether we should prepend our model with a convolutional layer that accepts a large image size (e.g. 2048x2048) and outputs 224x224 images to VGG16. Is that a worthy approach? I can’t find a definitive answer on how pre-trained ImageNet architectures can be used with higher-res images.

(Jeremy Howard) #12

Your last idea there is interesting - I haven’t seen that tried before. You may also find taking a few crops and averaging them could be a good way to handle it.
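One way the crop-and-average idea could look, as a sketch only: `five_crop_boxes` and `average_predictions` are hypothetical helpers, and the choice of four corners plus the centre, the 1280x720 image size and the per-crop probabilities are assumptions for the example. The boxes use the (left, top, right, bottom) convention, so they could be passed to an image library's crop function.

```python
def five_crop_boxes(width, height, size=224):
    """Return (left, top, right, bottom) boxes for the four corners and the centre."""
    if width < size or height < size:
        raise ValueError("image is smaller than the crop size")
    cx, cy = (width - size) // 2, (height - size) // 2
    origins = [
        (0, 0),                           # top-left
        (width - size, 0),                # top-right
        (0, height - size),               # bottom-left
        (width - size, height - size),    # bottom-right
        (cx, cy),                         # centre
    ]
    return [(x, y, x + size, y + size) for x, y in origins]

def average_predictions(per_crop_probs):
    """Average class probabilities element-wise across crops."""
    n = len(per_crop_probs)
    return [sum(col) / n for col in zip(*per_crop_probs)]

boxes = five_crop_boxes(1280, 720)
print(boxes[-1])  # centre crop

# Made-up per-crop outputs for a 2-class toy case.
avg = average_predictions([[0.6, 0.4], [0.8, 0.2]])
print(avg)
```

On very large images a 224x224 crop covers only a small region, so in practice it may make sense to downscale first or use more, larger crops.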

(vedshetty) #13

Fortune article on this competition - http://fortune.com/2016/11/14/deep-learning-artificial-intelligence-tuna-industry/


@jeremy I heard your mention of attention models during tonight’s class and I think that’s the hint I needed to handle high resolution images. I’m trying to do some reading on it right now. :)

(Jeremy Howard) #15

Looking forward to hearing about how you go!


In my search for attention models, I came across Google DeepMind’s paper on a relatively new type of layer: the spatial transformer (https://arxiv.org/pdf/1506.02025v3.pdf). I found someone’s Keras implementation on GitHub and successfully ran their sample notebook using it on cluttered MNIST data. I was amazed to see it auto-focus on the correct part of the image! I don’t know why this type of layer hasn’t become standard yet. It’s been published for over a year.

(Jeremy Howard) #17

I know of lots of people that have tried to use it, but no-one that’s successfully used it for their own real world data. I’m still excited about the idea - so it would be great if you could try it for something like the fisheries competition (where I think that focusing on the fish is important).

(sravya8) #18

Hey @jeff, do you mind sharing which Keras implementation you used?


Googling for “spatial transformer network keras” led me to https://github.com/EderSantana/seya


I’ve tried the Keras implementation (https://github.com/EderSantana/seya) with both the fish image dataset and the mammography dataset, but it didn’t work out. I used a single spatial transformer layer as the first layer before the rest of the CNN, with an input image size of 4096x4096, and lowered my batch size due to the increased memory usage. Training was very slow (as might be expected), and I didn’t even get a validation accuracy above zero after 2 epochs on the mammography dataset. Is there a code sample for a more established attention model that I should be using instead? R-CNNs perhaps?