As far as I understand, even that is not possible, correct? We can only give parameters for how much we want to zoom, shrink, etc. Since there is this variability, we cannot use/precompute the boxes.
But, we can inspect the new values after they have been generated at random.
Probably better to crop/resize in advance using the bounding boxes, rather than using data augmentation.
Sure! (I showed lots of techniques in lesson 7, but didn’t combine them all together since it was an ongoing competition and that seemed unfair to the other competitors.)
My baseline using just VGG scored 1.18. Adding data augmentation or pseudo-labeling makes my test score significantly worse, up to 1.5 in some cases.
I suspect that data augmentation isn’t helping if my model is already overfitting to the training/val dataset, but I am surprised that pseudo-labeling isn’t helping.
What are some things people tried?
One thing I found to be helpful was applying augmentation at test time. Something like:
```
# pseudo-code
create a final_preds container
for a few loops:
    generate test data (sometimes with aug, sometimes without)
    generate predicted class probabilities
    add predictions to the final_preds container
divide final_preds by the number of loops
```
This achieves an average over multiple augmented test images.
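A minimal runnable sketch of that loop, assuming the model is exposed as a `predict` function returning class probabilities. The dummy two-class predictor and the horizontal-flip augmentation here are stand-ins for your real model and augmentation pipeline:

```python
import numpy as np

def predict(batch):
    # Stand-in for model.predict(): softmax over mean pixel intensity.
    m = batch.mean(axis=(1, 2, 3))
    logits = np.stack([m, -m], axis=1)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def augment(batch, rng):
    # Simple test-time augmentation: random horizontal flip per image.
    flip = rng.random(len(batch)) < 0.5
    out = batch.copy()
    out[flip] = out[flip][:, :, ::-1, :]
    return out

def tta_predict(batch, n_loops=5, seed=0):
    rng = np.random.default_rng(seed)
    final_preds = np.zeros((len(batch), 2))
    for i in range(n_loops):
        # First pass uses the raw images, later passes use augmented copies.
        data = batch if i == 0 else augment(batch, rng)
        final_preds += predict(data)
    return final_preds / n_loops

test_batch = np.random.default_rng(1).random((4, 8, 8, 3))
probs = tta_predict(test_batch)
print(probs.shape)  # (4, 2); each row still sums to 1
```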
Just submitted my first entry: a simple baseline model with a Dense(512) and then softmax, with batchnorm in both (no augmentation, dropout, etc. yet). Ranked in the top 53%, with a loss of 1.25. Now to test with a CNN. It surprises me that something so simple can get that high.
If I understand this correctly, you are doing?
- Expand test set with data augmentation
- Pseudo labeling on expanded test set
- Ensemble results
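The pseudo-labeling step in that list could be sketched as follows. The stand-in predictor and the 0.9 confidence threshold are hypothetical choices, not anything prescribed in the thread:

```python
import numpy as np

def pseudo_label(train_x, train_y, test_x, predict, threshold=0.9):
    """Add confidently-predicted test images to the training set.
    predict: function returning an (n, n_classes) probability array."""
    probs = predict(test_x)
    conf = probs.max(axis=1)
    keep = conf >= threshold                   # only trust confident predictions
    new_x = np.concatenate([train_x, test_x[keep]])
    new_y = np.concatenate([train_y, probs[keep].argmax(axis=1)])
    return new_x, new_y

# Stand-in predictor: a sigmoid on the feature sum, fairly confident either way.
def predict(x):
    p1 = 1 / (1 + np.exp(-4 * (x.sum(axis=1) - 1)))
    return np.stack([1 - p1, p1], axis=1)

train_x = np.array([[0.1, 0.2], [0.9, 0.9]])
train_y = np.array([0, 1])
test_x = np.array([[0.0, 0.1], [1.5, 1.5]])
new_x, new_y = pseudo_label(train_x, train_y, test_x, predict)
print(len(new_x), new_y.tolist())  # both test images were confident enough to keep
```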
That’s a cool idea to try to smooth out any test-set-specific differences vs. the train set.
How was your score on the leaderboard?
Oddly enough, I tried training a convnet on the dataset, but failed to outperform my simple two-layer non-convolutional neural net. I learned in the process that dropout works better in the dense layers than in the convolutional layers, and that if you do use it in conv layers, it belongs in the later ones rather than the early ones, since the features in the first layers are more generic. I’m now trying with VGG16bn, but even after a learning schedule involving over 20 iterations, and ensembling 4 VGGs, I cannot beat it. In fact, my score is substantially higher! I’m getting around 1.7 with the ensemble, while the individual components get a validation accuracy of 0.916.
What could I try next? With data augmentation, I get training accuracies around 0.8 and validation accuracies as described above.
I thought that perhaps, given that there isn’t much data in each individual category, data augmentation isn’t producing good batches for me, so to speak. So I disabled data augmentation and trained an ensemble of 2 models with the same schedule as above. Doing this, my models overfit: training accuracies rise to 0.98, and validation accuracies rise to 0.95. I don’t have more submissions remaining today, so I’ll see what happens tomorrow when I submit this attempt.
Try clipping; for the Kaggle score it helps a lot, but unfortunately it has nothing to do with improving the “real life” score.
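Clipping here means pulling predicted probabilities away from 0 and 1, which caps the log-loss penalty for confidently wrong predictions. A numpy sketch; the 0.02/0.98 bounds are a hypothetical choice worth tuning on your validation set:

```python
import numpy as np

def logloss(y_true, probs):
    # Multi-class log loss: mean negative log probability of the true class.
    return -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))

def clip_preds(probs, lo=0.02, hi=0.98):
    # Pull probabilities away from 0/1, then renormalize each row to sum to 1.
    clipped = np.clip(probs, lo, hi)
    return clipped / clipped.sum(axis=1, keepdims=True)

y_true = np.array([0, 1])
probs = np.array([[0.99, 0.01],   # confidently right
                  [0.99, 0.01]])  # confidently wrong
print(logloss(y_true, probs))              # large, dominated by the one mistake
print(logloss(y_true, clip_preds(probs)))  # noticeably smaller after clipping
```

The confidently-correct prediction gets slightly worse, but the confidently-wrong one improves much more, so the average log loss drops.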
Hi, my latest conclusion from this competition is that if you use the methods suggested above, you overfit massively on the ships; in other words, you predict the ships and not the fish.
Given that the second stage will include mostly new ships (according to an announcement in the forum), I believe the current leaderboard is useless.
You can witness the overfitting yourself if you try taking one ship out as a validation set (ships are quite correlated with image sizes): you’ll see horrible results, a log loss of around 3-4.
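The ship-held-out split described there is a group-wise split: every image from the validation ships is excluded from training. A stdlib sketch; the `boat_id` labels are hypothetical and would in practice be inferred, e.g. from image size:

```python
import random

def boat_split(filenames, boat_ids, val_boats, seed=0):
    """Group-wise split: hold out every image from the given boats,
    so validation boats never appear in the training set."""
    train, val = [], []
    for fname, boat in zip(filenames, boat_ids):
        (val if boat in val_boats else train).append(fname)
    random.Random(seed).shuffle(train)
    return train, val

files = ["img0.jpg", "img1.jpg", "img2.jpg", "img3.jpg"]
boats = ["boat_a", "boat_b", "boat_a", "boat_c"]
train, val = boat_split(files, boats, val_boats={"boat_b"})
print(val)  # ['img1.jpg'] -- all of boat_b held out
```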
After reading this post and the other posts on the Kaggle forum, it looks like many of the competitors train on the whole image. Is this approach able to classify fish on the boat? It sounds more like “boat classification” than “fish classification” to me.
I have an overfitting problem too; imbalance + small data are really hard to deal with.
Thinking about “boat classification”: it might be worth looking at a confusion matrix for just the fish vs. no-fish classes, to see how good or bad the model is at finding fish at all.
I think my models were usually decent at that.
The biggest sources of confusion were the different types of tuna and class imbalance, IIRC.
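That fish vs. no-fish check amounts to collapsing all fish classes into one and counting a 2x2 confusion matrix. A numpy sketch; the class list mirrors the competition's categories, with `NoF` as the no-fish class:

```python
import numpy as np

# Class order is an assumption; NoF is the "no fish" class.
CLASSES = ["ALB", "BET", "DOL", "LAG", "NoF", "OTHER", "SHARK", "YFT"]
NOF = CLASSES.index("NoF")

def fish_vs_nof_confusion(y_true, y_pred):
    """2x2 confusion matrix after collapsing all fish classes into one.
    Rows = actual (fish, no-fish); columns = predicted (fish, no-fish)."""
    t = (np.asarray(y_true) == NOF).astype(int)  # 0 = fish, 1 = no fish
    p = (np.asarray(y_pred) == NOF).astype(int)
    cm = np.zeros((2, 2), dtype=int)
    np.add.at(cm, (t, p), 1)                     # count each (actual, predicted) pair
    return cm

y_true = [0, 4, 4, 7, 4]   # three NoF images, two fish
y_pred = [0, 4, 4, 7, 0]   # one NoF image mistaken for a fish
cm = fish_vs_nof_confusion(y_true, y_pred)
print(cm)
```

The off-diagonal cells are the interesting ones: fish predicted as NoF (missed fish) and NoF predicted as fish (false alarms).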
There’s a lot of talk in the forums about using a localization network like YOLO. I wanted to understand the primary difference between using a localization network and the bounding-box approach in Lesson 7. Both yield the same result, right? One uses human-made annotations and the other learns the boxes itself.
I got my biggest improvement today using an ensemble approach:
- Training a number of models with data-augmentation on the training data
- Also apply data-augmentation on test dataset
- Get predictions using each model and average the predictions for each test image.
This got me to 1.03 on the LB, from a baseline of 1.18.
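The final averaging step is just a mean over each model's probability matrix. A sketch with stand-in prediction arrays in place of real model outputs:

```python
import numpy as np

def ensemble_average(model_preds):
    """Average class probabilities from several models.
    model_preds: list of (n_images, n_classes) probability arrays."""
    stacked = np.stack(model_preds)   # (n_models, n_images, n_classes)
    return stacked.mean(axis=0)       # rows still sum to 1

# Stand-in predictions from three models on two test images.
preds = [np.array([[0.9, 0.1], [0.2, 0.8]]),
         np.array([[0.7, 0.3], [0.4, 0.6]]),
         np.array([[0.8, 0.2], [0.3, 0.7]])]
avg = ensemble_average(preds)
print(avg)  # [[0.8, 0.2], [0.3, 0.7]]
```

Averaging probabilities (rather than, say, majority-voting the argmax) keeps the output usable directly for a log-loss submission.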
Feels awesome that it works, but the ensembling part is unsatisfying. Anyone else feel that way?
Credit to this guy for his notebooks:
Right - in general, if you have bounding boxes provided, it’s best to use them! Something like YOLO is only needed when you don’t have them, and it can’t be as accurate (since it has less information to use).
@twairball Did you use VGG16 for any of your models? What was the training and validation set accuracy?
I’ve gotten my best result (1.06) by just fine-tuning the last two dense layers, with no data augmentation, pseudo-labeling, or any other tricks. I was overfitting pretty badly, with training/validation loss at 0.03/0.16 …
My next plans involve concentrating on the fish more, but I’m not yet sure how to really do it…
This however (and of course Jeremy + Rachel’s lessons and all the great information on these forums!!) has given me some ideas of how to proceed:
@torkku I used vgg16bn for these ensembles.
I trained with 512 nodes in the dense layers at lr=1e-4 for around 20 epochs each; from my testing with single models, this seemed to give the best baseline result. With data augmentation my training losses were always higher than my validation loss; val loss was in the 0.15-0.20 range.
I’ve also tried training a single ResNet50 model; it doesn’t have large dense layers like VGG does, but the results were worse than the VGG models.
I would be keen to see how you are fine tuning the dense layers if you care to share?
I’ve found the hardest part of this competition is that the results on the validation set mean almost nothing; they do not reflect the true situation when you apply the model to the test set. The data itself is leaky because there are many images similar to each other.
I guess the best way to solve this problem is to gather more data (but this is hard to do). Does anyone have better suggestions?
If we already use a pretrained network like VGG or ResNet trained on ImageNet, is it meaningful to reuse the fish images from ImageNet as an extra dataset? Thanks