For those interested in the Kaggle Fisheries competition, I would like to start a thread for discussion.
I built a VGG16-based model inspired by what I learned in Part 1 and got a reasonable result (top 15%). Here are the problems I am facing:
1) The loss on my submission against the test set is much bigger than the loss I got during validation (4-5 times bigger). I am wondering how to explain that. Is it because the training/validation set and the test set come from different samples (i.e. from different sets of cameras and boats)?
2) I also tried the Inception V3 model, which is supposed to be better than VGG. I tried two options:
2.1) I reused the dense layers from my VGG model and put them on top of Inception V3 (that is, the Inception model without its dense layers).
2.2) I used the original Inception V3 directly and retrained the last 3 blocks after 'mixed7', including the final dense layer.
Both of these approaches performed a lot worse than my VGG model after submission. Any ideas why this is the case?
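Not an answer, but for reference: the split in option 2.2 (freeze everything through 'mixed7', retrain what follows) boils down to picking a boundary in the ordered layer list. A minimal sketch of that logic in plain Python (the layer names below are placeholders, not the real Inception V3 layer list; in Keras you would apply the resulting flags by setting each layer's `trainable` attribute):

```python
def trainable_flags(layer_names, boundary="mixed7"):
    """Freeze every layer up to and including `boundary`;
    mark everything after it as trainable (option 2.2 above)."""
    idx = layer_names.index(boundary)
    return [i > idx for i in range(len(layer_names))]

# Hypothetical layer list, for illustration only:
layers = ["conv_1", "mixed6", "mixed7", "mixed8", "mixed9", "predictions"]
flags = trainable_flags(layers)
# Only the layers after 'mixed7' end up trainable.
```

In Keras this would be a loop like `for layer, flag in zip(model.layers, flags): layer.trainable = flag`, followed by recompiling the model so the change takes effect.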
@jpuderer @kishore_p_v @Even @karthik_k134 I saw that you are all interested in this competition. Do you have any insights? If I missed anybody else, please also feel free to jump in.
@jeremy can you offer some directions to explore further?
Hi, my latest conclusion from this competition is that if you use the Part 1 course method, you massively overfit to the ships. In other words, you predict the ships, not the fish.
Given that the second stage will include mostly new ships (according to an announcement in the forum), I believe the current leaderboard is useless.
You can witness the overfitting yourself: take one ship out as a validation set (ships are quite correlated with image sizes) and you'll see you get horrible results, a log loss of around 3-4.
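To make that experiment concrete, here is a minimal sketch of such a split in plain Python, holding out whole groups rather than random images, with image size standing in as the ship proxy as suggested above (function and file names are hypothetical):

```python
import random
from collections import defaultdict

def split_by_group(filenames, sizes, val_fraction=0.2, seed=0):
    """Hold out entire groups (here: image sizes, a proxy for ships)
    so that no validation ship ever appears in training."""
    groups = defaultdict(list)
    for fn, size in zip(filenames, sizes):
        groups[size].append(fn)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    train, val = [], []
    for k in keys:
        # Fill the validation set group by group until it is big enough.
        target = val if len(val) < val_fraction * len(filenames) else train
        target.extend(groups[k])
    return train, val
```

If the validation loss under this split is far worse than under a random split, that gap is roughly how much the model is leaning on the ships.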
This! I’m also reasonably sure you get stuck in a local minimum of “boat optimization”.
I think that cropping the fish based on a calculated bounding box (a la Lesson 7) would be the way to go.
It might also be interesting to see if something we learn in part 2 is applicable. In particular, I wonder if we could use something to do with GANs to “challenge” our model to “unlearn” boat identification (generate everything but fish, for example).