I reached the top 6% on the public leaderboard. I used VGG16, replacing the FC layers with new ones trained from scratch with batch normalization, and I also fine-tuned all the layers.
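For anyone curious what that looks like in code, here is a minimal Keras sketch - `num_classes`, the layer sizes and the learning rates are placeholders, not my exact setup:

```python
# Minimal sketch: VGG16 conv base + new FC head with batch norm, then fine-tuning.
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import VGG16

num_classes = 2  # placeholder; set to the competition's number of classes

# Convolutional base with ImageNet weights, original FC layers dropped
base = VGG16(include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False  # freeze the base while the new head trains from scratch

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(4096, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer=optimizers.Adam(1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])

# ... after training the head, unfreeze everything and fine-tune with a small lr
base.trainable = True
model.compile(optimizer=optimizers.Adam(1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
```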
What worked
The process that @jeremy describes is golden
This is the key: start simple, overfit, see what you can do with data augmentation, and add regularization if needed. I think this approach / skillset is what Andrew Ng references in his Nuts and Bolts of Applying Deep Learning presentation. He mentions that controlling bias / variance is something even experienced practitioners struggle with, and something that makes the real difference. To me it certainly felt like magic how effective this approach was.
Data augmentation is cheating
With the FC layers set to trainable, my model was quickly overfitting. However, I ended up with what felt to me like a lot of data augmentation, and it turned out I didn't need additional regularization! (Maybe it could have used a little, but I didn't have time to add it.) Experiments with small sample sizes were super cool: with 500 train and 250 validation images I was able to observe how data augmentation / regularization literally gives the model the ability to learn things that generalize.
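To give an idea of the kind of augmentation I mean, here is a minimal sketch with Keras' `ImageDataGenerator` - the exact ranges and directory names below are illustrative, not the values I used:

```python
# Sketch of heavy-ish data augmentation for the training set only.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(
    rotation_range=15,         # small random rotations
    width_shift_range=0.1,     # horizontal shifts
    height_shift_range=0.1,    # vertical shifts
    shear_range=0.1,
    zoom_range=0.1,
    channel_shift_range=10.0,  # slight colour jitter
    horizontal_flip=True,
)
valid_gen = ImageDataGenerator()  # no augmentation for validation

# "train/" and "valid/" are placeholder directory names
train_batches = train_gen.flow_from_directory(
    "train/", target_size=(224, 224), batch_size=64)
valid_batches = valid_gen.flow_from_directory(
    "valid/", target_size=(224, 224), batch_size=64, shuffle=False)
```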
Batchnorm is a performance enhancing drug for neural nets
I don’t think I ever want to build a model without it!
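If it helps to see the idea concretely, here is a tiny sketch of what a batch-norm layer does to a batch of activations during training (per feature, leaving out the moving averages used at inference time):

```python
# Toy forward pass of batch normalization on a batch of activations.
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalize activations
    return gamma * x_hat + beta              # learned scale and shift

x = np.random.randn(64, 128)                 # a batch of 64 activation vectors
out = batchnorm_forward(x, gamma=np.ones(128), beta=np.zeros(128))
```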
What didn’t work
Complexity everywhere
I ended up with some code doing something over here, then I had to reconstruct the model so I ran that bunch of cells, then I had this method in utils.py that I created - I sort of knew what I was doing in the moment, but it wasn't necessarily well structured. This approach doesn't scale and will never get me a good solution. A big part of getting good results seems to be engineering / tinkering, and with so many things flying around the overhead becomes unmanageable. Same with naming saved weights and knowing what is where, etc. For the next Kaggle competition I need to work out a cleaner pipeline - maybe start using collapsible cells in Jupyter notebooks, move more things out into separate scripts, and somehow preserve what worked instead of letting it get lost in the sea of trying things out (one small piece of that is sketched below). I think this and model ensembling would be the key to improving my results, even more than learning new and more powerful techniques / architectures.
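One small piece of that cleaner pipeline could be letting the checkpoint callback do the weight-file naming for me. A sketch with Keras' `ModelCheckpoint` - the path and naming scheme below are hypothetical, not what I actually used:

```python
# Encode the epoch and validation loss directly in the saved weight filename,
# so it's always clear which file came from which run.
from tensorflow.keras.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint(
    "weights/vgg16_ft_{epoch:02d}_{val_loss:.4f}.h5",  # hypothetical naming scheme
    monitor="val_loss",
    save_best_only=True,
)
# model.fit(..., callbacks=[checkpoint])
```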
Horrible zombie brain and tiny improvements
'Ooo I wonder what this epoch will bring… mhmm it's only 6 minutes… let me browse those pictures of funny cats or ponder what the progress bar is doing while I wait. Ah what the heck, let me run it for another epoch, maybe this will help.' Horrible, horrible, horrible time wasted. What's even worse is chasing another 0.01 decrease in validation loss by running the training overnight with a tiny learning rate… This is a highway to lost productivity and to overfitting the validation set by trying too hard - and yet it did help a bit with my public leaderboard standing… Very dissatisfying.
Pseudo-labeling not so hot with a lot of data augmentation
I think by the time I wanted to add pseudo-labeling there was not a whole lot of capacity in my model to spare. For one reason or another I was not seeing any improvements on the validation set. I think this reflects more on my particular setup at the time than on pseudo-labeling as such, and if I have more time in the future I will definitely try it again - this needs more experimenting on my side.
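For reference, here is a rough, toy-sized sketch of the pseudo-labeling idea as I understand it - predict on unlabeled (test) data, treat the hard predictions as labels, and mix them in with the real training data. Everything below is a stand-in, not my actual setup:

```python
# Toy pseudo-labeling sketch with dummy data and a dummy model.
import numpy as np
from tensorflow.keras import layers, models

num_classes = 2
model = models.Sequential([layers.Flatten(input_shape=(8, 8, 3)),
                           layers.Dense(num_classes, activation="softmax")])
model.compile(optimizer="adam", loss="categorical_crossentropy")

x_train = np.random.rand(100, 8, 8, 3)
y_train = np.eye(num_classes)[np.random.randint(num_classes, size=100)]
x_test = np.random.rand(50, 8, 8, 3)                       # "unlabeled" data

model.fit(x_train, y_train, epochs=1, verbose=0)            # train on labeled data
pseudo = np.eye(num_classes)[model.predict(x_test).argmax(axis=1)]  # pseudo-labels

# In practice pseudo-labeled examples usually make up only a fraction of each
# batch; here they are simply concatenated for brevity.
model.fit(np.concatenate([x_train, x_test]),
          np.concatenate([y_train, pseudo]), epochs=1, verbose=0)
```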
Summary
Deep learning is not what I expected. It is not so much about sitting in a corner reading a math book but much, much more about figuring out how to configure an AWS instance, how to use the shell / Python to move files around, etc. I also feel that moving from top 6% to top 3% will be a lot harder, but the quickest way to get there is not by studying more math (which I would love to do anyhow) but by building models and experimenting. I would like to learn more math and know how I would go about it, but at this point in time I have relatively little time to give to deep learning related activities, so I need* to spend my time bucks where they can bring the most value.
* yeah yeah @radek we all know you are only saying this but we know your mind is already scheming how to learn more math