I spent about two weeks on this competition and learned a lot. My final score is 0.05051, 67th place, close to the top 5%. The tools I used are dlib, keras and mxnet.
Here is what I learned from this competition:
1 : Ensembling may make your results worse
2 : Remember to record the parameters you used; an Excel-like spreadsheet editor is a nice tool for this
3 : Feeding pseudo labels into the mini-batches in a naive way does not work (I should have finished lesson 4 before rushing into it, even though I was running out of time)
4 : Leveraging pretrained models makes it much easier to get good results
5 : How to use dlib, keras and mxnet
6 : Read the posts on the forums; they may give you useful info
7 : The fast.ai course is awesome; I should have watched it earlier (I just finished lesson 4)
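Item 3 does not spell out what the naive approach was. One frequently suggested way to use pseudo labels is to reserve a fixed fraction of every mini-batch for pseudo-labeled test samples, instead of simply mixing them into the training set. A minimal plain-Python sketch of that idea, not the code used in this competition; the helper name `mixed_batches`, the 25% fraction and the toy data are all assumptions for illustration.

```python
import random
from typing import Iterator, List, Sequence, Tuple

# (input, label) pairs; for pseudo-labeled data the label may be a soft probability
Sample = Tuple[object, float]


def mixed_batches(labeled: Sequence[Sample],
                  pseudo: Sequence[Sample],
                  batch_size: int = 64,
                  pseudo_fraction: float = 0.25,
                  seed: int = 0) -> Iterator[List[Sample]]:
    """Yield one epoch of mini-batches where a fixed share comes from pseudo-labeled data.

    Hypothetical helper; pseudo_fraction=0.25 is an assumed value, not a tuned one.
    """
    rng = random.Random(seed)
    n_pseudo = int(round(batch_size * pseudo_fraction))
    n_real = batch_size - n_pseudo
    if n_real <= 0 or n_pseudo > len(pseudo):
        raise ValueError("batch_size/pseudo_fraction do not fit the data")
    real_order = list(range(len(labeled)))
    rng.shuffle(real_order)
    # An epoch is one pass over the real labeled data; a trailing partial batch is dropped
    for start in range(0, len(real_order) - n_real + 1, n_real):
        batch = [labeled[i] for i in real_order[start:start + n_real]]
        # Draw a fresh random subset of pseudo-labeled samples for every batch
        batch += [pseudo[j] for j in rng.sample(range(len(pseudo)), n_pseudo)]
        rng.shuffle(batch)
        yield batch


# Usage with toy data: 100 labeled training items, 40 pseudo-labeled test items
labeled = [(f"train_{i}", float(i % 2)) for i in range(100)]
pseudo = [(f"test_{i}", 0.9) for i in range(40)]
batches = list(mixed_batches(labeled, pseudo, batch_size=8))
```

With `batch_size=8` each batch holds 6 real and 2 pseudo-labeled samples. Wiring such a generator into keras training is not shown; each batch would first have to be converted to arrays.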
-------------Work approach--------------------
a : dlib
1 : Split the data into 5 folds for cross-validation, with 5x augmentation. I did not figure out
which augmentation tricks work best; however, vertical flipping looks like a bad choice
2 : Extract features with dlib's resnet34 on the training and test data, and store them
3 : Predict the labels with different combinations of the k-fold models
4 : Submit; the score is 0.06266
5 : Clip the values to [0.02, 0.98]; this improved the score to 0.05688
6 : Validating with random crops might improve accuracy, but I had no time to try it out
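Apart from the dlib feature extraction, the steps above come down to some framework-independent bookkeeping: a k-fold split, averaging the fold models' predictions, and clipping. A minimal plain-Python sketch of that part only, with hypothetical function names and toy numbers. It assumes the metric is binary log loss, which is what makes clipping pay off: capping predictions at [0.02, 0.98] limits the penalty for a confidently wrong image to -ln(0.02) ≈ 3.91, at the cost of -ln(0.98) ≈ 0.020 (instead of about 0.001 at 0.999) on every confidently correct one.

```python
import math
import random
from typing import List, Sequence, Tuple


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1 and return k (train_idx, val_idx) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
            for i in range(k)]


def average_predictions(per_model: Sequence[Sequence[float]]) -> List[float]:
    """Average per-image probabilities from several fold models (one list per model)."""
    return [sum(ps) / len(per_model) for ps in zip(*per_model)]


def clip(probs: Sequence[float], lo: float = 0.02, hi: float = 0.98) -> List[float]:
    """Clamp every probability into [lo, hi]."""
    return [min(max(p, lo), hi) for p in probs]


def log_loss(y_true: Sequence[int], probs: Sequence[float], eps: float = 1e-15) -> float:
    """Binary log loss; eps only guards against log(0)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# One confidently wrong prediction (third item) dominates the unclipped loss
y = [1, 0, 1, 0]
raw = [0.999, 0.001, 0.001, 0.2]
print(round(log_loss(y, raw), 4), round(log_loss(y, clip(raw)), 4))  # prints 1.7832 1.0439
```

Splitting by original image index, before making the augmented copies, keeps copies of one image from landing in both the training and the validation fold.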
b : mxnet
I re-entered this competition with only 5 or 6 days left, so I was in a hurry; the solutions I tried with
mxnet and keras are less sophisticated than the dlib one.
1 : Fine-tune resnet34 through resnet200 on the dataset with augmentation, without k-fold cross-validation.
I did not figure out the best way to augment the data.
2 : Ensemble the results of all the models, including the dlib results; this improved my
score to 0.05051
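Step 2 combines predictions from models built with different libraries. Since each model produces its own submission file, one simple way to ensemble them is to average the files per image. A minimal sketch, assuming each file has an `id,label` header where `label` is the predicted probability, and equal weights unless given. The text above does not say whether the final ensemble was clipped, so reusing the [0.02, 0.98] clip here is an assumption; the helper names are hypothetical.

```python
import csv
import io
from typing import Dict, Optional, Sequence, TextIO


def read_submission(f: TextIO) -> Dict[str, float]:
    """Parse a submission with an 'id,label' header (assumed format)."""
    return {row["id"]: float(row["label"]) for row in csv.DictReader(f)}


def ensemble_submissions(files: Sequence[TextIO],
                         weights: Optional[Sequence[float]] = None,
                         lo: float = 0.02, hi: float = 0.98) -> Dict[str, float]:
    """Weighted average of several submissions, then clip into [lo, hi]."""
    subs = [read_submission(f) for f in files]
    if weights is None:
        weights = [1.0] * len(subs)
    total_w = sum(weights)
    ids = subs[0].keys()
    for s in subs[1:]:
        if s.keys() != ids:
            raise ValueError("submissions cover different ids")
    out = {}
    for i in ids:
        p = sum(w * s[i] for w, s in zip(weights, subs)) / total_w
        out[i] = min(max(p, lo), hi)
    return out


def write_submission(preds: Dict[str, float], f: TextIO) -> None:
    """Write predictions back out in the same 'id,label' format."""
    writer = csv.writer(f)
    writer.writerow(["id", "label"])
    for i, p in preds.items():
        writer.writerow([i, f"{p:.6f}"])


# In-memory files standing in for, e.g., a dlib submission and an mxnet submission
dlib_sub = io.StringIO("id,label\n1,0.99\n2,0.10\n")
mxnet_sub = io.StringIO("id,label\n1,0.95\n2,0.00\n")
ensembled = ensemble_submissions([dlib_sub, mxnet_sub])
out = io.StringIO()
write_submission(ensembled, out)
```

Unequal `weights` let a stronger model count for more; as item 1 of the lessons notes, it is worth checking whether an ensemble actually helps before trusting it.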
-------------Non work approach--------------------
1 : I trained different models with dlib and ensembled them, but this gave me worse results. The steps were:
a : Extract augmented features with resnet34 and store them
b : Train k-fold models on the extracted features with different "top models"
c : Ensemble the results
d : Clip the values to [0.02, 0.98]
e : Get worse results
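Step b trains "top models" on the stored features: small classifiers fitted on the extracted features while the pretrained network itself stays fixed. The exact top models tried are not listed above, so the following is illustrative only: a minimal plain-Python sketch of the simplest kind, a logistic-regression top model trained with SGD on log loss. All names and the toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability of the positive class; the last weight is the bias."""
    return sigmoid(w[-1] + sum(wi * xi for wi, xi in zip(w, x)))


def train_logistic_top_model(features: Sequence[Sequence[float]],
                             labels: Sequence[int],
                             lr: float = 0.1, epochs: int = 200,
                             seed: int = 0) -> List[float]:
    """Fit a logistic-regression top model on stored features with plain SGD."""
    rng = random.Random(seed)
    dim = len(features[0])
    w = [0.0] * (dim + 1)
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x = features[i]
            g = predict_proba(w, x) - labels[i]  # d(log loss)/d(logit)
            for d in range(dim):
                w[d] -= lr * g * x[d]
            w[dim] -= lr * g
    return w


# Toy 1-D "features" standing in for stored CNN feature vectors
feats = [[0.0], [0.2], [0.8], [1.0]]
ys = [0, 0, 1, 1]
w = train_logistic_top_model(feats, ys)
```

In a k-fold setup, each fold would get its own top model trained on that fold's training indices, giving the set of models that step c ensembles.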
--------------My views on the libraries (biased)------------------
1 : keras
pros : easiest to use, lots of nice examples out there
cons : hard to extend (I wanted to change the way data is fed into the mini-batches), though maybe that is
because I am not a Python expert yet. Learning a new language is very easy, but becoming an expert in it is another story.
2 : mxnet
pros : more pretrained models
cons : The documentation and examples are not that good; some (many) examples are outdated. I have not yet figured out how to find the number of layers, or how to correctly freeze the learning rate of the base layers (I implemented both, but I am not sure they are correct).
3 : dlib
pros : can work as a zero-dependency library and is easy to port to different platforms; it is designed to solve real-world problems and to support app development rather than prototyping or academic use. Nice documentation and examples, and high-quality source code (this is what modern C++ looks like).
cons : Only one pretrained model (resnet34), a small community, and it lacks many features common in the deep learning world. Since it is new, we can expect more features to be added in the future.
ps : I may be biased toward dlib because it is written in my favorite language, C++