I just read Kirill Panarin’s blog entry at https://towardsdatascience.com/dog-breed-classification-hands-on-approach-b5e4f88c333e. In it Kirill, who currently ranks 106th with a log loss of 0.03, describes his classification algorithm.
Kirill used the Inception model architecture with two additional fully connected layers at the end. To achieve his classification result, he used:
- Mini-batch size = 64
- Learning rate = 0.0001
- Epochs count = 5000
Our ResNet architecture can only be trained for a very few epochs before it starts to overfit. How come a network trained for 5000 epochs performs so well?
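
To put the numbers in perspective, here is a rough sketch of how much training 5000 epochs at batch size 64 would amount to. The training-set size of 10,000 images is a placeholder I picked for round numbers, not a figure from the blog post, and the helper names are my own. The second function covers the possibility (which I have not verified) that the 5000 counts mini-batch steps rather than full passes over the data.

```python
import math


def gradient_updates(num_examples: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps taken if one epoch is one full pass over the data."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


def full_passes(num_examples: int, batch_size: int, steps: int) -> float:
    """Full passes over the data if `steps` mini-batches are processed."""
    return steps * batch_size / num_examples


# Values from the blog post
BATCH_SIZE = 64
EPOCHS = 5000

# Assumption: placeholder training-set size, not taken from the blog post
NUM_TRAIN_IMAGES = 10_000

# Reading 1: 5000 full passes over the training set
print(gradient_updates(NUM_TRAIN_IMAGES, BATCH_SIZE, EPOCHS))

# Reading 2: 5000 mini-batch steps in total
print(full_passes(NUM_TRAIN_IMAGES, BATCH_SIZE, EPOCHS))
```

Under these assumptions the two readings differ by several orders of magnitude in the amount of training, so it matters which one the blog post means.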