Dog breed classification: Better results with more epochs?

I just read Kirill Panarin’s blog entry. There, Kirill, who currently ranks 106th with a log loss of 0.03, describes his classification approach.

Kirill used the Inception model architecture with two additional fully connected layers at the end. To achieve his classification result, he used

  • Mini-batch size = 64
  • Learning rate = 0.0001
  • Epochs count = 5000
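Reading those numbers as a concrete training setup, here is a minimal PyTorch sketch. This is not Kirill’s actual code: the backbone below is a small stand-in for the Inception feature extractor, and the 512-unit hidden size and 120-class output (120 dog breeds) are assumptions.

```python
import torch
import torch.nn as nn

# Hedged sketch, not Kirill's actual code: a tiny stand-in backbone in place
# of a pretrained Inception feature extractor, with two fully connected
# layers appended, using the hyperparameters listed in the post.
backbone = nn.Sequential(          # stand-in for Inception's feature extractor
    nn.Conv2d(3, 2048, kernel_size=3),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),                  # -> (N, 2048), Inception's feature width
)
head = nn.Sequential(
    nn.Linear(2048, 512),          # first added FC layer (512 units assumed)
    nn.ReLU(),
    nn.Linear(512, 120),           # second FC layer: one logit per breed
)
model = nn.Sequential(backbone, head)

optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)  # learning rate from the post
batch_size = 64                                              # mini-batch size from the post
```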

Our ResNet architecture can run for only very few epochs before it overfits. How come a network trained for 5000 epochs performs so well?

I am quite sure they are using ‘epochs’ to mean ‘iterations’ or ‘minibatches’.

5000 minibatches x 64 images per batch = training on 320 000 images

If we divide 320 000 by 20 000 (which is roughly the number of images in the train set? not sure), we get 16 passes through the entire dataset (what we would call epochs in the course).
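The arithmetic above can be checked in a couple of lines (the 20 000 train-set size is the same rough guess as in the post):

```python
batch_size = 64
iterations = 5000                      # what the blog post calls "epochs"
images_seen = iterations * batch_size  # images processed during training

train_set_size = 20_000                # rough guess at the train-set size
epochs = images_seen / train_set_size  # passes through the whole dataset

print(images_seen)  # 320000
print(epochs)       # 16.0
```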


Thank you Radek, that makes sense :slight_smile: