Expected training times

I’m working on lesson 3, in particular working through my own model for MNIST. I notice that for a CNN with batchnorm + dropout + data augmentation, the notebook’s default output shows a training time of 14s per epoch, but I see 60s on my p2.xlarge instance (255s on the same machine using CPU only).

Is this simply because the course notebooks were trained on far superior hardware? If there’s a chance it’s something else, I’d love to know, since it’d make the difference between overnight training and a couple of hours in the background.

Training speed also depends on how many CUDA cores your GPU has. You can also increase speed by adjusting the batch size so the GPU's memory is used effectively.

Finding a good trade-off between batch size, architecture, and data augmentation leads to a reasonable training time; managing GPU memory well generally improves throughput.
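One way to find that trade-off empirically is to time an epoch at a few different batch sizes and pick the largest one that fits in GPU memory. A minimal PyTorch sketch (the tiny CNN and synthetic data here are placeholders, not the course notebook's model):

```python
import time
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in CNN with batchnorm + dropout, roughly MNIST-shaped input (1x28x28).
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Flatten(), nn.Dropout(0.5), nn.Linear(16 * 28 * 28, 10),
).to(device)

# Synthetic data so the sketch is self-contained; swap in your real dataset.
data = TensorDataset(torch.randn(2048, 1, 28, 28), torch.randint(0, 10, (2048,)))
loss_fn = nn.CrossEntropyLoss()

def time_epoch(batch_size):
    loader = DataLoader(data, batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    start = time.time()
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()
    if device.type == "cuda":
        # CUDA ops are asynchronous; wait for queued work before reading the clock.
        torch.cuda.synchronize()
    return time.time() - start

for bs in (64, 256, 1024):
    print(f"batch_size={bs}: {time_epoch(bs):.2f}s per epoch")
```

Larger batches usually mean fewer kernel launches and better GPU utilization per epoch, until you hit an out-of-memory error; on a p2.xlarge (K80, 12 GB) there is typically plenty of headroom for MNIST-scale models.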