I’m working on lesson 3, specifically building my own model for MNIST. For a CNN with batchnorm + dropout + data augmentation, the notebook’s saved output shows about 14s per epoch, but I see 60s per epoch on my p2.xlarge instance (and 255s on the same machine in CPU-only mode).
Is this simply because the course notebooks were trained on far superior hardware? If there’s a chance it’s something else, I’d love to know, since it’d make the difference between overnight training and a couple of hours in the background.
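One thing I did to rule out the obvious culprit: confirm the framework actually sees the GPU at all, since a silent fallback to CPU would explain a large gap. This is just a best-effort diagnostic sketch (the `gpu_report` helper and framework checks are my own, not from the course notebooks); it probes whichever backend happens to be installed and falls back to suggesting `nvidia-smi`:

```python
# Best-effort check that the deep-learning backend can see the GPU.
# Hypothetical diagnostic helper; adapt to whichever framework your notebook uses.
import importlib.util

def gpu_report():
    """Return a short string describing GPU visibility, best-effort."""
    if importlib.util.find_spec("torch"):
        import torch
        return f"torch sees CUDA: {torch.cuda.is_available()}"
    if importlib.util.find_spec("tensorflow"):
        import tensorflow as tf  # TF 2.x API assumed
        return f"tensorflow GPUs: {tf.config.list_physical_devices('GPU')}"
    return "no known framework found; run nvidia-smi during training to check utilization"

print(gpu_report())
```

Watching `nvidia-smi` while an epoch runs is also telling: if GPU utilization sits near 0%, the bottleneck is likely data augmentation happening on the CPU rather than the GPU itself.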