I have noticed that the first epoch is always significantly slower than subsequent epochs. It seems to be because there is a lot more disk activity during the first epoch. Does the framework cache the training data in memory? I have more data than I think fits in the 16GB of RAM I have.
It might just be the operating system (Linux?) that does the caching.
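If the OS page cache is the cause, you can test it by warming the cache yourself: read the dataset files once before training and see whether the first epoch speeds up. Here is a minimal, hedged sketch (the file paths and chunk size are placeholders; this only helps if the data actually fits in RAM, which may not be the case with 16GB):

```python
import os
import tempfile

def warm_page_cache(paths, chunk_size=1 << 20):
    """Sequentially read each file so the OS page cache holds its
    contents before the first epoch starts. Returns total bytes read."""
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                total += len(chunk)
    return total

# demo with a throwaway file standing in for a dataset shard
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 4096)
warmed = warm_page_cache([tmp.name])
os.remove(tmp.name)
print(warmed)
```

If epochs after the warm-up run are fast from the start, the slowdown was disk I/O being absorbed into the page cache, not anything the framework does.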
I have noticed this too, even when the training data is already on the GPU.
I think cuDNN sometimes benchmarks several algorithm variants the first time it sees a given input shape and caches the fastest one. That could explain part of the speed-up.