First epoch slower than other epochs?

I have noticed that the first epoch is always significantly slower than subsequent epochs. It seems to be because there is a lot more disk activity during the first epoch. Does the framework cache the training data in memory? I have a lot of data, which I don't think fits in my 16 GB of RAM.

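It might just be the operating system (Linux?) doing the caching. Linux keeps recently read files in its page cache, using RAM that is otherwise free, so later epochs can read whatever still fits from memory instead of disk.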
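One rough way to check whether OS-level caching is involved is to time reading the same data file twice in a row. This is only a sketch: `time_two_reads` is a made-up helper name, and it assumes you point it at one of your own data files that has not been read or written recently (otherwise the first read may already be served from the cache).

```python
import time


def time_two_reads(path, chunk_size=1 << 20):
    """Read the file at `path` twice; return (first_seconds, second_seconds)."""
    timings = []
    for _ in range(2):
        start = time.perf_counter()
        with open(path, "rb") as f:
            # Read in chunks so large files do not have to fit in RAM at once.
            while f.read(chunk_size):
                pass
        timings.append(time.perf_counter() - start)
    return tuple(timings)
```

If the second read is much faster than the first, the operating system's file cache is probably doing the work rather than the framework. With more data than RAM, only part of the dataset can stay cached, so the effect may be smaller across a whole epoch.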

I have noticed this too, even when the training data is already on the GPU.

I think the CUDA libraries (cuDNN, for instance) sometimes benchmark algorithm variants during the first runs and then choose the fastest one. That could explain part of the speed-up.
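To see how much of the gap is a one-time warm-up cost, it can help to time every epoch and compare the first against the average of the rest. A minimal sketch follows; `run_epoch` is a placeholder for whatever function runs one epoch in your own training code, and both helper names are hypothetical. GPU work is launched asynchronously, so `run_epoch` should wait for the GPU to finish before returning, or the timings will be misleading.

```python
import time


def time_epochs(run_epoch, num_epochs):
    """Call run_epoch(epoch) num_epochs times; return wall-clock seconds per epoch."""
    durations = []
    for epoch in range(num_epochs):
        start = time.perf_counter()
        run_epoch(epoch)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (first_epoch_seconds, mean_of_later_epochs or None if there are none)."""
    first = durations[0]
    rest = durations[1:]
    steady = sum(rest) / len(rest) if rest else None
    return first, steady
```

If the first epoch is still slower when the data is already on the GPU, as reported above, that points to warm-up effects rather than disk reads.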
