Memory usage peaks after second epoch

I remember one lesson saying that memory usage usually peaks after the first epoch. In my case, the first epoch uses 6 GB of memory and all epochs after the first use 8 GB. Is this normal?

This is not expected. After the first operation on the GPU you should see a rise in memory usage, as PyTorch has to set up a cache and some temporary data. But after that, memory usage should decrease. And if you ran a GPU operation before starting your training, then you should not see a peak like the one in your example.

The extra memory usage may be due to the fact that you are holding on to some variables, e.g. tensors that still reference their computation graph.
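A common way this happens is accumulating raw loss tensors across batches: each one keeps its whole autograd graph (and the activations it references) alive. A minimal CPU sketch of the difference (the model and data here are made up for illustration; calling `.item()` is the usual fix):

```python
import torch

# Hypothetical toy model and input, just to produce a loss with a graph.
model = torch.nn.Linear(10, 1)
x = torch.randn(4, 10)

losses_bad, losses_ok = [], []
for _ in range(3):
    loss = (model(x) ** 2).mean()
    losses_bad.append(loss)        # keeps the whole autograd graph alive
    losses_ok.append(loss.item())  # plain Python float; graph can be freed

print(losses_bad[0].grad_fn is not None)  # True: graph still attached
print(type(losses_ok[0]))                 # <class 'float'>
```

On GPU the same pattern pins activation memory across iterations, which can look exactly like a one-time jump after the first epoch.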

Which PyTorch version are you using? I found that the latest version does a better job of memory handling.

I’m using 1.1. Which version are you using?

I found something super weird. My custom metric function (mIoU), which is wrapped in a fastai Callback, uses about 2 GB of memory. The weird part is that my loss function (Jaccard loss) is basically the same as my metric function, but the loss function doesn’t use that much memory. The only possible reason I can think of is that wrapping my metric function in a fastai Callback leads to more memory usage.
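One thing worth ruling out: a loss function runs inside the training graph and its intermediates are freed at `backward()`, while a metric computed on tensors that still require grad quietly builds a second graph that can linger if the result is stored. A hedged sketch of an mIoU-style metric written to avoid that (the function name, threshold, and shapes are assumptions, not your actual code):

```python
import torch

def jaccard_like_metric(pred, targ, eps=1e-6):
    # Hypothetical mIoU-style metric. Detaching inputs and computing under
    # no_grad means no autograd graph is built, so nothing extra is retained.
    with torch.no_grad():
        pred = (pred.detach() > 0.5).float()
        targ = targ.float()
        inter = (pred * targ).sum()
        union = pred.sum() + targ.sum() - inter
        return (inter / (union + eps)).item()  # return a plain float

# Fake predictions that require grad, mimicking model output at eval time.
pred = torch.sigmoid(torch.randn(2, 1, 8, 8, requires_grad=True))
targ = torch.rand(2, 1, 8, 8) > 0.5
score = jaccard_like_metric(pred, targ)
print(isinstance(score, float))  # True
```

If your Callback stores the metric as a tensor rather than a float, that alone could account for the extra 2 GB you are seeing.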