OK, here is a simple Gist where I try to reproduce the leak using a smaller dataset:
Also, here is a link to the repository with the notebook:
It doesn’t reproduce the issue exactly. However, as far as I can see, it does show memory consumption growing gradually within a single epoch (but not from one epoch to the next).
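For anyone who wants to check for the same pattern in their own run, here is a minimal sketch of per-batch memory logging. It is not the code from the Gist. `track_memory` and `fake_train_step` are hypothetical names, and the fake step just retains buffers within an epoch and drops them at the start of the next one to mimic the behaviour described above. It uses the standard-library `tracemalloc`, which only sees allocations made through Python's allocator, so native or GPU memory held by a framework would need a different tool.

```python
import tracemalloc


def track_memory(num_epochs, batches_per_epoch, train_step):
    """Record Python-level traced memory (in bytes) after every batch."""
    tracemalloc.start()
    history = []  # entries are (epoch, batch, current_bytes)
    try:
        for epoch in range(num_epochs):
            for batch in range(batches_per_epoch):
                train_step(epoch, batch)
                current, _peak = tracemalloc.get_traced_memory()
                history.append((epoch, batch, current))
    finally:
        tracemalloc.stop()
    return history


# Hypothetical stand-in for a real training step (assumption, not the
# original code): it keeps a buffer per batch and releases everything
# when a new epoch begins.
retained = []


def fake_train_step(epoch, batch):
    if batch == 0:
        retained.clear()
    retained.append(bytearray(100_000))


history = track_memory(num_epochs=2, batches_per_epoch=5,
                       train_step=fake_train_step)
for epoch, batch, current in history:
    print(f"epoch {epoch} batch {batch}: {current / 1024:.0f} KiB")
```

With a real training loop, replacing `fake_train_step` with the actual step should make it visible whether the growth resets at epoch boundaries or keeps accumulating.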