Memory leak in dataloader?

This previous forum post identifies the cause: on Python 3.6, ThreadPoolExecutor greedily pulls the batch for every iteration of self.sampler into memory up front, whereas Python 3.5 pulled batches in lazily.
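This matches what the stdlib documents for `Executor.map`: the input iterables are collected immediately rather than lazily, so every submitted batch (and any completed-but-unconsumed result) can sit in memory at once. Here's a minimal standalone sketch (not fastai code; `load_batch` and the sizes are made up) that reproduces the growth pattern:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def load_batch(idx):
    # Stand-in for reading and transforming one batch (~10 MB here).
    return bytearray(10 * 1024 * 1024)

with ThreadPoolExecutor(max_workers=4) as ex:
    # Executor.map submits a future for *every* index up front, so
    # completed batches accumulate in memory whenever the consumer
    # (the training loop) is slower than the worker threads.
    for batch in ex.map(load_batch, range(1000)):
        time.sleep(0.05)  # simulate a training step; watch RSS climb
```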

There are two workarounds:

  1. Set num_workers to 0, which loads batches lazily in a single thread. In the scenario above, this capped memory consumption at about 3 GB.
  2. Use the dataloader iterator from PyTorch as described here (both workarounds are sketched below).
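A rough sketch of both workarounds; the toy TensorDataset and batch sizes are placeholders, not the setup from the scenario above:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(10000, 20), torch.randint(0, 2, (10000,)))

# Workaround 1: num_workers=0 loads each batch lazily in the main thread,
# so only the current batch is resident at any time.
dl_single = DataLoader(ds, batch_size=64, num_workers=0)

# Workaround 2: PyTorch's own DataLoader with worker processes keeps only
# a bounded number of batches in flight (roughly 2 per worker) instead of
# queuing the whole epoch.
dl_workers = DataLoader(ds, batch_size=64, num_workers=4)

for xb, yb in dl_workers:
    pass  # training step goes here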

I’m continuing to research a permanent fix. Any ideas on how to do that, or things to look into, would be much appreciated.
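One direction I’ve been considering: keep the thread pool, but bound the number of in-flight futures by feeding the executor fixed-size chunks of the sampler instead of the whole thing. A sketch only; `get_batch` and `batch_sampler` follow the names in the fastai source linked below, and the chunk size is an arbitrary guess:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def chunks(iterable, size):
    # Yield successive lists of at most `size` items from an iterator.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def iter_batches(get_batch, batch_sampler, num_workers=4):
    # Submit at most num_workers * 10 batches at a time, so the executor
    # never holds an entire epoch's worth of results in memory.
    with ThreadPoolExecutor(max_workers=num_workers) as ex:
        for chunk in chunks(batch_sampler, num_workers * 10):
            yield from ex.map(get_batch, chunk)
```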

https://github.com/fastai/fastai/blob/master/fastai/dataloader.py