CPU RAM getting full with a custom ImageList and model

Hello all,
I have created a custom ImageList class with a single, very simple change to the open method:

def open(self, fn):
    return torch.load(fn).float()

I am loading tensors saved in pickle files; these tensors have a shape of (20, 100, 100).
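For completeness, the whole class is basically just this (TensorImageList is only a placeholder name I'm using here; it subclasses fastai's ImageList):

from fastai.vision import ImageList
import torch

class TensorImageList(ImageList):
    "ImageList whose items are tensors serialized with torch.save instead of image files."
    def open(self, fn):
        # torch.load deserializes onto the CPU by default,
        # so the returned tensor lives in regular host RAM
        return torch.load(fn).float()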
My problem is that once I start training my model, RAM usage (regular CPU RAM, not GPU RAM) keeps growing until it has almost completely filled the memory.

I tried decreasing the batch size from 64 to 32, 16, and 8. That didn’t change anything.
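For context, the batch size goes in when I build the DataBunch; a simplified sketch of that code (the exact from_folder / split / label calls don't matter here, and path stands for my data folder):

data = (TensorImageList.from_folder(path, extensions=['.pt'])  # or whatever extension the saved tensors use
        .split_by_rand_pct(0.2)
        .label_from_folder()
        .databunch(bs=16))  # tried 64, 32, 16 and 8 here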

It is worth noting that GPU RAM usage is not changing at all; nevertheless, GPU utilization and temperature do go up, at least according to nvidia-smi. Also, both of these statements return True:

torch.cuda.is_available()
torch.backends.cudnn.enabled

It seems that torch.load() loads the tensors into CPU RAM rather than GPU RAM, and it also seems that the mini-batches aren't being freed once they have been processed.
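A quick check on a single file is what makes me think this (sample_fn here just stands for the path of one of my saved tensors):

x = torch.load(sample_fn)
print(x.device)  # prints cpu for my files: torch.load keeps them in host RAM unless map_location remaps them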

Not sure if it matters, but I am using a simple nn.Sequential model made of conv_layers and res_blocks. I tried appending .cuda() to my model, but that didn't change anything.
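Roughly, the model looks like this (the channel counts and the final head here are illustrative, not my exact numbers):

from fastai.layers import conv_layer, res_block, Flatten
import torch.nn as nn

model = nn.Sequential(
    conv_layer(20, 64),             # the saved tensors have 20 channels
    res_block(64),
    conv_layer(64, 128, stride=2),
    res_block(128),
    nn.AdaptiveAvgPool2d(1),
    Flatten(),
    nn.Linear(128, 2),              # illustrative head
).cuda()                            # this is the .cuda() call mentioned above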

I also tried replacing:

return torch.load(fn)

with:

torch.load(fn, map_location=lambda storage, loc: storage.cuda(0))
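So the open method in context became roughly this (sketch):

def open(self, fn):
    # map_location moves each storage onto GPU 0 while deserializing
    return torch.load(fn, map_location=lambda storage, loc: storage.cuda(0)).float()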

The normalize() function then threw a RuntimeError; it looks like the data loader was unable to load a batch. The error was raised in this function inside basic_data.py:

def __iter__(self):
    "Process and returns items from `DataLoader`."
    for b in self.dl: yield self.proc_batch(b)

Any idea what's going on?
Thanks in advance,