Hello all,
I have created a custom `ImageList` class with only a very simple edit to the `open` function:

```python
def open(self, fn):
    return torch.load(fn).float()
```
I am loading tensors saved in pickle files; each tensor has a shape of (20, 100, 100).
My problem is that once I start training my model, RAM usage (regular RAM, not GPU RAM) keeps growing until it almost completely fills memory.
I tried decreasing the batch size from 64 to 32, 16, and 8. That didn’t change anything.
It is worth noting that GPU RAM usage is not changing at all; nevertheless, GPU utilization and temperature do go up, at least according to `nvidia-smi`. Also, both of these statements return `True`:

```python
torch.cuda.is_available()
torch.backends.cudnn.enabled
```
It seems that `torch.load()` loads the tensors into RAM rather than GPU RAM; moreover, it also seems that the mini-batches aren't being unloaded once they're done processing.
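To double-check the first part of that, here is a minimal standalone sketch (no fastai involved, file path hypothetical) showing that `torch.load` restores a tensor like mine onto the CPU by default:

```python
import os
import tempfile

import torch

# Save a tensor the same shape as mine, then load it back through the
# same logic as my open() override. torch.load restores the tensor onto
# the device it was saved from -- CPU here -- i.e. into regular RAM.
x = torch.rand(20, 100, 100)
path = os.path.join(tempfile.mkdtemp(), "sample.pt")  # hypothetical path
torch.save(x, path)

y = torch.load(path).float()
print(y.device)          # cpu
print(tuple(y.shape))    # (20, 100, 100)
```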
Not sure if it matters, but I am using a simple `nn.Sequential` model made of conv layers and res blocks. I tried tacking `.cuda()` onto my model, but that didn't change anything.
I also tried replacing:

```python
return torch.load(fn)
```

with:

```python
return torch.load(fn, map_location=lambda storage, loc: storage.cuda(0))
```
The `normalize()` function then threw a `RuntimeError`; it seems the data loader was unable to load a batch. The error was raised in this function inside `basic_data.py`:

```python
def __iter__(self):
    "Process and returns items from `DataLoader`."
    for b in self.dl:
        yield self.proc_batch(b)
```
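For what it's worth, the `map_location` mechanism itself seems fine in isolation when I map onto CPU instead of `cuda:0` (standalone sketch, hypothetical file path, no fastai involved):

```python
import os
import tempfile

import torch

# Same style of map_location lambda as in my override, but returning the
# storage unchanged, i.e. keeping the tensor on CPU instead of cuda:0.
x = torch.rand(20, 100, 100)
path = os.path.join(tempfile.mkdtemp(), "sample.pt")  # hypothetical path
torch.save(x, path)

y = torch.load(path, map_location=lambda storage, loc: storage)
print(y.device)  # cpu
```

So the failure only shows up once the lambda moves storages to the GPU and the batch reaches `normalize()`.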
Any idea what's going on?
Thanks in advance,