RNNLearner get_preds() causes GPU memory to explode

I’m getting a CUDA out of memory error when running learn.get_preds(ds_type=DatasetType.Test) on a text_classifier_learner. I have around 100k rows in the test dataset. learn.validate() on the validation dataset works fine with a constant ~1.5 GB of GPU memory in use, but while running learn.get_preds(ds_type=DatasetType.Valid), GPU memory gradually increases until it errors out with a CUDA out of memory error.
Is there a workaround to run predictions with get_preds for an RNNLearner on ~100k rows, or is it a limitation of the GPU card?
I’m guessing RNNLearner’s get_preds does not move the model to the CPU, but learn.validate does? Any insights would be helpful.

P.S.: Ignore the faulty assignment of learn.validate()

As a workaround, I moved the test_dl and the model to the CPU and ran get_preds.
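
In case it helps anyone, here is a minimal sketch of that workaround. It assumes fastai v1, where learn.data.test_dl is a DeviceDataLoader with a device attribute; the exact attribute names may differ in your version.

import torch
from fastai.basic_data import DatasetType

cpu = torch.device('cpu')

# Assumption: learn.data.test_dl is a fastai v1 DeviceDataLoader, so batches are
# sent to whatever device it holds. Point it at the CPU and move the trained
# model there too, so no intermediate tensors land on the GPU.
learn.data.test_dl.device = cpu
learn.model = learn.model.to(cpu)

# Predictions now run entirely in system RAM; slower, but GPU memory is no longer the limit.
preds, _ = learn.get_preds(ds_type=DatasetType.Test)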


I am having a similar problem.
I trained a Unet, and when I run

learn.get_preds(ds_type=DatasetType.Test)

on my larger test images (shape = [3, 2848, 4256]), I get a CUDA out of memory error about halfway through the process.

I am using batch_size = 1, so I don’t understand why the GPU memory doesn’t get cleared after each batch. Instead, the data seems to accumulate until we run out of memory.

I will try running on CPU instead.


Hi, I would recommend putting your prediction loop inside a with torch.no_grad(): block. This did the trick for me, as it prevents PyTorch from computing gradients when you are predicting on a huge dataset.
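
For reference, a rough sketch of such a loop in plain PyTorch; learn.model and learn.data.test_dl below assume a fastai v1 Learner, so substitute your own model and DataLoader as needed.

import torch

model = learn.model.eval()      # eval mode: disables dropout, uses running batchnorm stats
outputs = []

with torch.no_grad():           # no autograd graph is kept, so per-batch activations are freed
    for xb, _ in learn.data.test_dl:
        out = model(xb)
        # fastai text models return a tuple; keep only the final output in that case
        if isinstance(out, tuple):
            out = out[0]
        outputs.append(out.detach().cpu())   # move each batch of results off the GPU right away

preds = torch.cat(outputs)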