Big jump in GPU RAM on custom learner

I am trying to implement a multi-task text classifier using fastai (several linear classifier heads, one per category, on top of an NLP backbone). Training now seems to work. My issue is GPU memory consumption when using the learn.predict function (https://github.com/fastai/fastai/blob/master/fastai/basic_train.py#L350): my code essentially looks like predicted = [learn.predict(e[0]) for e in data.test_ds]. Before prediction I have 6 GB of free GPU RAM, which drops to zero within a second.
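
For context, here is a minimal sketch of the prediction step with memory readouts around the loop (learn and data are my trained Learner and DataBunch; the readouts are just standard torch.cuda counters, nothing fastai-specific):

import torch

# GPU memory actually held by tensors, before the per-item prediction loop
print(f"allocated before: {torch.cuda.memory_allocated() / 1024**2:.0f} MB")

# one learn.predict call per test item; this is where the free RAM collapses
predicted = [learn.predict(e[0]) for e in data.test_ds]

print(f"allocated after: {torch.cuda.memory_allocated() / 1024**2:.0f} MB")

If per-item prediction turns out to be the wrong approach, I can probably fall back to batched prediction with learn.get_preds, but I would first like to understand where this path allocates so much memory.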

I know it’s a very general question without posting all the code, but the Learner itself has not been changed that much. At this point any broad suggestions would be much appreciated:

  1. Is there something in Learner / pred_batch / downstream functions that leads to copying of tensors?

  2. When I run:

import gc
import torch

# print every live tensor (or object wrapping one in .data) that gc can see
for obj in gc.get_objects():
    try:
        if torch.is_tensor(obj) or (hasattr(obj, 'data') and torch.is_tensor(obj.data)):
            print(type(obj), obj.size())
    except Exception:
        pass

it returns around 17,000 empty tensors. How could this be explained / debugged? (A variant of this loop that groups the tensors by device is sketched after the list.)

  3. Any other debugging suggestions?
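
For question 2, here is a small extension of the loop above that I plan to try; it groups the collected tensors by device and sums their element counts, so empty tensors (which report zero elements) can be separated from anything that actually holds GPU memory. It only uses gc and standard torch calls:

import gc
from collections import Counter

import torch

# group live tensors by device and sum their element counts; a large count
# of empty tensors would contribute nothing here and so would not explain
# the missing GPU RAM
count, numel = Counter(), Counter()
for obj in gc.get_objects():
    try:
        t = obj if torch.is_tensor(obj) else getattr(obj, 'data', None)
        if torch.is_tensor(t):
            count[str(t.device)] += 1
            numel[str(t.device)] += t.numel()
    except Exception:
        pass

print("tensor count per device:", dict(count))
print("element count per device:", dict(numel))

Comparing those totals with torch.cuda.memory_allocated() and torch.cuda.memory_cached() should show whether the missing RAM is held by live tensors or only by the caching allocator.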

Thanks!

=== Software === 
python        : 3.6.8
fastai        : 1.0.46
fastprogress  : 0.1.20
torch         : 1.0.0
nvidia driver : 396.26
torch cuda    : 9.0.176 / is available
torch cudnn   : 7401 / is enabled
=== Hardware === 
p2.xlarge AWS instance 
nvidia gpus   : 1
torch devices : 1
gpu0      : 11441MB | Tesla K80