I am working on a Kaggle submission, but I cannot predict the whole test.csv: I get an out-of-memory error both from `learn.predict(is_test=True)` and when iterating manually:
```python
t_probs = []
for c, (x, y) in enumerate(test_dl):   # next()/is-None check replaced; the iterator raises StopIteration, it never yields None
    probs = learn.model(V(x))          # forward pass on the GPU
    t_probs.append(probs)              # keeps every batch's output (and its graph) on the GPU
    if (c + 1) % 100 == 0:
        print((c + 1) * 30)            # 30 = batch size, so this prints rows processed
```
I can see it iterate through this loop about 500 times before it dies in the forward method with:

```
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58
```

It looks like GPU memory isn't being freed between forward calls.
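That matches the pattern above: each appended `probs` is still a live GPU tensor (with its autograd graph), so nothing can be freed. A minimal plain-PyTorch sketch of the failure mode and the usual fix, using a stand-in `nn.Linear` model and random batches rather than the actual `learn.model`/`test_dl`:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                       # stand-in for learn.model
batches = [torch.randn(32, 10) for _ in range(5)]  # stand-in for the test loader

preds = []
with torch.no_grad():                          # no autograd graph is retained for inference
    for x in batches:
        out = model(x)
        preds.append(out.cpu())                # host copy, so the device tensor can be freed
probs = torch.cat(preds)
print(probs.shape)                             # torch.Size([160, 2])
```

Appending `out` directly (without `.cpu()`, and outside `no_grad`) is what makes memory grow by one batch of activations per iteration until the device runs out.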
Many thanks @sgugger!!! This version now works without the OOM:
```python
learn.model.eval()
t_probs = []
for x, y in test_dl:
    # to_np copies each batch's predictions to the CPU as a numpy array,
    # so the GPU memory can be freed before the next forward pass
    t_probs.append(to_np(learn.model(V(x))))
```
Calling `predict`, however, still runs out of GPU memory; maybe it just can't accommodate large datasets. My test set has about 500k rows, ~300 MB of CSV.
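For a test set that large, one way to keep both GPU and host memory bounded is to stream each batch's predictions straight into the submission file instead of materializing them all. A hedged sketch (again with a stand-in model and loader; column names and the softmax are assumptions about the task):

```python
import csv
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                           # stand-in for learn.model
loader = [torch.randn(64, 10) for _ in range(3)]   # stand-in for test_dl

model.eval()
row_id = 0
with open("submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "prob_0", "prob_1"])    # hypothetical submission columns
    with torch.no_grad():
        for x in loader:
            # move each batch to the CPU immediately; only one batch ever lives on the GPU
            probs = torch.softmax(model(x), dim=1).cpu().numpy()
            for p in probs:
                writer.writerow([row_id, float(p[0]), float(p[1])])
                row_id += 1
print(row_id)                                      # 192 rows written
```

This never holds more than one batch of predictions in memory, so the 500k-row set is no harder than a small one.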