Moving learners off GPU

I trained a lot of models in a loop, knowing they would not all fit in GPU memory at once. So I called

import torch

cpu = torch.device('cpu')
learner.model.to(cpu)

But I still got a CUDA out-of-memory error. Does anyone know what's going on? You can also call learner.dls.to(cpu) (or a GPU device), but I'd assume that only moves batches to the GPU for training / inference.
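
Roughly, the loop looks like this (just a sketch; all_dls is a placeholder for my per-run DataLoaders):

import torch
from fastai.vision.all import cnn_learner, resnet34

cpu = torch.device('cpu')
learners = []
for dls in all_dls:            # one DataLoaders per model I train
    learn = cnn_learner(dls, resnet34)
    learn.fine_tune(1)
    learn.model.to(cpu)        # move the weights off the GPU...
    learners.append(learn)     # ...yet GPU usage still grows until CUDA OOM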

You should do learn.to('cpu'). It will recursively set the device.

Thanks for the quick reply! I'll have to check; IIRC my learner (created using cnn_learner(...)) does not have a to(...) method. Should it have one?

I saw in the docs that you can specify the device when using load_learner(...), so maybe I just need to look at the fastai code there.

Can you provide a stack trace? And yes, it should.

Maybe something is wrong with my fastai installation at work (it's also not the newest version). I tried again with the newest version, where I get:

> learn.to

<bound method Module.to of Sequential(…)>

So that would be the same as learn.model.to(...). What else does the learner keep in GPU memory? As a workaround I will simply save and delete the learner, then restore it when needed. Btw: where in the fastai code does to get bound to the learner's model?
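
The workaround would look something like this (a sketch operating on the learn from above; the filename is arbitrary, and I'm assuming load_learner(..., cpu=True) restores on CPU as the docs suggest):

import gc
import torch
from fastai.learner import load_learner

learn.export('learn.pkl')    # serialize the whole Learner to disk
del learn                    # drop references to the model and optimizer state
gc.collect()                 # let Python free the objects
torch.cuda.empty_cache()     # release cached blocks back to the GPU

# ...later, when the learner is needed again:
learn = load_learner('learn.pkl', cpu=True)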

I thought about this more, and you are correct.

If cpu is selected, fastai will set the dls to CPU, which should in turn be the only thing you need, as the model sets its device based on the DataLoaders. You can do this via:

learn.dls.cpu();

(It happens in place.)
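
Putting it together for the loop above, a minimal sketch (the explicit model move and empty_cache shouldn't be strictly necessary; they're just belt-and-braces to free GPU memory right away):

import gc
import torch

learn.dls.cpu()                    # in place: sets the DataLoaders device to CPU
learn.model.to(learn.dls.device)   # make the weights follow the dls device immediately
gc.collect()
torch.cuda.empty_cache()           # return cached GPU memory to the driver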
