Memory leak in the `loss_batch` function

Hey all,
I’ve subclassed the Trainer class and created a new function (which is nearly identical to the `validate` function in the actual Trainer class):

    def test_with_callbacks(self, dl=None, callbacks=None, metrics=None):
        "Validate on `dl` with potential `callbacks` and `metrics`."
        dl = ifnone(dl, self.data.test_dl)
        metrics = ifnone(metrics, self.metrics)
        cb_handler = CallbackHandler(self.callbacks + ifnone(callbacks, []), metrics)
        cb_handler.on_epoch_begin()
        # loss_func is None: the test set has no ground-truth labels
        val_metrics = validate(self.model, dl, None, cb_handler)
        cb_handler.on_epoch_end(val_metrics)
        return cb_handler.state_dict['last_metrics']

The only difference is that I pass None as the loss function, since I don’t have ground truths for the testing dataset, and I use a custom callback class to record the outputs of the model. (As a side note: is this the best way to save the results on a testing dataset? It feels a bit odd that I have to create a new function to do it. Would adding a `loss_fn` parameter to the `validate` function be plausible?)
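For reference, this is roughly what I mean by that callback. It's a minimal sketch, assuming fastai v1's `Callback` interface where `on_batch_end` receives `last_output` as a keyword argument; the `SaveOutputs` name is just illustrative:

    from fastai.callback import Callback
    import torch

    class SaveOutputs(Callback):
        "Collect model outputs on the CPU so they don't pile up on the GPU."
        def on_epoch_begin(self, **kwargs):
            self.outputs = []
        def on_batch_end(self, last_output, **kwargs):
            # Move each batch's predictions to host memory right away.
            self.outputs.append(last_output.detach().cpu())
        def on_epoch_end(self, **kwargs):
            self.preds = torch.cat(self.outputs)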

Anyway, after a few batches of testing I always get a CUDA OOM error, and I believe the culprit is this line in the `loss_batch` function:

    if not loss_func: return to_detach(out), yb[0].detach()

Even though the tensors are detached, they are still on the GPU, so they stay allocated in device memory because of the references held in the `val_losses` list.
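You can see this in plain PyTorch: detaching only breaks the autograd graph, it does not move the data off the device (`out` here is just a stand-in for a model output):

    import torch

    if torch.cuda.is_available():
        out = torch.randn(64, 10, device='cuda')
        kept = out.detach()
        print(kept.is_cuda)        # True: still allocated on the GPU
        print(kept.cpu().is_cuda)  # False: .cpu() copies it to host memory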

By doing:

    if not loss_func: return to_detach(out.cpu()), yb[0].cpu().detach()

the memory error goes away.

I made a PR here to fix it.

This is custom behavior you are requesting, and your change would break things elsewhere in the library. Since you are writing your own function, you should call `loss_batch` yourself and put the tensors on the CPU as they arrive.
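For example, a rough sketch of that approach, assuming fastai v1's `loss_batch(model, xb, yb, loss_func, opt, cb_handler)` signature and a test DataLoader that yields `(xb, yb)` pairs; `get_test_preds` is just an illustrative name:

    import torch
    from fastai.basic_train import loss_batch

    def get_test_preds(learn, dl=None):
        "Run loss_batch over `dl` and keep the results on the CPU."
        dl = dl if dl is not None else learn.data.test_dl
        model = learn.model.eval()
        preds, ys = [], []
        with torch.no_grad():
            for xb, yb in dl:
                out, y = loss_batch(model, xb, yb, loss_func=None)
                # Copy to host memory as each batch arrives so the GPU
                # tensors can be freed instead of accumulating.
                preds.append(out.cpu())
                ys.append(y.cpu())
        return torch.cat(preds), torch.cat(ys)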