Hey all,
I’ve subclassed the Trainer class and created a new function (nearly identical to the validate function in the actual Trainer class):
```python
def test_with_callbacks(self, dl=None, callbacks=None, metrics=None):
    "Validate on `dl` with potential `callbacks` and `metrics`."
    dl = ifnone(dl, self.data.test_dl)
    metrics = ifnone(metrics, self.metrics)
    cb_handler = CallbackHandler(self.callbacks + ifnone(callbacks, []), metrics)
    cb_handler.on_epoch_begin()
    val_metrics = validate(self.model, dl, None, cb_handler)
    cb_handler.on_epoch_end(val_metrics)
    return cb_handler.state_dict['last_metrics']
```
The only difference is that I pass None as the loss function, since I don’t have ground truths for the test dataset, and I use a custom callback class to record the model’s outputs. (As a side note, is this the best way to save predictions on a test dataset? It feels odd to have to write a whole new function for it; would adding a loss_fn parameter to the validate function be plausible?)
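For illustration, here is a minimal, framework-free sketch of the output-recording callback described above. The class name and hook names (`OutputRecorder`, `on_batch_end`) are my own assumptions, mirroring the callback pattern rather than fastai’s exact API; the key point is moving each stored output to the CPU so the references don’t pin GPU memory.

```python
import torch

# Hypothetical sketch of a callback that records model outputs on a test set.
# The class and method names are assumptions, not fastai's actual API.
class OutputRecorder:
    def __init__(self):
        self.outputs = []

    def on_batch_end(self, last_output):
        # Detach AND move to CPU so the stored reference does not pin GPU memory.
        self.outputs.append(last_output.detach().cpu())

    def collect(self):
        # Concatenate all recorded batch outputs into one tensor.
        return torch.cat(self.outputs)

# Usage: feed it fake "batch outputs" as a stand-in for model predictions.
rec = OutputRecorder()
for _ in range(3):
    rec.on_batch_end(torch.randn(4, 10))
preds = rec.collect()
print(preds.shape)  # torch.Size([12, 10])
```

The `.detach().cpu()` in `on_batch_end` is exactly the pattern the rest of this post argues for.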
Anyway, after a few batches of testing I always get a CUDA OOM error, and I believe the culprit is this line in the loss_batch function:

```python
if not loss_func: return to_detach(out), yb[0].detach()
```
Even though the tensors are detached, they are still on the GPU, so they are kept in memory by the references held in the val_losses list.
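To make the distinction concrete, here is a small sketch showing that `.detach()` only severs the autograd graph while leaving the tensor on its original device, whereas `.cpu()` actually moves the data (the CUDA guard is just so the snippet also runs on a CPU-only machine):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

out = torch.randn(4, 10, device=device, requires_grad=True)

detached = out.detach()        # no autograd graph, but SAME device as `out`
moved = out.detach().cpu()     # no autograd graph, and data copied to host RAM

print(detached.device)         # same device as `out`: still occupies GPU memory
print(moved.device)            # cpu: the GPU copy can be freed
print(detached.requires_grad)  # False: detaching drops the graph either way
```

So holding a list of detached outputs still accumulates GPU memory batch after batch; only moving them to the CPU releases it.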
Changing it to:

```python
if not loss_func: return to_detach(out.cpu()), yb[0].cpu().detach()
```

fixes the memory error.
I’ll make a PR to fix it.