While experimenting with the code, I got an out-of-memory exception when calling plot_confusion_matrix and most_confused:
RuntimeError: $ Torch: not enough memory: you tried to allocate XXGB. Buy new RAM! at /opt/conda/conda-bld/pytorch-nightly_1539863931710/work/aten/src/TH/THGeneral.cpp:204
After some debugging, I discovered that the error is raised by PyTorch when calling .sum(2) here:
~/code/fastai_v1/repo/fastai/vision/learner.py in confusion_matrix(self)
92 "Confusion matrix as an `np.ndarray`."
---> 94 cm = ((self.pred_class==x[:,None]) & (self.y_true==x[:,None,None])).sum(2)
95 return to_np(cm)
I don’t know why this is happening. I have already created an issue in the PyTorch repo: https://github.com/pytorch/pytorch/issues/13296
The obvious fix, though, is to perform the computation in slices. For this reason, I suggest adding a new parameter, slice_size, to both functions above to work around the memory problem.
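To illustrate the idea of computing in slices, here is a minimal sketch. It is not necessarily what the commit does: the standalone function confusion_matrix_sliced and its n_classes argument are assumptions made for illustration, and only the broadcast expression is taken from the traceback above. That expression builds an intermediate of shape (n_classes, n_classes, n_samples) before summing, so processing slice_size samples at a time keeps the intermediate small.

```python
import torch

def confusion_matrix_sliced(pred_class, y_true, n_classes, slice_size=None):
    """Confusion matrix with rows = actual class, columns = predicted class.

    Hypothetical standalone helper (not the fastai API). `slice_size` limits
    how many samples are compared at once.
    """
    x = torch.arange(0, n_classes)
    if slice_size is None:
        # Same expression as in the traceback: one large broadcast.
        return ((pred_class == x[:, None]) & (y_true == x[:, None, None])).sum(2)
    cm = torch.zeros(n_classes, n_classes, dtype=torch.long)
    for i in range(0, y_true.shape[0], slice_size):
        p = pred_class[i:i + slice_size]
        t = y_true[i:i + slice_size]
        # Partial counts for this slice only; accumulate into the result.
        cm += ((p == x[:, None]) & (t == x[:, None, None])).sum(2)
    return cm

pred = torch.tensor([0, 1, 1, 2, 2, 2, 0])
true = torch.tensor([0, 1, 2, 2, 2, 1, 0])
full = confusion_matrix_sliced(pred, true, 3)
sliced = confusion_matrix_sliced(pred, true, 3, slice_size=2)
```

Both calls should give the same matrix, since the per-slice counts simply add up; a smaller slice_size trades a few more loop iterations for a smaller peak allocation.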
Here is the commit on my fork: https://github.com/vborees/fastai-1/commit/6ebcbb06e2160f8ebf086f9b415507fc36682235
All comments are welcome.
I would only advise modifying your docstrings a bit. As far as I know, the
fastai codebase doesn’t document function args in the docstrings. See this example:
def get_preds(model:nn.Module, dl:DataLoader, pbar:Optional[PBar]=None, cb_handler:Optional[CallbackHandler]=None,
              activ:nn.Module=None, loss_func:OptLossFunc=None, n_batch:Optional[int]=None) -> List[Tensor]:
    "Tuple of predictions and targets, and optional losses (if `loss_func`) using `dl`, max batches `n_batch`."
    res = [torch.cat(o).cpu() for o in
           zip(*validate(model, dl, cb_handler=cb_handler, pbar=pbar, average=False, n_batch=n_batch))]
    if loss_func is not None: res.append(calc_loss(res[0], res[1], loss_func))
    if activ is not None: res[0] = activ(res[0])
Also, you would probably want to write a couple of tests for the new
slice_size argument. In any case, I think iterative computation is a great thing to have!
Thank you very much for your review, Ilia.
I’ve created a thread about docstrings, "Docstring conventions: what docstring format should I use?", since I am not sure whether I need to document the new parameter somewhere.
In the meantime, I will work on adding meaningful tests for the new parameter.
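As a starting point, such a test could check that the sliced result matches a slow reference for several slice sizes, including ones that do not divide the number of samples. This is only a rough sketch: cm_sliced is a hypothetical stand-in for the patched method, not the actual fastai API, and reference_cm is a helper invented for the test.

```python
import torch

def cm_sliced(pred_class, y_true, n_classes, slice_size):
    # Hypothetical stand-in for the patched confusion_matrix method.
    x = torch.arange(0, n_classes)
    cm = torch.zeros(n_classes, n_classes, dtype=torch.long)
    for i in range(0, y_true.shape[0], slice_size):
        p, t = pred_class[i:i + slice_size], y_true[i:i + slice_size]
        cm += ((p == x[:, None]) & (t == x[:, None, None])).sum(2)
    return cm

def reference_cm(pred_class, y_true, n_classes):
    # Slow reference: count one (actual, predicted) pair at a time.
    cm = torch.zeros(n_classes, n_classes, dtype=torch.long)
    for t, p in zip(y_true.tolist(), pred_class.tolist()):
        cm[t, p] += 1
    return cm

def test_slice_size_does_not_change_result():
    torch.manual_seed(0)
    n_classes, n = 5, 103
    y_true = torch.randint(0, n_classes, (n,))
    pred_class = torch.randint(0, n_classes, (n,))
    expected = reference_cm(pred_class, y_true, n_classes)
    # Cover a single-sample slice, uneven slices, and slices >= n.
    for slice_size in (1, 7, 50, n, n + 10):
        result = cm_sliced(pred_class, y_true, n_classes, slice_size)
        assert torch.equal(result, expected)
```

The test function follows the pytest naming convention, so it should be picked up automatically if dropped into a test module.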
Sure, not a problem! I am not a big expert on
fastai development topics either.
A couple of useful links to help:
Oh, I think those links are very useful, thank you! I might remove my thread if I find all the answers there!