Iterative computation of the confusion matrix


While experimenting with the code, I got an out-of-memory exception when calling `plot_confusion_matrix` and `most_confused`:

RuntimeError: $ Torch: not enough memory: you tried to allocate XXGB. Buy new RAM! at /opt/conda/conda-bld/pytorch-nightly_1539863931710/work/aten/src/TH/THGeneral.cpp:204

After some debugging, I discovered that this error is raised by PyTorch in the `.sum(2)` call here:

~/code/fastai_v1/repo/fastai/vision/ in confusion_matrix(self)
     92         "Confusion matrix as an `np.ndarray`."
     93         x=torch.arange(0,self.data.c)
---> 94         cm = ((self.pred_class==x[:,None]) & (self.y_true==x[:,None,None])).sum(2)
     95         return to_np(cm)
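For context on why this blows up: the broadcasted comparison materializes an intermediate boolean tensor of shape `(n_classes, n_classes, n_samples)` before `.sum(2)` reduces it away. A rough back-of-the-envelope estimate (the class and sample counts below are illustrative assumptions, not numbers from the original report):

```python
# Size of the intermediate (c, c, n) bool tensor built by the broadcasted
# comparison before .sum(2) collapses the sample dimension.
# c and n are assumed, illustrative values.
c = 1000      # number of classes (assumed)
n = 50_000    # number of validation samples (assumed)
bytes_needed = c * c * n          # torch bool/uint8: 1 byte per element
print(f"{bytes_needed / 1e9:.0f} GB")  # → 50 GB
```

So even modest class and sample counts can push the one-shot computation into tens of gigabytes, which matches the kind of allocation failure above.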

I don’t know why this is happening; I have already created an issue in the PyTorch repo:

But the obvious fix is to perform the computation in slices. For this reason I am suggesting adding a new parameter, `slice_size`, to both functions above so that these memory problems can be avoided.
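To illustrate the idea, here is a minimal standalone sketch of the sliced computation. The function name and signature are hypothetical (the real method lives on fastai's interpretation object and reads `pred_class`/`y_true` from `self`); the point is that accumulating per-slice partial sums gives the same matrix while only ever materializing a `(c, c, slice_size)` intermediate:

```python
import torch

def confusion_matrix_sliced(pred_class, y_true, n_classes, slice_size=None):
    # Hypothetical sketch of the proposed `slice_size` behavior.
    x = torch.arange(0, n_classes)
    if slice_size is None:
        # One-shot version: builds a (c, c, n) bool tensor, which can
        # exhaust memory for large validation sets.
        return ((pred_class == x[:, None]) & (y_true == x[:, None, None])).sum(2)
    cm = torch.zeros(n_classes, n_classes, dtype=torch.long)
    for i in range(0, y_true.shape[0], slice_size):
        p, t = pred_class[i:i + slice_size], y_true[i:i + slice_size]
        # Each slice only materializes a (c, c, slice_size) intermediate.
        cm += ((p == x[:, None]) & (t == x[:, None, None])).sum(2)
    return cm
```

Since integer addition is exact, the sliced accumulation produces exactly the same matrix as the one-shot computation for any `slice_size`.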

Here is the commit on my fork:

All comments are welcome!


Hi, cool!

I would only advise modifying your docstrings a bit. As far as I know, the fastai codebase doesn’t document function args in the docstrings. See this example:

def get_preds(model:nn.Module, dl:DataLoader, pbar:Optional[PBar]=None, cb_handler:Optional[CallbackHandler]=None,
              activ:nn.Module=None, loss_func:OptLossFunc=None, n_batch:Optional[int]=None) -> List[Tensor]:
    "Tuple of predictions and targets, and optional losses (if `loss_func`) using `dl`, max batches `n_batch`."
    res = [torch.cat(o).cpu() for o in
           zip(*validate(model, dl, cb_handler=cb_handler, pbar=pbar, average=False, n_batch=n_batch))]
    if loss_func is not None: res.append(calc_loss(res[0], res[1], loss_func))
    if activ is not None: res[0] = activ(res[0])
    return res

Also, you would probably like to write a couple of tests for the new `slice_size` argument. Nevertheless, I think iterative computation is a great thing to have!
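A test along those lines could simply check that the sliced result matches the one-shot computation. A sketch, written against the same expression as the traceback above rather than the actual fastai API (the function name and default sizes are illustrative):

```python
import torch

def test_sliced_matches_full(n=1000, c=5, slice_size=128):
    # Compare the one-shot broadcasted confusion matrix against an
    # accumulated-in-slices version; they must agree exactly.
    torch.manual_seed(0)
    pred = torch.randint(0, c, (n,))
    true = torch.randint(0, c, (n,))
    x = torch.arange(0, c)
    full = ((pred == x[:, None]) & (true == x[:, None, None])).sum(2)
    cm = torch.zeros(c, c, dtype=torch.long)
    for i in range(0, n, slice_size):
        p, t = pred[i:i + slice_size], true[i:i + slice_size]
        cm += ((p == x[:, None]) & (t == x[:, None, None])).sum(2)
    assert torch.equal(full, cm)
    assert cm.sum().item() == n  # every sample lands in exactly one cell

test_sliced_matches_full()
```

Running it with a `slice_size` that does not divide `n` evenly would also exercise the last, shorter slice.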


Thank you very much for your review, Ilia.
I’ve created a thread about docstrings (“Docstring conventions: what docstring format should I use?”) since I am not sure whether I need to document it somewhere.

In the meantime I will work on adding meaningful tests for the new parameter.


Sure, not a problem! I am also not a big expert on fastai development topics :smile:

A couple of useful links to help:


Oh, I think those links are very useful, thank you! I might remove my thread if I find all the answers there!
