I am training a classifier on a subset of the Quick Draw dataset (340,000 samples). I am using this line to get the most confused classes:

```
interp.most_confused()
```

However, this command raises an error:

```
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-26-06171ec66e30> in <module>
----> 1 interp.most_confused()
~/code/fastai_v1/repo/fastai/vision/learner.py in most_confused(self, min_val)
117 def most_confused(self, min_val:int=1)->Collection[Tuple[str,str,int]]:
118 "Sorted descending list of largest non-diagonal entries of confusion matrix"
--> 119 cm = self.confusion_matrix()
120 np.fill_diagonal(cm, 0)
121 res = [(self.data.classes[i],self.data.classes[j],cm[i,j])
~/code/fastai_v1/repo/fastai/vision/learner.py in confusion_matrix(self)
92 "Confusion matrix as an `np.ndarray`."
93 x=torch.arange(0,self.data.c)
---> 94 cm = ((self.pred_class==x[:,None]) & (self.y_true==x[:,None,None])).sum(2)
95 return to_np(cm)
96
RuntimeError: $ Torch: not enough memory: you tried to allocate 36GB. Buy new RAM! at /opt/conda/conda-bld/pytorch-nightly_1539863931710/work/aten/src/TH/THGeneral.cpp:204
```

The reason is that this line creates a matrix with shape `(340, 340000)` due to broadcasting:

```
self.pred_class==x[:,None]
```

And then it creates *another* matrix of the same size:

```
self.y_true==x[:,None,None]
```
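To make the shapes concrete, here is a minimal sketch that uses NumPy as a stand-in for the torch tensors (the broadcasting rules are the same). The tiny sizes and the sample labels are made up for illustration; only the expression itself comes from the traceback. It suggests that the allocation that actually fails is the result of `&`, which broadcasts the two masks to shape `(n_classes, n_classes, n_samples)`:

```python
import numpy as np

# Made-up small sizes standing in for 340 classes and 340,000 samples.
n_classes, n_samples = 3, 5
x = np.arange(n_classes)
pred_class = np.array([0, 1, 1, 2, 0])  # hypothetical predictions
y_true = np.array([0, 1, 2, 2, 0])      # hypothetical ground truth

a = pred_class == x[:, None]        # shape (n_classes, n_samples)
b = y_true == x[:, None, None]      # shape (n_classes, 1, n_samples)
both = a & b                        # shape (n_classes, n_classes, n_samples)

shapes = (a.shape, b.shape, both.shape)
print(shapes)

# Size of the combined mask at the real sizes, assuming one byte per element.
full_bytes = 340 * 340 * 340_000
print(f"{full_bytes / 2**30:.1f} GiB")
```

At one byte per element the combined mask needs 340 × 340 × 340,000 bytes, which is roughly the 36GB reported in the error.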

Therefore, my question is: could we somehow compute this iteratively instead of broadcasting? Perhaps the current version of the `ClassificationInterpretation` class is not very scalable?

And, as a general question, how do you usually compute metrics for huge datasets? Or is it unreasonable to carry out this kind of analysis on big datasets?
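For what it is worth, one possible way to compute the matrix iteratively is to accumulate counts chunk by chunk, so that only a `(n_classes, n_classes)` array plus one chunk is held in memory at a time. The sketch below is not fastai code: `confusion_matrix_chunked`, `most_confused_pairs`, and `chunk_size` are hypothetical names, and it uses NumPy rather than torch. It assumes the labels and predictions are available as integer class indices.

```python
import numpy as np

def confusion_matrix_chunked(y_true, y_pred, n_classes, chunk_size=10_000):
    """Accumulate a confusion matrix chunk by chunk (hypothetical helper).

    Rows index the true class, columns the predicted class.
    """
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for start in range(0, len(y_true), chunk_size):
        t = y_true[start:start + chunk_size]
        p = y_pred[start:start + chunk_size]
        # Encode each (true, pred) pair as a single integer, then count them.
        flat = t * n_classes + p
        counts = np.bincount(flat, minlength=n_classes * n_classes)
        cm += counts.reshape(n_classes, n_classes)
    return cm

def most_confused_pairs(cm, classes, min_val=1):
    """Off-diagonal entries of cm with count >= min_val, largest first."""
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)
    pairs = [(classes[i], classes[j], int(off_diag[i, j]))
             for i, j in zip(*np.where(off_diag >= min_val))]
    return sorted(pairs, key=lambda item: item[2], reverse=True)

# Made-up toy data: 3 classes, 7 samples, processed 2 samples at a time.
classes = ["cat", "dog", "owl"]
y_true = [0, 0, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 2, 0, 2, 0]
cm = confusion_matrix_chunked(y_true, y_pred, n_classes=3, chunk_size=2)
confused = most_confused_pairs(cm, classes)
print(cm)
print(confused)
```

With 340 classes the accumulator is 340 × 340 `int64` values, which is under 1 MB, so the cost is dominated by the chunk size rather than by the dataset size. The same accumulate-per-batch pattern should carry over to other count-based metrics, since each batch only adds to a small running total.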