Help with understanding vocab

Hello, I’m using a plain PyTorch Dataset with fastai2. Training works fine, but when I try to use ClassificationInterpretation it complains that the Dataset doesn’t have a vocab defined, and I’m having a hard time understanding how to define one properly. I thought a simple vocab = ['a', 'b'] on the Dataset class would be enough, but the ClassificationInterpretation constructor only keeps the last element of the vocab list.

Bump, can anybody help?

I read the mid-level API tutorial (Tutorial - Assemble the data on the pets dataset | fastai), but there it also defines self.vocab as a simple list of labels. That won’t work with ClassificationInterpretation, which only keeps the last element of vocab.

Hm, that could be a bug? I think the idea behind the code is to support text vocabs, which look like this:

dls.vocab = [[<text vocab>], [<classes>]]

# dls.vocab[-1] will make sure to use the actual classes instead of the text vocab
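As a concrete illustration of that shape (the token and class values here are made-up examples, not from an actual DataLoaders), indexing the nested vocab with -1 picks out the class labels rather than the token vocab:

```python
# A text DataLoaders vocab is a list of two lists: the token vocab for the
# inputs and the class labels for the targets (example values are made up).
vocab = [["xxunk", "xxpad", "the", "cat"],  # token vocab
         ["neg", "pos"]]                    # class labels

print(vocab[-1])  # ['neg', 'pos'] -- the classes, not the tokens
```

So with a flat vocab like ['a', 'b'], that same vocab[-1] lookup would return just 'b', which matches the behaviour described above.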

Thanks Florian,
I tried setting the vocab to [[], ['a','b']], and plot_confusion_matrix() works, but plot_top_losses() still gives me an error.

But I think I figured out how this works. Categorize(vocab=['a','b']).vocab generates a CategoryMap. It looks just like the list ['a','b'] when printed, but it’s a different class, so I guess this makes a big difference for the fastai internals.

Using that as the dataset vocab somehow makes it all work as expected!
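A rough sketch of why a CategoryMap behaves differently from a plain list, using a made-up stand-in class (CategoryMapSketch is not fastai’s actual implementation, just an illustration of the idea):

```python
# A stand-in illustrating what fastai's CategoryMap adds on top of a plain
# list: it prints like a list but also carries a label -> index mapping
# (o2i) that the library uses to encode and decode targets.
class CategoryMapSketch(list):
    def __init__(self, items):
        super().__init__(items)
        # o2i maps each label to its integer index
        self.o2i = {o: i for i, o in enumerate(items)}

vocab = CategoryMapSketch(['a', 'b'])
print(vocab)      # ['a', 'b'] -- looks like a plain list when printed
print(vocab.o2i)  # {'a': 0, 'b': 1}
```

That extra structure is presumably what the interpretation machinery relies on, which would explain why a bare ['a','b'] list wasn’t enough.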
