Understanding Fastai Transforms

I’m struggling to understand the fastai transforms in the book well enough to decode the output of the model and show the result. I’ve also looked at the siamese tutorial.

I understand the examples, but I think one part is missing: how to decode the data.

The example shows a transform that has an encoder and a decoder:

  • Encoder: filepath → image → x (input)
    filepath → label → number → y (target)

  • Decoder: number → label

We also have a model:

  • image → model → one-hot encoded output.

My questions are:

  1. Where do I put the decoder (or where is it currently located) that converts the one-hot model output → number, so that I can show the output?
  2. The show_results in the siamese tutorial has some inputs that I do not understand:
    def show_results(x:SiameseImage, y, samples, outs, ctxs
    What are “samples”, given that we already have the image pairs and y? We still need the predictions; are those what is named samples? And what exactly is ctxs? Is it a list of figures?


Hi Daniel,

  1. The decoder is defined as a method (decodes) on your Transform.

Let me walk you through how I understand it using code. Let’s use the one-hot encode example you used.

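import numpy as np
import torch
from torch import Tensor
from fastai.vision.all import Transform

class OneHotTransform(Transform):
    def __init__(self, n_classes): self.n_classes = n_classes

    def encodes(self, o:int):
        # int class index -> one-hot encoded array
        one_hot = np.zeros((self.n_classes))
        one_hot[o] = 1
        return one_hot

    def decodes(self, o:Tensor):
        # one-hot (or score) tensor -> int class index
        return torch.argmax(o).item()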

So as you can see this OneHotTransform encodes an int into a one hot encoded array, and decodes a tensor into an int back again.
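To make the round trip concrete without installing anything, here is a minimal plain-Python sketch of the same idea. The class name OneHotSketch is made up for illustration and it is not a fastai Transform; it only mirrors the encodes/decodes pairing.

```python
# Plain-Python stand-in for an encodes/decodes pair (hypothetical, no fastai needed).
class OneHotSketch:
    def __init__(self, n_classes):
        self.n_classes = n_classes

    def encodes(self, idx):
        # int class index -> one-hot list
        one_hot = [0.0] * self.n_classes
        one_hot[idx] = 1.0
        return one_hot

    def decodes(self, vec):
        # one-hot (or score) list -> index of the largest entry (argmax)
        return max(range(len(vec)), key=lambda i: vec[i])


tfm = OneHotSketch(4)
encoded = tfm.encodes(2)        # one-hot list with a 1.0 at position 2
decoded = tfm.decodes(encoded)  # back to the original index
print(encoded, decoded)
```

Note that decodes also works on raw scores, since it only looks for the largest entry.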

  2. Regarding your question about samples, it is a bit convoluted, but let’s walk through the code. When you call learn.show_results(), this is what goes on behind the scenes:
def show_results(self, ds_idx=1, dl=None, max_n=9, shuffle=True, **kwargs):
    if dl is None: dl = self.dls[ds_idx].new(shuffle=shuffle)
    b = dl.one_batch()
    _,_,preds = self.get_preds(dl=[b], with_decoded=True)
    dl.show_results(b, preds, max_n=max_n, **kwargs)

So what happens here is that the learn object gets predictions for a random batch from the dls and then passes them to the DataLoader’s show_results() method. So we need to take a look at that method next.

def show_results(self, b, out, max_n=9, ctxs=None, show=True, **kwargs):
    x,y,its = self.show_batch(b, max_n=max_n, show=False)
    b_out = type(b)(b[:self.n_inp] + (tuple(out) if is_listy(out) else (out,)))
    x1,y1,outs = self.show_batch(b_out, max_n=max_n, show=False)
    res = (x,x1,None,None) if its is None else (x, y, its, outs.itemgot(slice(self.n_inp,None)))
    if not show: return res
    show_results(*res, ctxs=ctxs, max_n=max_n, **kwargs)

Let’s walk through this line by line:

  1. First, the dl uses its show_batch method (with show=False) to get this batch’s
    a. x, which is the decoded input
    b. y, which is the decoded target
    c. its, which is the decoded batch as individual samples, each including input and target
  2. b_out = type(b)(b[:self.n_inp] + (tuple(out) if is_listy(out) else (out,))) This line merges the encoded inputs with the encoded predictions, so the predictions take the place of the targets.
  3. x1,y1,outs = self.show_batch(b_out, max_n=max_n, show=False) This line does the same as the first line, except it outputs
    a. x1, which is the decoded input
    b. y1, which is the decoded predictions
    c. outs, which is the decoded samples containing input and predictions
  4. res = (x,x1,None,None) if its is None else (x, y, its, outs.itemgot(slice(self.n_inp,None))) And this is the line that demystifies your question about samples. The samples, which in this function are named its and outs, are not always returned by self.show_batch. In the siamese tutorial, for example, the dls wasn’t created with a DataBlock, and as far as I can tell from the source, the decoded batch is a single SiameseImage that already has its own show method, so show_batch returns it as x and returns None for y and its.

When its and outs aren’t available, there are no per-sample lists to pass along, so x and x1 are passed to show_results instead: x is the decoded batch with the actual targets and x1 is the decoded batch with the predictions. In that case the samples and outs arguments are simply None.
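The merge that builds b_out can be illustrated in plain Python. This is only a sketch: is_listy below is a simplified stand-in for the fastcore helper, merge_inputs_with_preds is a made-up name, and the strings stand in for real tensors.

```python
def is_listy(x):
    # Simplified stand-in for fastcore's is_listy (assumption: tuple/list only).
    return isinstance(x, (tuple, list))


def merge_inputs_with_preds(b, out, n_inp):
    # Keep the first n_inp elements (the inputs) and append the predictions
    # where the targets used to be, preserving the batch's container type.
    return type(b)(b[:n_inp] + (tuple(out) if is_listy(out) else (out,)))


# A siamese-style batch: two inputs followed by one target.
batch = ("img1_batch", "img2_batch", "targets")
b_out = merge_inputs_with_preds(batch, "preds", n_inp=2)
print(b_out)
```

Decoding b_out with the same pipeline as the original batch is what yields x1, y1 and outs.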

And you are correct with your guess about ctxs. They are a list of axes objects.

It’s a bit convoluted to think about, but I think it was designed this way to avoid breaking when a custom data pipeline is used.


Thanks for your answer, but I might not have explained my question well enough, or I don’t understand your answer. Please correct me if I have misunderstood something.

  1. The OneHotTransform you show is straightforward.
    But I can’t find where that transform is used in the siamese tutorial.

The transforms used there convert the labels to numbers via the “CategoryBlock”.

The model converts the image to a one-hot encoded vector (the head has 2 output features). So the decoder that converts from one-hot back to numbers should logically be located in the model, but it isn’t.

So, when I would like to show the predictions, I need to convert back from the one-hot encoded output to a number, but I can’t see that the OneHotTransform is used anywhere. So how and where is the decoding calculated in that example?

  2. Your walk-through of show_results is good, but it raises more questions :)
    Is there a good tutorial somewhere that explains the details better than the book and the official library documentation?

best regards/
Daniel Grafström


I misunderstood your first question. So you wanted to know where in the pipeline the output is encoded and decoded. Let’s take a look at what the CategoryBlock is actually doing.

def CategoryBlock(
    vocab:(list, pd.Series)=None, # List of unique class names
    sort:bool=True, # Sort the classes alphabetically
    add_na:bool=False, # Add `#na#` to `vocab`
):
    "`TransformBlock` for single-label categorical targets"
    return TransformBlock(type_tfms=Categorize(vocab=vocab, sort=sort, add_na=add_na))

So when using a CategoryBlock it returns a generic TransformBlock initialized with a specific Transform, just like the OneHotTransform I used in my previous reply.

In that case this Transform is Categorize, which is initialized with the vocabulary to enable encoding and decoding, and a bunch of other arguments. This is the Transform responsible for encoding and decoding in your pipeline.

class Categorize(DisplayedTransform):
    "Reversible transform of category string to `vocab` id"
    loss_func,order=CrossEntropyLossFlat(),1
    def __init__(self, vocab=None, sort=True, add_na=False):
        if vocab is not None: vocab = CategoryMap(vocab, sort=sort, add_na=add_na)
        store_attr()

    def setups(self, dsets):
        if self.vocab is None and dsets is not None: self.vocab = CategoryMap(dsets, sort=self.sort, add_na=self.add_na)
        self.c = len(self.vocab)

    def encodes(self, o):
        try:
            return TensorCategory(self.vocab.o2i[o])
        except KeyError as e:
            raise KeyError(f"Label '{o}' was not included in the training dataset") from e
    def decodes(self, o): return Category      (self.vocab    [o])

Let’s dissect this class.

loss_func,order=CrossEntropyLossFlat(),1

This line sets the default loss function for the kind of pipeline you are using. That is what lets a generic vision_learner, for example, work without you having to pass a loss_func.

def __init__(self, vocab=None, sort=True, add_na=False):
    if vocab is not None: vocab = CategoryMap(vocab, sort=sort, add_na=add_na)
    store_attr()

This part initializes a CategoryMap, a helper object that maps from decoded to encoded values and back again (we don’t need to look into it as long as we get the general idea of what it does). It also stores the other arguments as attributes using store_attr().

def setups(self, dsets):
    if self.vocab is None and dsets is not None: self.vocab = CategoryMap(dsets, sort=self.sort, add_na=self.add_na)
    self.c = len(self.vocab)

We only need to worry about this if you didn’t provide a vocab to the CategoryBlock in the first place; in that case it uses the dsets to create its own vocab mapping.

def encodes(self, o):
    try:
        return TensorCategory(self.vocab.o2i[o])
    except KeyError as e:
        raise KeyError(f"Label '{o}' was not included in the training dataset") from e
def decodes(self, o): return Category(self.vocab[o])

And this is the part where the Transform provides the way to encode and decode. For encoding, it uses the CategoryMap object to convert the label into a TensorCategory holding the integer index of that label in the vocab (an index, not a one-hot vector). For decoding it does the opposite, looking the index up in the vocab to get the label back.
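A toy version of this label ↔ index mapping, in plain Python, may make the encode/decode pair easier to see. CategorizeSketch is a hypothetical name, it returns a plain int instead of a TensorCategory, and it skips everything else the real class does.

```python
class CategorizeSketch:
    "Toy reversible label <-> index mapping (hypothetical, not fastai's Categorize)"

    def __init__(self, vocab):
        self.vocab = sorted(set(vocab))                       # index -> label
        self.o2i = {o: i for i, o in enumerate(self.vocab)}   # label -> index

    def encodes(self, o):
        try:
            return self.o2i[o]
        except KeyError as e:
            raise KeyError(f"Label '{o}' was not included in the vocab") from e

    def decodes(self, i):
        return self.vocab[i]


cat = CategorizeSketch(["dog", "cat", "bird"])
idx = cat.encodes("cat")   # position of "cat" in the sorted vocab
label = cat.decodes(idx)   # back to the original label
print(idx, label)
```

The encoded value here is a single integer per label, which is the form a cross-entropy loss expects for its targets.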

The way I understand fastai handles a DataBlock pipeline for inputs and targets in general is the following:

  1. First you provide a way to get the inputs and targets through the get_x and get_y functions; at this point they are still in their raw form.
  2. Then the blocks you provide use their Transform to encode your inputs and targets. For the target in the siamese tutorial, Categorize encodes it into the integer index of its class rather than a one-hot vector.
  3. Then the outputs of the previous step are passed into item_tfms, collated together to form batches, and finally passed into batch_tfms.
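The three steps above can be sketched in plain Python. Everything here is a simplified assumption for illustration: the items, get_x, get_y and collate are made up, inputs are left as file names, and item_tfms and batch_tfms are omitted.

```python
# Hypothetical raw items; in fastai these would usually be file paths.
items = [
    {"path": "a.jpg", "label": "cat"},
    {"path": "b.jpg", "label": "dog"},
    {"path": "c.jpg", "label": "cat"},
]

# Step 1: getters return the raw input and raw target.
def get_x(item): return item["path"]
def get_y(item): return item["label"]

# Step 2: the target's type transform encodes labels as integer indices.
vocab = sorted({get_y(it) for it in items})
o2i = {label: i for i, label in enumerate(vocab)}
samples = [(get_x(it), o2i[get_y(it)]) for it in items]

# Step 3: collate individual samples into a batch (item/batch transforms omitted).
def collate(samples):
    xs, ys = zip(*samples)
    return list(xs), list(ys)

batch = collate(samples[:2])
print(batch)
```

Decoding for display runs the same chain in reverse: the batch is split back into samples and each index is looked up in vocab.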

Regarding your second question

To be honest, I don’t know of any good tutorial, but I think that getting your hands dirty with the documentation and the source code is essential if you want to gain a deep understanding of the inner workings of fastai.

I hope I’ve answered your question, and don’t hesitate to ask about anything I left unclear.


Thanks again, but I still think there is a misunderstanding :)

When I input an image to the model, it outputs a one-hot encoded prediction that CrossEntropyLossFlat uses.

The predictions are also used to show the result, comparing the predictions with the expected output, and by the Interpretation class, e.g. to calculate and show a confusion matrix.

We can see in the show_results function below that the y value (here y refers to the predictions, although y is usually used for the expected output) has already been converted to 0 or 1:

Prediction: {["Not similar","Similar"][y[2][i].item()]}

How do I change the code to view the float value of the probability? Where is this conversion between one-hot and the label done? Usually we use the argmax function to do this conversion, but it’s not in the show_results function.

And if I do this conversion, I might still need to convert it to a specific class to make the Interpretation class work properly. Let’s say I would like to calculate a confusion matrix (it’s not possible in this case); my point is that we might need different decoding functions depending on which method is using the predictions. I can’t find anything in the documentation about this.

def show_results(x:SiameseImage, y, samples, outs, ctxs=None, max_n=6, nrows=None, ncols=2, figsize=None, **kwargs):
    if figsize is None: figsize = (ncols*6, max_n//ncols * 3)
    if ctxs is None: ctxs = get_grid(min(x[0].shape[0], max_n), nrows=None, ncols=ncols, figsize=figsize)
    for i,ctx in enumerate(ctxs):
        title = f'Actual: {["Not similar","Similar"][x[2][i].item()]} \n Prediction: {["Not similar","Similar"][y[2][i].item()]}'
        SiameseImage(x[0][i], x[1][i], title).show(ctx=ctx)

  2. Yes, I need to get my hands dirty. The problem is that I was expecting a well-documented library that is easy to use and whose code is easy to dig into, since it has a lot of documentation. The reality is that I find code that seems optimized for specific examples in a few lines of code, and not user-friendly. The variables are not documented in the code, few type annotations are used, and the code style is something I have not experienced before. For example, the “show_results” functions have the same name, are located in different places (the learner and the Interpretation class) and take different parameters. It costs me a lot of time to understand things that in my mind should be simple. And I’m not able to help improve this library, since the bar is too high to understand the code well enough to know that I won’t break anything, or to understand why it breaks.


I failed to mention this in the first reply, but in the original learn.show_results, learn.get_preds(..., with_decoded=True) is used to get decoded predictions, and the decoding is done by the loss function’s decodes method, here CrossEntropyLossFlat.decodes, which takes the argmax over the class dimension.

That’s why in the function you provided y is already decoded.
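As for viewing the float probability: as far as I know, the loss function also has an activation method (softmax for CrossEntropyLossFlat), and get_preds returns those probabilities as its first output, with the decoded class as the extra output when with_decoded=True. The sketch below only mimics those two steps in plain Python on made-up logits; softmax and argmax here are local helpers, not fastai calls.

```python
import math


def softmax(logits):
    # Numerically stable softmax: turns raw scores into probabilities.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    # Index of the largest entry, i.e. the decoded class.
    return max(range(len(values)), key=lambda i: values[i])


logits = [0.3, 1.9]              # hypothetical raw model output for one image pair
probs = softmax(logits)          # what the "activation" step would produce
pred_idx = argmax(probs)         # what the "decodes" step would produce
label = ["Not similar", "Similar"][pred_idx]
print(f"{label} (p={probs[pred_idx]:.2f})")
```

So to show probabilities in a custom show_results, one option is to work from the non-decoded predictions and do the argmax yourself only where a label is needed.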
