Live coding 12

This topic is for discussion of the 12th live coding session.

<<< session 11 session 13 >>>

Links from the walk-thru

What was covered

  • Fine-tuning vision models using small (pets) and big (planet) datasets
  • How to inspect a model’s architecture - learn.summary()
  • How to customize the model’s last layer for image classification and categorical prediction (work-in-progress)
  • Jeremy's approach to choosing a vision model for productionization & what factors to consider
  • Jeremy's workflow tricks: 1) copying notebooks when trying out a new idea, 2) tracing back to a simple working state when adding new functionality (highlighted by @radek here)
  • (please contribute here)

Video timeline by @daniel and @mattr

00:00 - Review of best vision models for fine tuning
10:50 - learn.export file pth format
12:30 - Multi-head deep learning model setup
16:00 - Getting a sense of the error rate
20:00 - Looking inside the model
22:30 - Shape of the model
23:40 - Last layer at the end of the head
26:00 - Changing the last layer
29:00 - Creating a DiseaseAndTypeClassifier subclass
38:00 - Debugging the plumbing of the new subclass
46:00 - Testing the new learner
49:00 - Create a new loss function
52:00 - Getting predictions for two different targets
56:00 - Create new error function

Session Review Questionnaire by @Daniel


I’m really digging these sessions. The last portion of it sort of went over my head, but I think after I watch it a second time, I’ll (hopefully) get it. It kind of gives me (a total beginner to fastai/python) a glimpse of the power that is elegantly hidden in the fastai libraries. Thank you so much for taking the time to share this knowledge with us!


I was kinda figuring out what to do as I did it, so not at all your fault that you’re not following my convoluted path! :wink:


Hello, the code is here; it may be handy while re-watching: Walkthru-12. Click on 'View On Github' at the top of the page.


I’m confused about what’s happening in the DiseaseAndTypeClassifier. I mean, I know we’re changing the head to output two predictions (rice type and disease), but I don’t understand how the model knows that we’re changing the head.

Are self.l1 and self.l2 “special names/symbols” from some other Python class/object that refer to the layers in the head part of the model? Because in __init__ we set these and then delete the last layer, but I don’t see the newly created layers (“the new head”) being attached to the body of the new model at the head position.

Also, I don’t understand what the x is in forward(self, x). Who/what calls the forward method, and what is the x that’s being passed into it?

class DiseaseAndTypeClassifier(nn.Module):
    def __init__(self, m):
        super().__init__()  # required nn.Module boilerplate
        self.l1 = nn.Linear(in_features=512, out_features=10, bias=False)  # rice type
        self.l2 = nn.Linear(in_features=512, out_features=10, bias=False)  # disease
        del(m[1][-1])  # remove the last layer of the head
        self.m = m

    def forward(self, x):
        x = self.m(x)
        x1 = self.l1(x)
        x2 = self.l2(x)
        return x1, x2

As far as I understand:

Are self.l1 and self.l2 “special names/symbols” from some other Python class/object that refer to the layers in the head part of the model? Because in __init__ we set these and then delete the last layer, but I don’t see the newly created layers (“the new head”) being attached to the body of the new model at the head position.

  • x is the input
  • the model is updated in __init__ (last layer removed).
  • l1 and l2 have been defined but not attached yet.
  • in forward:
  • first, we pass the input through m (minus its deleted last layer):
    x = self.m(x).
  • then we apply the two new last layers, one for rice and one for disease, and pass x through them separately:
    x1 = self.l1(x) and
    x2 = self.l2(x)
  • then return both of them.
    return x1,x2

we redefine how PyTorch passes the input through the model here:

    def forward(self, x):
        x = self.m(x)
        x1 = self.l1(x)
        x2 = self.l2(x)
        return x1,x2

As far as I know, forward is a special method that is defined in nn.Module.


the points @nikem makes are correct – just adding some additional information

This type of model construction can be confusing (or at least it was for me before) – we do not need to explicitly notify the original “model” that there is a new head being used and therefore we do not need to go about attaching it.

The DiseaseAndTypeClassifier class is what can be considered our model. The new “head” layers that we want to use can be utilized by simply initializing them when we instantiate the model, so that when we call our forward pass, PyTorch keeps track of which modules were used and is able to appropriately backpropagate through all the applicable (used) layers. The first few paragraphs of the Autograd notes do a good job of explaining this: Autograd mechanics — PyTorch 1.11.0 documentation

We are able to use this type of constructor because everything is a Module class: Module — PyTorch 1.11.0 documentation. The main points are pasted here: “Base class for all neural network modules. Your models should also subclass this class. Modules can also contain other Modules, allowing to nest them in a tree structure.” So our original model was a Module which was itself built from Module objects. This allows us to swap out, replace, or add on top of an existing Module object.
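That nesting is also why the del(m[1][-1]) trick works: a Sequential is itself a Module containing child Modules, so we can index into it and delete or replace children. A toy sketch (layer sizes chosen to mirror the walkthrough's 512-feature head; not the actual model):

```python
import torch
import torch.nn as nn

# toy "body + head" model mirroring the structure discussed in the walkthrough
body = nn.Sequential(nn.Linear(8, 512), nn.ReLU())
head = nn.Sequential(nn.ReLU(), nn.Linear(512, 10))
m = nn.Sequential(body, head)

del m[1][-1]  # drop the head's final Linear, just like del(m[1][-1])

x = torch.randn(2, 8)
out = m(x)  # now ends at 512 features, ready for new task-specific layers
```

After the deletion, the truncated model emits the 512-dimensional features that the two new linear layers can consume.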

In regards to the forward method – this is something that is required for Modules – we define the desired behavior for our input data and what will be returned: Module — PyTorch 1.11.0 documentation. As mentioned in the docs, this needs to be overridden by all subclasses.

Since it is a Callable, this allows the model to be called like a function, without needing to explicitly use forward: python - What is a "callable"? - Stack Overflow

Hope this helps!


The DiseaseAndTypeClassifier class is what can be considered our model.

Yes, that is right.

Thanks @nikem and @ali_baba, I understand that we’re copying the model and then messing around with the last layers (aka head). I also see the model being passed into DiseaseAndTypeClassifier. I don’t see anything being passed into forward(), though, and I don’t see forward being called anywhere in Jeremy’s code.

So, if x is “input” to forward

  • what is it the input of? Is this the input to the head?

  • Is the function forward() where we’re attaching the two layers we just created in __init__() to the body of the model we passed in at instantiation?

  • Why are we passing x to self.l1 and self.l2 and what does that do?

  • what does self.m(x) resolve to? Because we’re assigning it back to x (x = self.m(x))

what I’m having trouble with is that I can’t see where control passes to forward()… maybe that’s what PyTorch is doing?

Sorry for these questions as they’re probably quite elementary but I don’t quite understand what’s happening here.


I just would like to reflect on two wonderful things I saw Jeremy do :slight_smile:

Copying notebooks when trying out a new idea

This is huge to the workflow! I have often suffered in the past because I would just redo the cell with new logic, and then find out I am having a hard time going back to the previous, working state (or state where I would get a better result!).

I sometimes would be rescued by committing often, but this gets messy and is not the real solution. The real solution is to mash that copy button :slightly_smiling_face: Which is what I have now started doing and I absolutely love it.

How to add functionality

When Jeremy started to implement something new (multi-task loss) and it was not working initially, he traced back to the simplest working state with a hint of the new functionality.

This was sooo good. This is precisely the formula for getting your ML code to run and do something interesting :slightly_smiling_face:

Just a reminder to myself on this :slightly_smiling_face: We often get lost in the heat of the action, but there is a methodical way of working on your code that makes a lot of sense to reach out for :slight_smile:

And I guess it sounds simple when watching Jeremy, but it is a little bit like watching a maestro play the violin. Yeah, simple. But take a violin in your hands and things don’t go that well from the start :slight_smile:

It’s a gradual process! :smile:


Yup, that’s the key: when you call an nn.Module, it actually ends up calling forward in that class.
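A minimal sketch of that dispatch, with a toy module (not from the walkthrough): nn.Module defines __call__, which runs any registered hooks and then hands the input to your forward.

```python
import torch
import torch.nn as nn

class Tiny(nn.Module):
    """Toy module: one linear layer, just to show the call dispatch."""
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(4, 2)

    def forward(self, x):
        return self.lin(x)

m = Tiny()
x = torch.randn(3, 4)
out = m(x)  # nn.Module.__call__ runs hooks, then dispatches to Tiny.forward
```

So x is simply whatever batch you (or the training loop) pass when calling the model; you never call forward directly.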


Walkthru 12: a detailed note in the form of questions

00:40 How does Jeremy evaluate models on PETS dataset?

What’s interesting about the top 15 models? They got bits of everything (all different sorts of approaches)

01:53 What’s interesting about vit models for the PETS dataset?

with large vit models on large images, they can probably achieve a lower error rate.

the best-performing vit models are the early ones, suggesting vit has not improved much since it came out

03:38 How does Jeremy evaluate models on the planets dataset?

04:54 What does Jeremy mean by large model or models using large images?

06:03 How to use small and large pre-trained models when you do Kaggle or production projects?

09:16 Run Jeremy’s paddy notebooks and compare them with your own to find errors or differences

#question Has Jeremy shared his paddy notebooks into his paddy repo yet?

10:45 What is the difference between pth and pkl when saving model?

the model is saved in pickle (pkl) format; pth is the variant extension used by PyTorch;

just make sure to save models with the pth extension
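For contrast, a hedged sketch of the plain-PyTorch side (toy model, temporary path): fastai's learn.export writes the whole pickled Learner (typically with a .pkl extension), while raw PyTorch weights (a state_dict) are conventionally saved with .pth.

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(10, 2)

# PyTorch convention: save the weights (state_dict) with a .pth extension
path = os.path.join(tempfile.mkdtemp(), "model.pth")
torch.save(model.state_dict(), path)

# restoring requires first rebuilding a model of the same shape
restored = nn.Linear(10, 2)
restored.load_state_dict(torch.load(path))
```

The extension itself is only a convention; what differs is whether you are pickling a whole object (fastai's export) or just a dictionary of weight tensors.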


12:33 If the disease-classification model can also learn to classify paddy varieties, then it may predict diseases even better. Why is that?

15:16 Why would building the above model together be a good way to test how well we understand the mechanisms of deep learning?

16:14 How does Jeremy test for the natural variation of the model’s performance?

How does Jeremy know the model’s performance is consistent or not?

What does it mean when the model’s performance (error-rate) jumps a lot?

18:34 Why is it intuitive/counter-intuitive that whatever you find experimenting on small models will carry over, magnified, to large models?

In other words, why should we exhaust experimenting with small models first and then move on to large models?

20:04 What does a model consist of?

What does the body of a model look like and what does it do?

What does the head of a model look like and what does it do?

How to open up the inner details of a model?

21:46 How to show the shapes of input and output flow and callbacks at each layer for the entire model?

23:48 How to extract the head of the model and then the last layer of the head?

24:07 How to see the parameters of the last layer?

How to show the content of the parameters when ll.parameters() is a lazy generator that won’t display it?

What is the shape of the last layer?

25:51 How to use one model to predict both 10 diseases and 10 varieties of paddy?

Will we replace the original last layer (10x512) of the head with a different last layer (20x512)? NO

Will we build two linear layers instead of one onto the head?

27:51 How to remove the last layer from the head?

28:38 How to create a DiseaseAndTypeClassifier class to build two linear layers for the head?

How to create the __init__ function of DiseaseAndTypeClassifier class

32:20 How to create the forward function of DiseaseAndTypeClassifier class and what does this forward do?

34:25 What amazed Radek about this line of code: dtc = DiseaseAndTypeClassifier(m)? We have now actually created a new model which trains two separate linear layers (one for predicting varieties, the other for diseases) at the same time

Don’t forget the boilerplate when creating a module subclass in PyTorch: super().__init__()

36:44 How to duplicate/copy the existing learner to make a new learner?

How to add our new model onto the newly duplicated/copied learner?

37:11 How did Jeremy sort out a half-precision error when making a prediction with the new model?

When things mess up without a clue, try restarting the kernel first

#question When copying a learner, maybe it is safer to use deepcopy?
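On that shallow-vs-deep question: a small sketch with a hypothetical FakeLearner stand-in (not fastai's actual Learner) showing why deepcopy can be safer — a shallow copy still shares the underlying model object, so mutating one mutates the other.

```python
import copy

class FakeLearner:
    """Hypothetical stand-in for a learner that holds a model."""
    def __init__(self, model):
        self.model = model

learn = FakeLearner(model=["layer1", "layer2"])
shallow = copy.copy(learn)       # new learner, but SAME model object
deep = copy.deepcopy(learn)      # new learner with its own model copy

shallow.model.append("new_head") # also mutates learn.model!
```

With the real fastai Learner the same principle applies: a shallow copy shares the model (and other mutable state), which may or may not be what you want.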

42:19 How to get to the point where a copied learner (without the new model added) can run get_preds without error?

#question Did Jeremy at this point assume that the copied learner, after adding our new model, could still run get_preds without error? I think he assumed it at this moment, since he didn’t run the code for it

43:46 How to create a DummyClassifier instead of our DiseaseAndTypeClassifier, add it to the copied learner, and see whether the new learner can still run get_preds?

This is a great demo of how to solve a problem one step at a time! Very thorough, and a very solid scientific work style!

44:44 How to rewrite DiseaseAndTypeClassifier (with tidy, clean code) to build the new model with two separate layers for disease and variety prediction, add this model to the copied learner, and check whether get_preds works?

Now the error tells us we need a loss function, assuming no other problems with running get_preds

47:10 Why should we expect an error about the loss or loss func?

reminder: what is a loss function or loss?

Why are we getting this tuple of tensors/losses error? Because the loss func was originally designed to work with a single loss tensor, not a tuple

47:53 How does Jeremy find out where the loss func of a Learner comes from, i.e. where a loss func gets defined, so that we can design our own loss func and put it there?

How to find out which loss func a model is using? use learn.loss_func

49:15 How to design the loss func for our model, which outputs two groups of predictions?

What is the content of preds from our new model? (rice_preds, dis_preds), no longer a single 2D tensor disease_preds

How to design a new loss func (a dummy loss_func) which receives preds and targs from our new learner but applies the current loss func only to dis_preds, not rice_preds?
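A minimal sketch of such a dummy loss func (using plain torch cross-entropy as a stand-in for the learner's existing loss func; the tuple names follow the notes above):

```python
import torch
import torch.nn.functional as F

def dummy_loss(preds, targs):
    # preds is the (rice_preds, dis_preds) tuple returned by the new forward
    rice_preds, dis_preds = preds
    # score only the disease head for now; the rice head is ignored
    return F.cross_entropy(dis_preds, targs)

# usage sketch: fake logits for a batch of 4 over 10 classes
rice_preds, dis_preds = torch.randn(4, 10), torch.randn(4, 10)
targs = torch.tensor([0, 1, 2, 3])
loss = dummy_loss((rice_preds, dis_preds), targs)
```

Because only dis_preds feeds the loss, training with this function should behave like the original single-task disease model, which is exactly the sanity check the walkthrough is after.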

53:54 How can we usually build a learner without specifying an appropriate loss func?

Why or how does the training set help specify the loss func for us?

When will we need to specify a particular loss func to replace the default loss func?

56:01 How to build a new metric (a dummy metric_func) using preds and targs from the new learner, applying the current error-rate metric only to dis_preds?
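A sketch of such a dummy metric with plain tensor ops (fastai's error_rate applied to dis_preds would be the equivalent; names are the same hypothetical tuple as above):

```python
import torch

def dummy_error(preds, targs):
    # error rate on the disease head only, ignoring the rice head
    rice_preds, dis_preds = preds
    return (dis_preds.argmax(dim=1) != targs).float().mean()

# usage sketch: rows 0..3 confidently predict classes 0..3
dis_preds = torch.eye(4, 10)
rice_preds = torch.randn(4, 10)
err = dummy_error((rice_preds, dis_preds), torch.tensor([0, 1, 2, 0]))
# one of four predictions is wrong (row 3 predicts 3, target is 0)
```

Like the dummy loss, this unpacks the prediction tuple and forwards only the disease half to the familiar single-task computation.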

57:07 Let’s run lr_find() and fine_tune() with our new learner; we should expect it to behave the same as our previous learner trained solely on paddy disease

58:00 The previous fastai multi-loss func hasn’t been found yet, and in the case above we use lr_find() to pick a learning rate more appropriate than the conservative suggestion given by lr_find


Did anyone notice that learn.get_preds with the multi-staged model doesn’t yield the class indexes? Check the outcome below:
Screenshot from 2022-06-28 01-32-28

Contrary to what a normal model would yield:
Screenshot from 2022-06-28 01-28-01

Is this how it should be? It would be easier if get_preds returned the class indexes. Do we have to apply argmax ourselves to the output of the multi-stage model?


When we use our own loss functions etc we need to handle more stuff ourselves. There isn’t a general way that fastai can know what your custom functions are doing.


Thanks Jeremy. It makes sense as there might be numerous possibilities to do different things with the output based on the use case.

The idxs vector isn’t probabilities, as some values are negative, so I applied softmax() followed by argmax():


Hopefully this is the right approach. The cool thing about the walkthroughs and the course is that they let me learn much finer details of the library specifically, and of DL in general, to create my own things. Thanks.


Just a correction: probs.argmax(dim=1) is all we need since learn.get_preds() also returns probabilities.
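A sketch of that per-head argmax (hypothetical numbers; assuming get_preds returns one probability tensor per head, as in the screenshots above):

```python
import torch

# hypothetical multi-head output from learn.get_preds: a tuple of two
# probability tensors, one per head, for a batch of 2 over 2 classes
rice_probs = torch.tensor([[0.1, 0.9], [0.8, 0.2]])
dis_probs = torch.tensor([[0.7, 0.3], [0.4, 0.6]])

rice_idxs = rice_probs.argmax(dim=1)  # predicted class index per row
dis_idxs = dis_probs.argmax(dim=1)
```

Since the values are already probabilities, no softmax is needed before the argmax.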


I’m finally catching up with all of these videos and ran into something strange during this one. Nobody else having a similar problem makes me nervous that I am missing something super obvious, but I thought I’d post it here in case someone runs into a similar issue.

I tried restarting my server when Jeremy did, but the problem persists and there is nothing useful on any forums. Any ideas?



I ran into this problem as well, and I found a hacky way to make it work: just comment out batch_tfms, or remove it from the dataloaders like this:

dls = ImageDataLoaders.from_folder(
    ...,  # other arguments as before
    # batch_tfms=aug_transforms(size=128, min_scale=0.75)
)

And it works!


At 18:35 there is a discussion about how things that work well on small models tend to also work well on larger models (e.g. of the same family). Are language models an exception to that? It seems like there are emergent capabilities that tend to appear only in huge billion-parameter models…