Live coding 12

Walkthru 12: a detailed note in the form of questions

00:40 How does Jeremy evaluate models on the PETS dataset?

What’s interesting about the top 15 models? They include a bit of everything (all different sorts of approaches).

01:53 What’s interesting about ViT models on the PETS dataset?

Large ViT models on large images can probably achieve a lower error rate.

The best-performing ViT models are the early ones, suggesting ViT has not improved much since it came out.

03:38 How does Jeremy evaluate models on the Planet dataset?

04:54 What does Jeremy mean by a large model, or a model using large images?

06:03 How should you use small and large pre-trained models in Kaggle or production projects?

09:16 Run Jeremy’s paddy notebooks and compare them with your own to find errors or differences.

#question Has Jeremy shared his paddy notebooks in his paddy repo yet?

10:45 What is the difference between pth and pkl when saving a model?

The model is saved in pickle (pkl) format; pth is the variant extension used by PyTorch.

Just make sure to save models with the pth extension.
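
A small sketch of the two fastai save paths this touches on (file names are placeholders):

```python
learn.save('stage1')        # writes models/stage1.pth - a PyTorch (pickle-based) checkpoint,
                            # using PyTorch's conventional .pth extension
learn.export('model.pkl')   # pickles the whole Learner to model.pkl for later inference
```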

11:56

12:33 If the disease-classification model can also learn to classify paddy varieties, then it may predict diseases even better. Why is that?

15:16 Why would building the above model together be a good way to test how well we understand the mechanics of deep learning?

16:14 How does Jeremy test for the natural variation of the model’s performance?

How does Jeremy know whether the model’s performance is consistent or not?

What does it mean when the model’s performance (error-rate) jumps a lot?

18:34 Why is it intuitive/counter-intuitive that whatever you find in experiments on small models will magnify itself in large models?

In other words, why should we exhaust our experiments on small models first and only then move on to large models?

20:04 What does a model consist of?

What does the body of a model look like and what does it do?

What does the head of a model look like and what does it do?

How to open up the inner details of a model?

21:46 How to show the shapes of the inputs and outputs flowing through each layer of the entire model, along with the callbacks in use?
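
If I remember the API right, fastai's Learner.summary() is what gives this view; a minimal sketch:

```python
learn.summary()   # prints each layer with its output shape and parameter count,
                  # plus the loss function, optimizer and callbacks attached to the Learner
```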

23:48 How to extract the head of the model and then the last layer of the head?

24:07 How to see the parameters of the last layer?

How to show the contents of the parameters when ll.parameters() is too lazy to show them?

What is the shape of the last layer?
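
A rough sketch of this inspection, assuming the usual fastai layout where learn.model is Sequential(body, head):

```python
m    = learn.model      # the full model
head = m[1]             # m[0] is the body, m[1] is the head
ll   = head[-1]         # the last layer of the head, e.g. Linear(in=512, out=10)

ll.parameters()                      # a lazy generator object, so nothing useful is printed
list(ll.parameters())                # materialise it to see the actual parameter tensors
[p.shape for p in ll.parameters()]   # the weight is e.g. torch.Size([10, 512])
```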

25:51 How to use one model to predict both 10 diseases and 10 varieties of paddy?

Will we replace the original last layer (10x512) of the head with a different last layer (20x512)? NO

Will we build two linear layers instead of one onto the head?

27:51 How to remove the last layer from the head?
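
One possible way to do it, under the same layout assumption (nn.Sequential supports item deletion, though Jeremy's exact code may differ):

```python
import torch.nn as nn

head = learn.model[1]
del head[-1]       # drop the final Linear layer from the head in place
# or build a trimmed copy without mutating the original:
new_head = nn.Sequential(*list(head.children())[:-1])
```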

28:38 How to create a DiseaseAndTypeClassifier class to build two linear layers for the head?

How to create the __init__ function of the DiseaseAndTypeClassifier class?

32:20 How to create the forward function of the DiseaseAndTypeClassifier class, and what does this forward do?
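
Based on the notes above and below (two linear layers, a tuple output (rice_preds, dis_preds), 512 features coming out of the trimmed head), the class is roughly shaped like this; the exact names and layer settings are my guesses, not Jeremy's code:

```python
import torch.nn as nn

class DiseaseAndTypeClassifier(nn.Module):
    def __init__(self, m):
        super().__init__()                   # the PyTorch boilerplate noted at 34:25
        self.rice_type = nn.Linear(512, 10)  # 10 paddy varieties
        self.disease   = nn.Linear(512, 10)  # 10 diseases
        del m[1][-1]                         # drop the original last layer of the head
        self.m = m                           # keep the trimmed pretrained model

    def forward(self, x):
        x = self.m(x)                                # 512-dim features from body + trimmed head
        return self.rice_type(x), self.disease(x)    # (rice_preds, dis_preds)
```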

34:25 What amazed Radek about this line of code: dtc = DiseaseAndTypeClassifier(m)? We have now actually created a new model which trains two separate linear layers (one for predicting varieties, the other for diseases) at the same time.

Don’t forget the boilerplate when creating your own PyTorch module: super().__init__()

36:44 How to duplicate/copy the existing learner to make a new learner?

How to add our new model onto the newly duplicated/copied learner?
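
A minimal sketch of those two steps, assuming the class sketched above:

```python
from copy import copy

dtc    = DiseaseAndTypeClassifier(learn.model)   # wrap and trim the existing model
learn2 = copy(learn)                             # shallow copy of the existing Learner
learn2.model = dtc                               # attach the new two-headed model
```

Note that copy() here is shallow, which is presumably what the deepcopy question below is about.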

37:11 How does Jeremy sort out a half-precision error when making a prediction with the new model?

When things mess up without a clue, try restarting the kernel first.

#question When copying a learner, might it be safer to use deepcopy?

42:19 How to get to the point where a copied learner (without the new model added) can run get_preds without error?

#question Did Jeremy at this point assume that the copied learner, after adding our new model, could still run get_preds without error? I think he assumed so at this moment, since he didn’t run the code for it.

43:46 How to create a DummyClassifier (instead of our DiseaseAndTypeClassifier), add it to the copied learner, and see whether the new learner can still run get_preds?
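
The debugging idea, as I understand it: strip the wrapper down to a pure pass-through so the only change is the wrapping itself, then check get_preds. A hypothetical sketch:

```python
from copy import copy
import torch.nn as nn

class DummyClassifier(nn.Module):
    def __init__(self, m):
        super().__init__()
        self.m = m               # keep the original model unchanged

    def forward(self, x):
        return self.m(x)         # pure pass-through: output is identical to the original model

learn2 = copy(learn)
learn2.model = DummyClassifier(learn.model)
preds, targs = learn2.get_preds()   # should run exactly as before if the wrapping itself is fine
```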

This is a great demo of how to solve a problem one step at a time! Very thorough, and a very solid, scientific work style!

44:44 How to rewrite DiseaseAndTypeClassifier to build the new model tidily and cleanly, with two separate layers for disease and variety prediction, add this model to the copied learner, and check whether get_preds works?

Now the error tells us we need a proper loss function, assuming no other problems with running get_preds.

47:10 Why should we expect an error about the loss or loss func?

Reminder: what is a loss function, or a loss?

Why are we getting this tuple-of-tensors error? Because the original loss func was designed to work with a single prediction tensor and return a single loss tensor, not a tuple of tensors.

47:53 How does Jeremy find out where the loss func of a Learner comes from, i.e. where a loss func gets defined, so that we can design our own loss func and put it there?

How to find out which loss func a model is using? Use learn.loss_func.

49:15 How to design the loss func for our model, which outputs two groups of predictions?

What do the preds of our new model contain? A tuple (rice_preds, dis_preds), no longer a single 2D tensor of disease predictions.

How to design a new loss func (a dummy loss_func) which receives preds and targs from our new learner but applies the current loss func only to dis_preds, not rice_preds?
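
A rough sketch of that dummy loss, reusing the old learner's loss func and applying it to the disease predictions only (names here are illustrative):

```python
orig_loss = learn.loss_func             # e.g. the cross-entropy loss the old learner already uses

def dummy_loss(preds, targs):
    rice_preds, dis_preds = preds       # the new model returns a tuple
    return orig_loss(dis_preds, targs)  # ignore the variety predictions for now

learn2.loss_func = dummy_loss
```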

53:54 How can we usually build a learner without specifying the appropriate loss func?

Why, or how, does the training set help specify the loss func for us?

When will we need to specify a particular loss func to replace the default loss func?

56:01 How to build a new metric (a dummy metric_func) using preds and targs from the new learner, applying the current error-rate metric only to dis_preds?
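
And similarly for the metric, a sketch assuming fastai's error_rate and the learn2 from the steps above:

```python
from fastai.vision.all import error_rate   # fastai's standard error-rate metric

def dummy_metric(preds, targs):
    rice_preds, dis_preds = preds           # unpack the tuple output
    return error_rate(dis_preds, targs)     # score only the disease predictions

learn2.metrics = [dummy_metric]             # replace the metrics on the copied learner
```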

57:07 Let’s run lr_find() and fine_tune() with our new learner; we should expect it to behave the same as our previous learner trained solely on paddy disease.

58:00 Jeremy hasn’t yet found fastai’s existing multi-loss func; also, in the case above we use the lr_find() plot to pick a learning rate more appropriate than the conservative suggestion lr_find gives.
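
A final-check sketch (the epoch count and learning rate below are placeholders, not Jeremy's values):

```python
learn2.lr_find()            # inspect the plot rather than only taking the suggested value
learn2.fine_tune(5, 0.01)   # the error rate should roughly match the disease-only learner
```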
