Wiki: Lesson 2

(Ian) #42

Thank you! I am slightly embarrassed that I did not come across this earlier. Will search better next time!


(David Bressler) #43

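#1 is definitely misleading. Turning data augmentation on in step 1 has no effect if precompute=True, because the precomputed activations are reused instead of being recomputed from the augmented images.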

(Shivan) #44

Hey guys! I just finished Lesson 2 and I have a few questions.

1: What are the precomputed activations Jeremy talks about in the lesson? For example, some activations fire when there are eyeballs in the picture, some fire when there are dogs, and so on. I just want to understand them fundamentally.

2: I think this is related, but what do you mean by freezing and unfreezing layers?

(Ian) #45

For anyone else with the same issue, this was my new code that worked for me:

log_preds,y = learn.TTA()
probs = np.mean(np.exp(log_preds), axis=0)
accuracy(probs,y), metrics.log_loss(y, probs)



learn.TTA() returns a 3-D array for log_preds, so the predictions have to be averaged over the first axis (axis=0) before computing the metrics.
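
In case the shapes are not obvious, here is a minimal NumPy-only sketch of the averaging step. It does not call fastai at all: the fake log_preds array and its (augmentations, images, classes) layout are assumptions chosen to match the axis=0 averaging above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed layout: (n_augmentations, n_images, n_classes). The sizes are made up.
n_aug, n_images, n_classes = 5, 4, 3
logits = rng.normal(size=(n_aug, n_images, n_classes))

# Fake log-probabilities (log-softmax over the class axis), standing in for
# what a TTA call would hand back.
log_preds = logits - np.log(np.exp(logits).sum(axis=2, keepdims=True))

# Convert back to probabilities and average over the augmentation axis.
probs = np.mean(np.exp(log_preds), axis=0)

# One predicted class per image.
preds = probs.argmax(axis=1)

print(log_preds.shape, probs.shape, preds.shape)
```

After the averaging, probs is an ordinary 2-D (images, classes) array, which is the shape the metric functions expect.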
(Shivan) #46

Can you point out the exact repo? I’ve looked through the repos on the internet but can’t seem to find it. Thanks!

(Vijay Narayanan Parakimeethal) #47

I was redoing the Lesson 2 and currently facing a challenge at the learn.sched.plot. My learn.lr_find() works well but when I plot it I am not able to infer anything from it. What should I be doing to make the learning rate visible for me to infer?



Hi @GregFet, your response does answer my question :slight_smile:
Since you mentioned “like … new images”, I would consider this tip a kind of “data augmentation”.
Many thanks.

(Kouassi Konan Jean-Claude) #49

Hi everybody!
I cannot find the tmp_lesson1-breeds.ipynb notebook in the repo.
Could someone provide the link, please?

(Jeremy Howard (Admin)) #50

You’re meant to create the lesson breeds notebook yourself :slight_smile:

(Kouassi Konan Jean-Claude) #51

OK @jeremy,
Since all parts of the implementation are already in the Lesson 2 lecture, I thought that the notebook had been made available.
So, I will rewrite them from the lecture, thank you.

(Mac Yeh) #52

Hi Jeremy,
Is there any way to get the images for the dog breeds competition? From Git, or by downloading from Kaggle directly? It would be good to have instructions on getting the competition images.

Best & Thanks,

(Mac Yeh) #53

No worries; I saw the video for Lesson 3. Thanks

(Matthew Kleinsmith) #54

You can put magic commands in an external module and then import it:

from IPython import get_ipython

get_ipython().magic(u"%matplotlib inline")
get_ipython().magic(u"%reload_ext autoreload")
get_ipython().magic(u"%autoreload 2")

If you put the above in a file named “”, you can call them via “import utils”, “from utils import *”, or “from utils import some_function”.

(Pavel Surmenok) #55

Look at the values in the learn.sched.lrs and learn.sched.losses arrays. Maybe they are outside the bounds of your plot?

(Pavel Surmenok) #56
  1. You can think about the model as if it consists of two parts:
    a) a few convolutional layers that get raw pixels as input and produce a vector of size 1024.
    b) 2 dense layers. The first has 1024 features as input, 512 features as output, the second gets 512 features as input and produces 2 as output. These 2 outputs (after applying softmax) are probabilities for cat vs dog.

If you initialize the learner with the precompute=True parameter, the learner performs a smart computational optimization. It evaluates the 1st part of the model (the convolutional layers) once for every image in your dataset. As a result, it computes a vector of 1024 numbers for each image. This is what is called “precomputed activations”.
Then when you train the model on your dogs’n’cats dataset, the learner doesn’t do all the calculations for the convolutional layers again. It just looks up the precomputed activations for every image and trains only the 2nd part of the model, the two dense layers. This speeds up training a lot, because the two dense layers are a very small part of the entire model and take little time to execute.
Of course, precomputing activations will help only if you don’t want to retrain convolutional layers.
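
To make the idea concrete, here is a rough NumPy sketch of it. This is not how fastai implements precompute; the random-projection "body", the tiny sizes (4 features instead of 1024) and the names body, W_body and W_head are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 8 fake "images" with 16 pixels each, and binary labels.
images = rng.normal(size=(8, 16))
labels = rng.integers(0, 2, size=8)

# Hypothetical frozen "body": one random projection plus ReLU, standing in
# for the pretrained convolutional layers.
W_body = rng.normal(size=(16, 4))
body_calls = 0

def body(x):
    """Run the frozen part of the model and count how often it is called."""
    global body_calls
    body_calls += 1
    return np.maximum(x @ W_body, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Precompute: run the expensive part exactly once and cache the result.
activations = body(images)

# Train only the small head on the cached activations for several epochs.
W_head = np.zeros((4, 2))
for epoch in range(5):
    probs = softmax(activations @ W_head)
    grad_logits = probs.copy()
    grad_logits[np.arange(len(labels)), labels] -= 1.0  # cross-entropy gradient
    W_head -= 0.1 * activations.T @ grad_logits / len(labels)

print("body evaluated", body_calls, "time(s) for 5 epochs")
```

The body runs once no matter how many epochs the head is trained for. That is also why augmented images make no difference in this mode: the cached activations were computed from the original images.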

  2. The general idea of freezing the lower layers is that you want to preserve the information that was gained when the original model was trained on the large dataset. Unfreezing all layers would likely lead to forgetting some important low-level filters.
    Above I described how precompute helps to optimize training when you want to train the last two dense layers and “freeze” all the convolutional layers.
    You can decide to do something else, e.g. freeze only the first few convolutional layers and train the last few convolutional layers and the dense layers.
    When you train the model, the forward pass goes through all the layers. But when you calculate the error and do backpropagation, you update only the weights of the layers that are “unfrozen” and don’t change the weights in the “frozen” layers.
    Using the fastai library, you have fine-grained control over which layers are “frozen” (untrainable) and which are “unfrozen” (trainable).
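
Here is a small NumPy sketch of the frozen/unfrozen distinction. It is only an illustration under simplifying assumptions: a two-layer linear model with a mean-squared-error loss, and a made-up "trainable" flag per layer. It is not the fastai API.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))   # 5 samples, 4 input features
y = rng.normal(size=(5, 2))   # 2 regression targets per sample

# Hypothetical layer records; "trainable" plays the role of frozen/unfrozen.
layers = [
    {"name": "body", "W": rng.normal(size=(4, 3)), "trainable": False},  # frozen
    {"name": "head", "W": rng.normal(size=(3, 2)), "trainable": True},   # unfrozen
]
body_before = layers[0]["W"].copy()
head_before = layers[1]["W"].copy()

def train_step(layers, x, y, lr=0.1):
    W1, W2 = layers[0]["W"], layers[1]["W"]
    # The forward pass goes through every layer, frozen or not.
    h = x @ W1
    out = h @ W2
    loss = np.mean((out - y) ** 2)
    # Backward pass. A real framework would skip gradients it does not need;
    # both are computed here only to keep the sketch short.
    d_out = 2.0 * (out - y) / out.size
    grads = [x.T @ (d_out @ W2.T), h.T @ d_out]
    # Only unfrozen layers receive a weight update.
    for layer, grad in zip(layers, grads):
        if layer["trainable"]:
            layer["W"] -= lr * grad
    return loss

for _ in range(3):
    train_step(layers, x, y)

print("body changed:", not np.array_equal(layers[0]["W"], body_before))
print("head changed:", not np.array_equal(layers[1]["W"], head_before))
```

Setting "trainable" to True on the body as well would correspond to unfreezing it, at which point its pretrained weights start to drift.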

(Shivan) #57

Watching Lesson 3 combined with your answer has improved my understanding extensively. Thank you for the detailed post !

(Vijay Narayanan Parakimeethal) #58

Thanks. Will definitely take a look at that.


[some edits for clarification]

The following question has probably been answered somewhere here, but I haven’t yet seen a complete response.

With so many other reliable metrics, why do we use accuracy as our primary or initial measure of model goodness? In the first lesson, we do look at the confusion matrix. I’ve also seen AUC (area under the ROC curve) used as a more robust measure of the goodness of a model and it’s simple enough to compute.

So why not refine our network until the AUC stops getting better, instead of the accuracy? Am I overlooking something else in the network refinement strategy that makes the AUC or other metrics actually not as critical a measure for deep nets as I’ve assumed?


(Jeremy Howard (Admin)) #60

I think AUC is a fine metric to use. It’s often used for Kaggle competitions, in fact.

Generally, however, accuracy is a little more intuitive to interpret, and being intuitive is important for understanding how our model is training. Also, accuracy is often the measure that’s closest to what matters in practice for the final model. AUC and accuracy are closely related, of course. Using both is a good idea!
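
For anyone who wants to look at both side by side, here is a minimal NumPy sketch on a made-up toy example. The helper names and the 0.5 threshold are my own assumptions; the AUC is computed directly from its pairwise definition (the chance that a random positive is scored above a random negative, with ties counted as half).

```python
import numpy as np

def accuracy_at(y_true, probs, threshold=0.5):
    """Fraction of examples classified correctly at the given threshold."""
    return float(np.mean((probs >= threshold) == (y_true == 1)))

def pairwise_auc(y_true, probs):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = probs[y_true == 1]
    neg = probs[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Toy labels and predicted probabilities for the positive class.
y = np.array([0, 0, 1, 1, 1, 0])
p = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.6])

print("accuracy:", accuracy_at(y, p))
print("AUC:", pairwise_auc(y, p))
```

Accuracy depends on the threshold you pick, while AUC only depends on how the predictions are ranked, so the two can disagree on the same model.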

(Pranav Kanade) #61

In lesson 1 @jeremy uses differential learning rates as


In lesson 2 he uses the following:

lr = 0.2
lrs = np.array([lr/9,lr/3,lr])

He says that he used /3 and /9 so that he could train the middle layers more, since the dataset is different from ImageNet.
My question: what is the relation between higher or lower learning rates and training more or training less?
Did I miss something in the video?