Wiki: Lesson 2

I recreated Jeremy's Dogbreeds notebook and got into the top 16% of the competition, so I think changing the image size works really well.
If I understood correctly, the bigger (299) images look like new images to the model, so it can keep learning without overfitting.
Jeremy talks about size-changing in detail in Lesson 3 (short answer: the model resizes the original images to 224 or 299 every time they are loaded). So if your original images are very large, it is better to resize them in advance.
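
For reference, here is roughly what that trick looks like in code; this is only a sketch, and names like arch, PATH, label_csv, val_idxs and bs are assumed to be defined as in the dogbreeds notebook:

def get_data(sz, bs):
    tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
    return ImageClassifierData.from_csv(PATH, 'train', label_csv, test_name='test',
                                        val_idxs=val_idxs, suffix='.jpg',
                                        tfms=tfms, bs=bs)

learn = ConvLearner.pretrained(arch, get_data(224, bs), precompute=True)
learn.fit(1e-2, 3)

learn.precompute = False
learn.set_data(get_data(299, bs))  # switch to the bigger images and keep training
learn.fit(1e-2, 3, cycle_len=1)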

3 Likes

Hey @GregFet

I managed to get up to the last step where we explored using log_preds and got the error shown below. Did you get something similar as well?

Thanks
Ian

Yes, I did. The folks here helped me sort it out.

1 Like

Thank you! I am slightly embarrassed that I did not come across this earlier. Will search better next time!

:slight_smile:

#1 is definitely misleading. Turning data augmentation on in step 1 has no effect while precompute=True.
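
A quick sketch of what I mean (arch, sz and PATH are assumed to be defined already):

tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_paths(PATH, tfms=tfms)
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(1e-2, 2)        # augmentation is ignored here: the activations are cached
learn.precompute = False  # only from this point on are the augmented images used
learn.fit(1e-2, 2, cycle_len=1)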

Hey guys! So I just finished Lesson 2 and I have a few doubts.

1: What are the precomputed activations Jeremy talks about in the lesson? For example, some activations fire when there are eyeballs in the picture, some fire when there are dogs, and so on. I just want to understand them fundamentally.

2: I think this is related, but what do you mean by freezing and unfreezing layers?

2 Likes

For anyone else with the same issue, here is the updated code that worked for me:

# learn.TTA() returns log-probabilities for each augmented copy plus the targets;
# average the probabilities over the augmentations (axis 0) before scoring.
# accuracy and sklearn's metrics come in via the fastai notebook imports.
log_preds, y = learn.TTA()
probs = np.mean(np.exp(log_preds), axis=0)
accuracy(probs, y), metrics.log_loss(y, probs)

Cheers

Ian

5 Likes

Can you point out the exact repo? I've looked at the repos on the internet but can't seem to find it. Thanks!

I was redoing Lesson 2 and am currently stuck at learn.sched.plot. My learn.lr_find() runs fine, but when I plot the result I am not able to infer anything from it. What should I do to make the learning rate plot readable?

[image: output of learn.sched.plot()]

Hi @GregFet, your response does answer my question :slight_smile:
Since you mentioned “like … new images”, I would consider this tip a kind of “data augmentation”.
Many thanks.

Hi everybody!
I cannot find the tmp_lesson1-breeds.ipynb notebook in the repo.
Could someone provide the link, please?

1 Like

You’re meant to create the lesson breeds notebook yourself :slight_smile:

5 Likes

OK @jeremy,
Indeed, since all the parts of the implementation are already in the Lesson 2 lecture, I thought the notebook had been made available.
So I will recreate it from the lecture, thank you.

@jeremy
Hi Jeremy,
Any way to get the images for the dogbreeds competition? Git, or download from Kaggle directly? It would be good if there were instructions on getting the competition images.

Best & Thanks,
Mac

@jeremy
No worries; I saw the video of Lesson 3. Thanks.

You can call magic commands in external modules and then import them:

from IPython import get_ipython

# Run line magics programmatically so they can live in an importable module
get_ipython().magic(u"%matplotlib inline")
get_ipython().magic(u"%reload_ext autoreload")
get_ipython().magic(u"%autoreload 2")

If you put the above in a file named “utils.py”, you can call them via “import utils”, “from utils import *”, or “from utils import some_function”.

3 Likes

Look at the values in the learn.sched.lrs and learn.sched.losses arrays. Maybe they are outside the bounds of your plot?
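
For example, something like this (just a sketch) lets you plot the recorded values yourself and control the axes:

import matplotlib.pyplot as plt

plt.plot(learn.sched.lrs, learn.sched.losses)
plt.xscale('log')  # learning rates span several orders of magnitude
plt.xlabel('learning rate (log scale)')
plt.ylabel('loss')
plt.show()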

1 Like
  1. You can think of the model as consisting of two parts:
    a) a few convolutional layers that take raw pixels as input and produce a vector of size 1024;
    b) 2 dense layers: the first takes 1024 features as input and produces 512 as output, and the second takes 512 features as input and produces 2 as output. These 2 outputs (after applying softmax) are the probabilities for cat vs. dog.

If you initialize the learner with the precompute=True parameter, the learner does a smart computational optimization. It evaluates the first part of the model (the convolutional layers) once for every image in your dataset; the result is a vector of 1024 numbers per image. This is what is called “precomputed activations”.
Then, when you train the model on your dogs-and-cats dataset, the learner doesn't run the convolutional layers again. It just looks up the precomputed activations for every image and trains only the second part of the model, the two dense layers. This speeds up training a lot, because the two dense layers are a very small part of the entire model and take very little time to execute.
Of course, precomputing activations only helps if you don't want to retrain the convolutional layers (see the code sketch at the end of this post).

  2. The general idea of freezing the lower layers is that you want to preserve the information that was gained when the original model was trained on its large dataset. Unfreezing all layers would likely lead to forgetting some important low-level filters.
    Above I described how precompute optimizes training when you want to train only the last two dense layers and “freeze” all the convolutional layers.
    You can also decide to do something else, e.g. freeze only the first few convolutional layers and train the last few convolutional layers together with the dense layers.
    When you train the model, the forward pass goes through all the layers. But when you calculate the error and do backpropagation, you only update the weights of the “unfrozen” layers and leave the weights of the “frozen” layers unchanged.
    With the fastai library, you have fine-grained control over which layers are “frozen” (untrainable) and which are “unfrozen” (trainable).
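
To make this concrete, here is a rough sketch of how those ideas map onto fastai calls; treat it as an illustration only, with arch and data assumed to be defined as in the lesson notebooks:

learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(1e-2, 3)        # trains only the new dense head, using the cached activations

learn.precompute = False  # stop using the cached activations
learn.unfreeze()          # make the convolutional layers trainable too
lrs = np.array([1e-4, 1e-3, 1e-2])  # smaller rates for the earliest layers
learn.fit(lrs, 3, cycle_len=1)
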
21 Likes

Watching Lesson 3 combined with your answer has improved my understanding extensively. Thank you for the detailed post!

Thanks. Will definitely take a look at that.