Wiki: Lesson 2

(Jeremy Howard (Admin)) #62

My thinking here is that a higher learning rate will push the weights around more - and therefore will give less credence to the pretrained weights. So in lesson 2 the earlier layers have higher relative learning rates, which means we’re pushing the weights further from the pretrained network.

Having said that, I haven’t done rigorous experiments, and I haven’t seen this technique used elsewhere, so take my theory with a grain of salt. @sebastianruder is helping run some experiments at the moment that may give some initial data on this, but it might be a few months before we have anything really concrete to show.
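
To illustrate the intuition above, here is a minimal sketch in plain Python (not fastai code). It applies one gradient-descent step with two different learning rates and measures how far the weights end up from their starting ("pretrained") values. The weight and gradient values are made up for illustration.

```python
import math

def sgd_step(weights, grads, lr):
    """One plain gradient-descent update: w <- w - lr * g."""
    return [w - lr * g for w, g in zip(weights, grads)]

def distance(a, b):
    """Euclidean distance between two weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical "pretrained" weights and gradients, for illustration only.
pretrained = [0.5, -1.2, 0.8]
grads = [0.3, -0.1, 0.4]

moved_small = distance(pretrained, sgd_step(pretrained, grads, lr=0.001))
moved_large = distance(pretrained, sgd_step(pretrained, grads, lr=0.01))

# For the same gradient, the step length scales linearly with the learning rate.
print(moved_large / moved_small)
```

With the same gradient, a 10x larger learning rate moves the weights about 10x further from where they started in a single step, which is the sense in which a higher rate gives less credence to the pretrained weights.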

(Kashish Jain) #63

Hello, I am a student with a pretty basic laptop with a 2 GB 940MX GPU. I am currently stuck at the part where we unfreeze all the layers of the network and train them; it fails with an ‘OUT OF MEMORY’ error. I don’t have access to any cloud GPUs either. The only way forward I can see is to get the file ‘224_all.h5’, located in the models folder under the dogs-vs-cats dataset folder, from anyone who has trained the model. I would really appreciate it if anyone could provide me with the file. Thank you.

(Jeremy Howard (Admin)) #64

Try using a much smaller batch size, and a smaller architecture (e.g. resnet18).

(Kashish Jain) #65

Thank you, sir, for replying. I tried that, but in vain: it still gives an out-of-memory error. The only solution I can see now is for someone to provide me with the weights of the fully trained resnet34. Thank you.


In the loss vs. iterations plot generated by learn.sched.plot_loss(), is the loss computed from a mini-batch or from the entire training dataset?
That is, after every iteration, is the loss of the entire training set computed for this plot, or is it the loss of the mini-batch that was used for that iteration?



I tried to re-create the dog breed competition shown in the Lesson 2 video, but when I tried to use the TTA method, I received an error:

Does anybody know what it means and how to solve it?

My English is not very good, and there may already be posts that discuss this problem, but any reply would be helpful. Thanks a lot.


I think what you intended to do is:


Thanks for your correction, but I still got the same TypeError.


Hi guys, I could not find the file/folder that contains label information for cats/dogs classification in the dataset. How does the model learn without labeling?

Edit: I found that all the cat images are in one folder and all the dog images in another, so they are probably labeled that way, which I am not very familiar with.
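
As a rough illustration of labeling by folder, the sketch below uses only the Python standard library (it is not the fastai loader) and treats each image's parent folder name as its label. The folder and file names are hypothetical and are created in a temporary directory just for the demo.

```python
import tempfile
from pathlib import Path

def labels_from_folders(root):
    """Return sorted (filename, label) pairs, using each file's parent folder name as its label."""
    root = Path(root)
    return sorted(
        (p.name, p.parent.name)
        for p in root.glob("*/*")
        if p.is_file()
    )

# Build a tiny throwaway directory tree that mimics a cats/ and dogs/ layout.
# The file names are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    for label, name in [("cats", "cat.0.jpg"), ("dogs", "dog.0.jpg")]:
        folder = Path(tmp) / label
        folder.mkdir()
        (folder / name).touch()
    pairs = labels_from_folders(tmp)

print(pairs)
```

So no separate label file is needed: the directory structure itself carries the class of every image.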


Would you share your code for the Kaggle dog breeds competition using the fastai library? I ask because that code used label data while the code for dogscats did not. Thank you.

(Andrea de Luca) #73

I’m having trouble matching the lesson 2 video with the lesson 2 notebook.

The official video is about the dog breeds dataset, while the official notebook is about the planet dataset.

Can you help me? Thanks.

I'm going crazy.

I went through lesson 2 and tried to model the dog breed dataset, and my training loss does not seem to be improving.

with size 224,

100% 5/5 [03:51<00:00, 46.29s/it]
epoch trn_loss val_loss accuracy
0 0.636873 0.518519 0.83724
1 0.616809 0.514048 0.843099
2 0.613019 0.500161 0.845052
3 0.585479 0.501381 0.843099
4 0.560301 0.493018 0.844564

I set the size to 299 and got the loss as

epoch trn_loss val_loss accuracy
0 0.572977 0.440465 0.860677
1 0.526915 0.4354 0.858691
2 0.512736 0.425571 0.86556

With differential learning rates [1e-4,1e-3,1e-2], I tried again to fit the data, and the loss is

6 0.288757 0.440223 0.854297

I even tried size 350 but don’t see any improvement.

Any suggestions would be helpful…



For the instructors, I wish there was a half-completed notebook with tasks/notebooks that I could fill out for this lesson. I am not very clear what to do after watching the lesson. Right now I am playing with the notebooks, but without much direction.

Edit: I started watching lesson 3 and I see there’s a task to classify dog breeds using the Kaggle dataset, so I will do that.

(Bryan Lee) #76

This seems like a basic question, so please point me to the right place if I’ve missed it somewhere else :slightly_smiling_face:

We cover multi-labels in lesson three, but how do you handle multi-classes when one class is unknown? For example, for image classification what would be the set up for “apple”, “banana”, “anything that’s not an apple or a banana?”

(ecdrid) #77

Any deep learning library that I know of will only help us with the classes we have trained it on…?

(Bryan Lee) #78

I see. From what I’m seeing on Stack Overflow, the solution is to create a “garbage” class that you fill with everything else. Seems like a lot of extra work for something that must be a common problem, but I’ll try it and see.
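
A minimal sketch of the "garbage" class idea, in plain Python: every raw label that is not one of the classes of interest is mapped to a single catch-all class before training. The class name `other` and the sample labels are assumptions made up for this example.

```python
KNOWN_CLASSES = {"apple", "banana"}
CATCH_ALL = "other"  # hypothetical name for the "garbage" class

def to_training_label(raw_label):
    """Keep labels we care about; fold everything else into the catch-all class."""
    return raw_label if raw_label in KNOWN_CLASSES else CATCH_ALL

# Made-up raw labels, for illustration only.
raw = ["apple", "car", "banana", "dog", "apple"]
labels = [to_training_label(r) for r in raw]

print(labels)
```

The model is then trained as an ordinary three-class classifier; the extra work is mostly in collecting varied enough images for the catch-all class.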

(ecdrid) #79

Actually, we have to re-create the notebooks without looking at them again, either on datasets of our own choice or on the existing ones…

At least this is what I am doing for the ML ones…

(Henry Wong) #81

In lesson 1 @jeremy used the following differential learning rates with the preceding layers trained 10x slower than the following set of layers:


In lesson 2 in Understanding Amazon from Space:

lr = 0.2
lrs = np.array([lr/9,lr/3,lr])

While I understand the rationale behind having varying learning rates I can’t seem to understand why lr/9 and lr/3 are used in this case (9 and 3 seem quite arbitrary, why not lr/25 or lr/16 or any other numbers?)

We don’t want to alter the weights of the earlier layers too much, since they capture fundamental low-level patterns / gradients etc. in the images. So why are we training them at a much higher learning rate in the lesson 2 notebook than in the dogs-cats problem in lesson 1? The only reason I can think of is that the model was pretrained on data similar to the dogs-and-cats data and not on satellite images, so we might want to nudge the weights in the earlier layers a little more. But then again, how do I know how large a learning rate to set for the earlier layers?

Appreciate the help! :slight_smile:

(Alan O'Donnell) #82

@henrywjr I think you’re thinking about things the right way–I can’t remember where in lecture Jeremy covers this, but he says pretty much exactly what you describe.

Picking 3 and 3^2 is definitely a bit of a judgement call, but the thinking is just that we want to give the weights more wiggle room. We were using 10 and 10^2 before, and 3 < 10, so :man_shrugging: Ultimately this is a hyperparameter. You either try a bunch of different approaches and see how they do on your validation set, or you can just follow your heart.
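
To make the comparison concrete, here is a small sketch in plain Python. The helper `differential_lrs` is a hypothetical name for this example, not a fastai function; it just builds the per-group learning rates for a given spacing factor.

```python
def differential_lrs(lr, factor, n_groups=3):
    """Learning rates for n_groups layer groups, earliest group first.

    Each group trains `factor` times slower than the group after it.
    """
    return [lr / factor ** (n_groups - 1 - i) for i in range(n_groups)]

lr = 0.2
planet_lrs = differential_lrs(lr, factor=3)   # lr/9, lr/3, lr
wide_lrs = differential_lrs(lr, factor=10)    # lr/100, lr/10, lr

print(planet_lrs)
print(wide_lrs)

# How much faster the earliest group trains with factor 3 than with factor 10.
print(planet_lrs[0] / wide_lrs[0])
```

With a factor of 3 the earliest layers train at lr/9 instead of lr/100, roughly 11x faster for the same top-level lr, which is the extra "wiggle room" described above.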

(Sharath) #83


This is regarding Multi-label Classification (lesson2-image_models.ipynb):

x,y = next(iter(data.val_dl))

It looks like y is a 64 x 17 matrix. It would have made more sense for it to be a vector of length equal to the number of classes, so 1 x 17 or 17 x 1. Why is y a 64 x 17 matrix, and what does it represent?


Edit: This was answered in Lesson 3. Basically, y has one row per image in the mini-batch, each row holding that image's 17 label indicators, and in this case the mini-batch size is 64.
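
A minimal sketch of where the 64 x 17 shape comes from, in plain Python rather than fastai or PyTorch. The label sets are made up; only the sizes 64 and 17 come from the thread.

```python
N_CLASSES = 17
BATCH_SIZE = 64

def multi_hot(label_indices, n_classes=N_CLASSES):
    """Encode a set of label indices as a 0/1 vector of length n_classes."""
    row = [0.0] * n_classes
    for i in label_indices:
        row[i] = 1.0
    return row

# Made-up label sets: image i is tagged with classes i % 17 and (i + 1) % 17.
batch_labels = [[i % N_CLASSES, (i + 1) % N_CLASSES] for i in range(BATCH_SIZE)]

# One row per image in the mini-batch, one column per class.
y = [multi_hot(labels) for labels in batch_labels]

print(len(y), len(y[0]))  # 64 17
```

A single image's labels are indeed a vector of length 17; the data loader simply stacks 64 of them, one per image in the mini-batch.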