Lesson 1 In-Class Discussion ✅

In lesson 1 notebook, when we plot the confusion matrix:

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()

Is it using a hold-out set to do this? Is this set different from the set used to train the model?

Thanks!

Which platform are you using?

Thanks! I kept looking for the method in the learn object - didn’t expect it to be a part of the img!

For ResNet, the typical normalization is mean subtraction only, with no division by the std. The per-channel mean for the ImageNet dataset is

mean= [103.939, 116.779, 123.68]

(BGR channels)
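For reference, here's a small sketch of what that Caffe-style preprocessing looks like in numpy (this is not fastai's built-in normalization, and the function and variable names are mine):

import numpy as np

# Caffe-style ResNet preprocessing as described above: reorder RGB -> BGR and
# subtract the per-channel ImageNet means, with no division by the std.
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_preprocess(img_rgb):
    """img_rgb: H x W x 3 uint8 array in RGB order, pixel values 0-255."""
    img = img_rgb.astype(np.float32)
    img_bgr = img[..., ::-1]              # swap channel order to BGR
    return img_bgr - IMAGENET_MEAN_BGR    # mean subtraction only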

@tsail good question. When we do one pass over the data we learn which direction to adjust each weight in (up or down), based on the data we have seen and the labels we are trying to tune the network to recognise. The problem is that we don't know by how much to adjust the weights. The learning rate controls the size of that adjustment: the update applied to each weight is the gradient multiplied by the learning rate. A small learning rate makes smaller adjustments to the weights and needs more passes over the data (epochs) to reach an optimal point. The caveat is that the learner can get trapped at various points along the way, but let's not discuss that now as it could lead to confusion. A large learning rate adjusts the weights more aggressively.

The next question is: why don't we just use large learning rates? With a large learning rate we can overshoot the optimal point we are trying to narrow in on, and because the steps are large we end up bouncing back and forth without ever converging on it. So it is common to start with a large learning rate and then gradually decrease it. However, this is just one method; there are many methods (automatic and manual) for adjusting the learning rate.
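To make that concrete, here's a toy sketch of a single gradient-descent update (not fastai's actual training loop; the function name and numbers are made up):

import numpy as np

# The learning rate scales the size of each weight update; it does not
# multiply the weights themselves.
def sgd_step(weights, gradients, lr):
    return weights - lr * gradients

w = np.array([0.5, -1.2])
g = np.array([0.1, -0.3])          # gradient from one mini-batch
print(sgd_step(w, g, lr=0.01))     # small lr -> small, cautious updates
print(sgd_step(w, g, lr=1.0))      # large lr -> big steps, risk of overshooting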

4 Likes

Google Colab

Hi, Did you fix this error?

Hi, No.

I am working on Colab now, uploading the data to my Google Drive and working from there.

Ah ok. AWS is my platform and I have my data in S3. I am getting the same error - KeyError: ‘content-length’.
Thanks.

I did not check, but I am pretty confident that this is PyTorch.

Thanks for your explanation @maral!

I encountered the same issue with a different Kaggle dataset. Did you fix this?

Has anyone tried using mnist_stats as declared in fastai/vision/data.py?

When I try data.normalize(mnist_stats) I get the error that mnist_stats is not defined. I proceeded by declaring it in my notebook, but maybe data.py needs to be updated (the __all__ part)?
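In case it helps, this is roughly what declaring it in the notebook looks like. The mean/std below are the commonly quoted single-channel MNIST stats, not copied from fastai's source, so please check them against fastai/vision/data.py:

# Workaround until mnist_stats is exported: declare the stats locally and pass
# them to normalize(). Values are the widely used MNIST mean/std (~0.131/~0.308);
# verify against fastai/vision/data.py before relying on them.
mnist_stats = ([0.131], [0.308])
data = data.normalize(mnist_stats)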

Will be in the next release.

1 Like

Wondering what the relationship is between learning rate and batch size. At the end of the first lecture, I needed to decrease the batch size due to a memory issue.

1cycle policy (beginner-level) questions:

In Lesson 1, Jeremy said we’d be using the 1cycle policy for scheduling learning rates. I wanted to know more about this method, so I read @sgugger’s excellent tutorial about this, but I got confused about two things:

  1. Terminology: I know what an “epoch” is, but I found “iterations” as used in the tutorial to be confusing, so I want to check: Is an iteration just one loss (and perhaps backpropagation) calculation for one mini-batch? ( Leslie Smith’s paper seemed to use the two words interchangeably in some places, and in other places the terms clearly are not equivalent. )

This would seem to agree with the definition in this post:

“Iterations is the number of batches needed to complete one epoch”

So…just checking: is this right? (I’ve put a small sketch of what I mean at the end of this post.) Thanks.

  2. My second question is regarding where Sylvain says…

“Then, the length of this cycle should be slightly less than the total number of epochs”

…but how do you know what the total number of epochs should be, until you actually do the training and monitor the validation loss (i.e. looking for when it starts to flatten out)? Or does he really mean “total number of iterations per epoch”?

(I’m aware that fastai v1 automates this policy so that we can simply “use it”, but I hope to understand what it’s doing.)
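Here is the small sketch mentioned above, spelling out my assumed definitions with made-up numbers:

import math

# My assumed definitions (please correct me if wrong):
#   iteration        = one mini-batch, i.e. one forward/backward pass + update
#   iterations/epoch = number of batches needed to see the whole dataset once
dataset_size = 50000
batch_size   = 64
num_epochs   = 4

iters_per_epoch = math.ceil(dataset_size / batch_size)   # 782
total_iters     = iters_per_epoch * num_epochs           # 3128
print(iters_per_epoch, total_iters)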

1 Like

Since the issue of looking at the lowest loss vs the strongest (negative) slope confuses many people, maybe the LR finder could plot the derivative of the loss with respect to the learning rate. Then one could tell people just to select its minimum as the ideal learning rate. Some people are not comfortable with the notion of slope per se.
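Something like this rough numpy sketch is what I have in mind (the lrs/losses arrays are assumed to come from the LR finder's recorder; the smoothing and the function name are made up):

import numpy as np

# Take the slope of the (lightly smoothed) loss with respect to log(lr) and
# pick the lr where that slope is most negative, i.e. the minimum of the derivative.
def suggest_lr(lrs, losses, smooth=5):
    losses = np.convolve(losses, np.ones(smooth) / smooth, mode='same')  # light smoothing
    slopes = np.gradient(losses, np.log(lrs))                            # d(loss)/d(log lr)
    return lrs[np.argmin(slopes)]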

In that case, I would pick something a bit less than 10^-5. Nonetheless, I don’t like that plot. Try with different batch sizes.

Yeah, I’ve tried that, but it didn’t work all that well. We still don’t have a hard and fast rule - if someone can come up with something that works every time, that would be great!

3 Likes

The most confusing thing happens when you see a curve that declines, then goes flat with slight ups and downs for many iterations, then comes down a bit (not as low as before) before finally rocketing up…

So in that case, should one choose the rate from the declining part, or the rate at which the curve was flat?

1 Like