Lesson 2 In-Class Discussion ✅

KarlH · October 31, 2018, 3:11am

The model isn’t evenly trained. There’s the resnet backbone, which has been extensively trained on all of Imagenet, and the head, which we add on for our classification purpose and is entirely untrained.

If you trained the entire model at once, you could get large errors coming from the untrained layers back propagating through the model and messing up your nicely pretrained weights.

Training with the backbone frozen allows us to only trained the untrained layers in the head. Once those layers have converged somewhat, we unfreeze the entire model and continue training.

Mauro · October 31, 2018, 3:12am

Its OK. I just saw what you meant. It was good question.

sandmann · October 31, 2018, 3:12am

“The cycle length of the total move up and down is accros the number of epochs.”
Thanks a lot - that’s important to understand!

simonw · October 31, 2018, 3:13am

When you first access models.resnet34 you get a progress bar while it downloads the model to disk, which is the thing that confused me.

Benudek · October 31, 2018, 3:13am

if you dont get that url download file after below, check the adblocker.

urls = Array.from(document.querySelectorAll(’.rg_di .rg_meta’)).map(el=>JSON.parse(el.textContent).ou);
window.open(‘data:text/csv;charset=utf-8,’ + escape(urls.join(’\n’)));

sgugger · October 31, 2018, 3:13am

That’s the pretrained models. Here during inference, you’ll load your own model.

gamino · October 31, 2018, 3:14am

learn.be_patient()

evan.xiong · October 31, 2018, 3:14am

I am interested, what is the right practice in fastiai, to load a set videos and sample e.g. every 3 second, to from a bunch series of images. So that I can apply Resnet34 (time-distributed) on each image and apply an LSTM on the next layer, for building a video classifier?

vedder · October 31, 2018, 3:14am

For unbalanced data: What do you do if the class you care most about is a rare class. An example is identifying skin lesions where the most common benign class is way more frequent than melanoma?

suvash · October 31, 2018, 3:14am

This might be a slightly advanced thing, better to ask there, also there’s some discussion spread around here.

dotkay · October 31, 2018, 3:17am

I think ideally, we would have to balance the main data set we started with, shuffle the balanced data set and then get some validation/test sets out of the shuffled data set (instead of trying to balance only the train or validation set separately). That’s what I think intuitively - some expert can correct me if I am wrong.

sandeepsign · October 31, 2018, 3:18am

Other than viability of convergence, Why does learning rate affect “accuracy”? It is just the rate of learning! it should just affect the time to train?

simonw · October 31, 2018, 3:18am

Anyone understand what this is doing?

x = torch.ones(n,2) 
x[:,0].uniform_(-1.,1)

jpramos · October 31, 2018, 3:20am

I guess if it’s too big, it will overshoot and completely miss the optima, thus affecting ‘accuracy’.

sgugger · October 31, 2018, 3:20am

You have the whole week to figure it out
pytorch doc is your best friend for this.

scier · October 31, 2018, 3:20am

Dot product is not matrix multiplication. The former is defined for vectors and is a completely different operation from the matrix multiplication. Dot product can be represented as matrix multiplication though.

dotkay · October 31, 2018, 3:20am

torch.ones(100,2) creates a 100x2 matrix and x[:,0].uniform(-1.,1) makes the values follow an uniform distribution between -1 and +1

sandeepsign · October 31, 2018, 3:21am

yeah, thats what i meant by saying 2 different learning rates which will find the minima, which one to choose?

ste · October 31, 2018, 3:21am

Creating matrix [n,2] of ones and replacing the first dimension with random numbers [-1,1]

sgugger · October 31, 2018, 3:22am

A matrix multiplication is a lot of dot products, it’s not a completely different operation.