Jeremy will give one later, but this is obviously a new hyperparameter to adjust.
The bias is just another weight for the model. It gets updated during gradient descent.
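For concreteness, here's a minimal PyTorch sketch (my own illustrative class, not Jeremy's exact code) showing that the biases are just embedding weights the optimizer updates like any other parameter:

```python
import torch
from torch import nn

class DotProductBias(nn.Module):
    # hypothetical minimal collaborative-filtering model: dot product of
    # user/movie latent factors plus a per-user and per-movie bias
    def __init__(self, n_users, n_movies, n_factors=4):
        super().__init__()
        self.user_factors = nn.Embedding(n_users, n_factors)
        self.movie_factors = nn.Embedding(n_movies, n_factors)
        self.user_bias = nn.Embedding(n_users, 1)    # one bias weight per user
        self.movie_bias = nn.Embedding(n_movies, 1)  # one bias weight per movie

    def forward(self, user_ids, movie_ids):
        dot = (self.user_factors(user_ids) * self.movie_factors(movie_ids)).sum(dim=1)
        # the biases are ordinary parameters, so gradient descent updates them
        # together with the latent factors
        return dot + self.user_bias(user_ids).squeeze(1) + self.movie_bias(movie_ids).squeeze(1)
```

Because the biases live in `nn.Embedding` layers, they show up in `model.parameters()` and get gradients like everything else.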
Is the bias here a bit like the average rating a user gives independent of the latent movie features, and the average rating a movie gets independent of the latent movie features?
Here’s an illustrative latent factor visualization from my Excel model (inspired by Jeremy’s lesson; see Part 4 of this blog post):
Yes, a bit of that. It’s the harshness of the user and the value (in one sense) of the movie.
So the cycle spans all the epochs you specified in fit_one_cycle, right?
Why am I sometimes getting a negative training loss and a negative validation loss while training? What is the intuition behind a negative loss?
Exactly.
This is a great blog post - I really recommend it!
We embed a movie or a user using 4 numbers. Why not 10? Is that a hyper-parameter?
In general it is the ‘absolute level’ of that item relative to other items. In this case, your intuition is correct.
Can someone explain the div factor used in fit_one_cycle?
Yes, it’s the embedding size, also called (in this instance) the number of latent factors.
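In fastai you can set it directly. A sketch, assuming a `ratings` DataFrame with user, movie and rating columns (fastai v2-style API; the column names are my assumption):

```python
from fastai.collab import CollabDataLoaders, collab_learner

# build DataLoaders from a ratings DataFrame (column names assumed for illustration)
dls = CollabDataLoaders.from_df(ratings, user_name='user', item_name='movie',
                                rating_name='rating', bs=64)

# n_factors is the embedding size, i.e. the number of latent factors per user/movie
learn = collab_learner(dls, n_factors=10, y_range=(0, 5.5))
learn.fit_one_cycle(5, 5e-3)
```

Trying a few values (4, 10, 40, ...) and comparing validation loss is the usual way to tune it.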
When we visualise kernels in CNN, what are we plotting? The weights of those kernels or activations?
A bit late, but why replace the last weight matrix of ResNet with 2 matrices and a ReLU in between, instead of just 1 matrix?
What does that bring?
Does an epoch train on the whole training set or only on a single batch?
Are binary variables worth representing with embeddings?
fit_one_cycle will change your learning rate according to the one-cycle policy. This means you start with a smaller learning rate (lr divided by div), go up to lr, and then come back down. The best way to understand it is to see it in a plot; you will get it instantly.
If you say div = 10, your starting learning rate is 1/10 of the lr you set.
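A quick sketch of how to see that, assuming you already have a fastai `Learner` called `learn` (argument names below are the fastai v2 ones; older versions use `div_factor` and `plot_lr()` instead):

```python
# start at lr_max / div = 1e-3, ramp up to 1e-2, then anneal back down,
# spread over all 5 epochs of the cycle
learn.fit_one_cycle(5, lr_max=1e-2, div=10)

# plot the learning-rate (and momentum) schedule that was actually used
learn.recorder.plot_sched()
```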
This is not equivalent because ReLU is non-linear. The universal approximation theorem tells us we can model anything as long as we stack affine transforms with non-linearities between them.
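A toy sketch of the difference: without the ReLU, two stacked matrices collapse into a single linear map, so they add nothing; with the ReLU in between they can represent much more complicated functions.

```python
import torch
from torch import nn

x = torch.randn(32, 100)

# one linear layer: can only represent an affine map of x
one_matrix = nn.Sequential(nn.Linear(100, 10))

# two linear layers with a ReLU in between: no longer reducible to a single matrix
two_matrices = nn.Sequential(nn.Linear(100, 50), nn.ReLU(), nn.Linear(50, 10))

# note: Linear(100, 50) followed directly by Linear(50, 10) WITHOUT the ReLU
# would be exactly equivalent to some single Linear(100, 10)
print(one_matrix(x).shape, two_matrices(x).shape)  # both torch.Size([32, 10])
```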
On the whole training set.
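In a plain PyTorch training loop the distinction is explicit; here's a self-contained toy sketch (synthetic data, my own variable names):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# toy data: 1000 examples, split into mini-batches of 64
train_dl = DataLoader(TensorDataset(torch.randn(1000, 10), torch.randn(1000, 1)),
                      batch_size=64, shuffle=True)
model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_func = nn.MSELoss()

for epoch in range(3):                # 3 epochs
    for xb, yb in train_dl:           # each step updates on a single mini-batch...
        loss = loss_func(model(xb), yb)
        loss.backward()
        opt.step()
        opt.zero_grad()
    # ...and one epoch is finished only after every batch, i.e. the whole
    # training set, has been seen once
```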