It seems like both weight decay and learning rate can be used to address overfitting. How do we use them together? And how do we find the optimal values of both when they are used in the same equation?

The learning rate doesn’t do any regularization; it’s the size of your updates.
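
To make the distinction concrete, here is a minimal sketch of one plain SGD step with weight decay, in pure Python. The function name `sgd_step` and the numbers are made up for illustration and are not fastai code. `lr` only scales the step, while `wd` adds a term that pulls the weight toward zero.

```python
def sgd_step(w, grad, lr, wd=0.0):
    """One SGD update on a single scalar weight (illustrative only).

    lr scales the whole step; wd adds wd * w to the gradient, which is
    the part that shrinks the weight toward zero (the regularization).
    """
    return w - lr * (grad + wd * w)


w = 2.0
# With a zero loss gradient, only weight decay can move the weight.
no_decay = sgd_step(w, grad=0.0, lr=0.1, wd=0.0)
with_decay = sgd_step(w, grad=0.0, lr=0.1, wd=0.5)
print(no_decay, with_decay)
```

In this sketch the weight stays put without decay and shrinks slightly with it, even though the learning rate is the same in both calls.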

They’re separate hyperparameters, so you can specify them together. I don’t know how to find the optimal value for weight decay, but for the learning rate you can use the learning rate finder, learn.lr_find().

Is the LR only there to decrease the number of epochs needed to achieve optimal results?

Try re-running the learning rate finder with different wd values (see code). For example, try each of [1e-5, 1e-4, 1e-3, 1e-2, 1e-1].

No. If you don’t have a proper value, you won’t train at all. Look again at the past lessons to refresh your memory; this was covered in chapters 4 and 5.
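
As a rough illustration of why a bad learning rate means no training at all, here is a toy gradient descent on f(w) = w**2 in pure Python. The function `descend` and the three learning rates are assumptions chosen for the example, not values from the lessons.

```python
def descend(lr, steps=20, w=1.0):
    """Run gradient descent on f(w) = w**2 and return the final weight."""
    for _ in range(steps):
        grad = 2 * w          # derivative of w**2
        w = w - lr * grad
    return w


too_small = descend(lr=1e-6)   # barely moves away from the start
reasonable = descend(lr=0.1)   # shrinks toward the minimum at 0
too_large = descend(lr=1.5)    # every step overshoots, so |w| grows
print(too_small, reasonable, too_large)
```

Only the middle setting makes real progress toward the minimum; the other two either stall or diverge.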

How do we set the weight decay hyperparameter? Can we do something like a random search or grid search, or is there a better way to set it?


We haven’t found a principled way yet, so trying various values is still the best solution.
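
"Trying various values" can be as simple as a small grid search scored on a validation set. Below is a minimal pure-Python sketch on a toy one-weight regression. The data, the `train` and `mse` helpers, and the defaults are all hypothetical; in practice each candidate would be a full training run of your real model.

```python
def train(xs, ys, wd, lr=0.05, steps=200):
    """Fit y = w * x by gradient descent with weight decay; return w."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * (grad + wd * w)
    return w


def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Tiny made-up dataset: noisy training points, clean validation points.
train_x, train_y = [1.0, 2.0, 3.0], [2.5, 4.4, 6.9]
valid_x, valid_y = [1.5, 2.5], [3.0, 5.0]

candidates = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
# Train once per candidate and record the validation loss.
results = {wd: mse(train(train_x, train_y, wd), valid_x, valid_y)
           for wd in candidates}
best_wd = min(results, key=results.get)
print(best_wd, results[best_wd])
```

Swapping the fixed list for randomly sampled values would turn the same loop into a random search.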


What’s the advantage of creating our own embedding layer over the stock PyTorch one? I think I missed that.

What’s the difference between PyTorch’s nn.Module and fastai2’s Module?


fastai’s Module removes the need to call super().__init__(), which you otherwise have to call in the __init__ of every nn.Module subclass.
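
For anyone curious how a base class can make that call for you, here is a pure-Python sketch of the general idea using a metaclass. As far as I understand, fastai does something along these lines, but this is not its actual implementation: `Base` is a stand-in for nn.Module, and `AutoInitMeta`, `Plain` and `Auto` are hypothetical names.

```python
class Base:
    """Stand-in for nn.Module: its __init__ must run before attributes are set."""
    def __init__(self):
        self._params = {}

    def __setattr__(self, name, value):
        # Registration fails with AttributeError if Base.__init__ never ran.
        if name != "_params":
            self._params[name] = value
        object.__setattr__(self, name, value)


class AutoInitMeta(type):
    """Metaclass that runs Base.__init__ before the subclass __init__."""
    def __call__(cls, *args, **kwargs):
        obj = cls.__new__(cls)
        Base.__init__(obj)            # the call the user no longer writes
        obj.__init__(*args, **kwargs)
        return obj


class Module(Base, metaclass=AutoInitMeta):
    def __init__(self):
        pass


class Plain(Base):
    def __init__(self, n):
        super().__init__()            # required, as with nn.Module
        self.n = n


class Auto(Module):
    def __init__(self, n):
        self.n = n                    # no super().__init__() needed


print(Plain(3)._params, Auto(3)._params)
```

Both classes end up with the same registered attributes; the only difference is who makes the base-class call.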


How does the sample size of the ratings affect our learned bias ranking?

Did we cover how n_factors is selected?

