They’re separate hyperparameters, so you can specify them together. I don’t know how to find the optimal value for weight decay, but for the learning rate you can use `lr_find()`.
Is the LR only used to decrease the number of epochs needed to achieve optimal results?
Try re-running the learning rate finder with different `wd` parameters (see code). For example, try values in [1e-5, 1e-4, 1e-3, 1e-2, 1e-1].
No. If you don’t have a proper value, you won’t train at all. Look again at the past lessons to refresh your memory; this was covered in chapters 4 and 5.
How do we set the weight decay hyperparameter? Can we do something like a random search or grid search, or is there a better way to set it?
There is no principled way we have found yet, so trying various values is still the best solution.
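As a toy illustration of what “trying various values” looks like, here is a hedged sketch (not from the lesson; it uses a plain ridge regression on synthetic data rather than a fastai `Learner`) of picking the weight decay by validation error over a small grid:

```python
import numpy as np

# Hypothetical sketch: choose the L2 penalty (weight decay) by trying
# several values and keeping the one with the lowest validation error.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_w = rng.normal(size=20)
y = X @ true_w + rng.normal(scale=0.5, size=200)
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

def fit_ridge(X, y, wd):
    # closed-form least squares with an L2 penalty of strength `wd`
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + wd * np.eye(n), X.T @ y)

best_wd, best_err = None, float("inf")
for wd in [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]:   # the grid suggested earlier
    w = fit_ridge(X_tr, y_tr, wd)
    err = np.mean((X_va @ w - y_va) ** 2)   # validation MSE
    if err < best_err:
        best_wd, best_err = wd, err
print(best_wd, best_err)
```

With a real model you would train a `Learner` per `wd` value instead of solving in closed form, but the selection loop is the same.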
What’s the advantage of creating our own embedding layer over the stock PyTorch one? I think I missed that.
What’s the difference between PyTorch’s `nn.Module` and fastai2’s `Module`?
fastai’s `Module` removes the need to call `super().__init__()`, which you have to call in every `nn.Module` subclass’s `__init__`.
This is wrong, The Shawshank Redemption should be on top!
It’s ALMOST as good as Lawnmower Man 2
Ah, E.T. My favorite romance film
How does the sample size of the ratings affect our learned bias ranking?
Did we cover how `n_factors` is selected?
It’s another hyper-parameter you have to pick, so you can try a few values and see what works best.
Any advice on how to select the best `wd` (the weight decay hyperparameter)?
More data is better! If you have no data, look up solutions to the cold start problem.
What motivates learning a 50-dimensional embedding and then using PCA to reduce it to 3, versus learning a 3-dimensional embedding directly?
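For reference, the reduction step itself is cheap. A hedged sketch (using random data in place of the learned movie factors, and SVD rather than a PCA library) of projecting a 50-dimensional embedding down to 3 components for visualisation:

```python
import numpy as np

# Hypothetical sketch: project a learned 50-dim embedding onto its top 3
# principal components. `emb` stands in for the learned movie factors.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 50))

centered = emb - emb.mean(axis=0)
# SVD of the centered matrix gives the principal directions in Vt
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
emb_3d = centered @ Vt[:3].T        # shape (1000, 3), ready to plot
print(emb_3d.shape)
```

The question stands either way: the 3 PCA components summarise a richer 50-dimensional representation, whereas training with `n_factors=3` forces the model to learn in 3 dimensions from the start.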