How do you all approach picking the optimal learning rates to fine-tune your models? I am under the impression I should be picking a range that has the steepest declining slope.
In my last few cells, I thought ranges like (1e-4, 1e-2) or (1e-3, 1e-1) covered the steepest slopes, but my results aren't as good as with (1e-6, 1e-4), which looks flat on the curve to me.
Appreciate any thoughts
So in general, yes, you want to pick a learning rate where the loss curve has the steepest declining slope. When you unfreeze layers of a neural net and plot the learning rate finder again, it becomes a bit harder to figure out where the steepest slope is. If you pick the second param in your slice to be 5-10 times smaller than the original learning rate, and for the first number you find the spot on the graph just before things go up and pick roughly 5-10 times less than that, you should be ok-ish.
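That rule of thumb can be sketched in a few lines. This is just a hypothetical helper to make the arithmetic concrete (the function name, `suggest_slice`, and its parameters are my own, not fastai API); you'd feed the result into `learn.fit_one_cycle` yourself.

```python
def suggest_slice(old_lr, loss_rise_lr, factor=10):
    """Rough rule of thumb for picking a slice after unfreezing.

    old_lr:       the learning rate you trained with before unfreezing
    loss_rise_lr: the lr on the LR-finder plot where the loss starts to climb
    factor:       how much smaller to go (5-10 is the usual range)
    """
    first = loss_rise_lr / factor   # lr for the earliest layers
    second = old_lr / factor        # lr for the final layers
    return first, second

# e.g. trained at 1e-3 before unfreezing; loss starts rising around 1e-2 on the plot
first, second = suggest_slice(old_lr=1e-3, loss_rise_lr=1e-2, factor=10)
# then something like: learn.fit_one_cycle(epochs, slice(first, second))
```

The point of the slice is that early layers (already good generic features) get the smaller rate while later layers get the larger one.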
Now, as Jeremy said, picking the best learning rate is more of an art than a science, and the more you do it the more you can kinda guess what a good
lr is. Jeremy also explains more about learning rates and how they work later in the course, so that should help you as well.
Hopefully this makes sense!