I’ve been looking for an answer to this and perhaps I’ve just missed it, but it seems like the learning rate is determined in an indirect way: run it, plot it, find it. Is there no way to just have it return the optimal learning rate? Have I missed the function that returns that? Or is there a reason it needs to be done this way?
In the 2nd lecture Jeremy talks a bit about why we want different learning rates at different stages of learning.
In the beginning your weights are far from perfect, so if you set the learning rate too low, learning will be slow without much benefit, since the model would improve anyway. The further you get, the smaller the steps you want to take, so you don’t miss the ‘sweet spot’.
You can think of it like drawing: first you do a rough sketch, then some more detailed work, and at the end you polish the details.
It would not make much sense to spend an equal amount of time on every stage.
I think you can set a constant learning rate and see what comes out and how it influences learning speed.
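The coarse-to-fine idea above is what learning rate schedules implement: big steps early, small steps late. As a minimal sketch (the function name and the particular numbers are my own, not from the course), a cosine decay from a large rate down to a small one could look like this:

```python
import math

def cosine_lr(step, total_steps, lr_max=0.1, lr_min=1e-4):
    """Decay the learning rate from lr_max (rough sketching)
    down to lr_min (polishing) over the course of training."""
    t = step / max(1, total_steps - 1)  # training progress in [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

# big steps early, small steps late
for step in (0, 50, 99):
    print(step, cosine_lr(step, 100))
```

Cosine decay is just one option; a step decay or linear decay expresses the same intuition of shrinking steps as the weights approach the ‘sweet spot’.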
I think extracting a single number from the learning rate finder might be a good addition to that function. Anyone is welcome to try to find the optimal point from the behavior of the derivative.
I suppose that the learning rate was selected mostly from experience, so after 10-20 runs a person develops an intuition for picking a good one. That intuition might be wrong, though, so adding functionality to the learning rate finder that suggests a single number (or a range) might be useful.
Please tell me if I got your question right.
Most likely you can’t tell with 100% certainty what the best learning rate is until you train the model with several different ones and see which works faster.
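For what it’s worth, the derivative-based heuristic discussed above is easy to sketch. Assuming you have the (lr, loss) pairs the finder records, one possible choice is the point where the smoothed loss drops fastest; `suggest_lr` below and the synthetic curve it runs on are my own illustration, not a library function:

```python
import numpy as np

def suggest_lr(lrs, losses, smooth=0.9):
    """Pick the lr where the smoothed loss decreases fastest.

    This is only one possible heuristic (steepest descent of the
    loss-vs-log(lr) curve), not an official library function.
    """
    lrs = np.asarray(lrs, dtype=float)
    losses = np.asarray(losses, dtype=float)
    # exponentially smooth the losses to damp mini-batch noise
    smoothed = np.empty_like(losses)
    avg = losses[0]
    for i, loss in enumerate(losses):
        avg = smooth * avg + (1 - smooth) * loss
        smoothed[i] = avg
    # derivative of the smoothed loss with respect to log(lr)
    grads = np.gradient(smoothed, np.log(lrs))
    return lrs[np.argmin(grads)]

# synthetic finder curve: flat at low lr, then falling, then diverging
lrs = np.logspace(-5, 0, 100)
x = np.log10(lrs)
losses = 0.2 + 0.8 / (1 + np.exp((x + 3) * 2)) + 0.05 * np.exp((x + 1) * 4)
print(suggest_lr(lrs, losses))  # an lr in the steeply falling region
```

As noted above, the curves look quite different across models, so any single rule like this can still pick badly; it would be a starting suggestion, not a guarantee.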
I’ve been thinking of this myself. The gradient at the sweet spot is probably not difficult to determine analytically.
That’s what I’m asking. As poppingtonic mentions, it seems like it would not be hard to derive the estimated beginning optimal learning rate from lr_find. I just wanted to make sure that wasn’t already there and I had just missed it.
As I remember from part 1 of the course, the learning rate finder graph looks really different for different models, e.g. planets, cats, NLP. Therefore a human can generalize the solution more easily than strict rules can.
Nevertheless, I agree that it might be good to have the function return a single number for the learning rate.
You can do it automatically, sure, why not? It’s just a question of whether the extra effort of automating it is worth it. In a lot of cases this is just exploratory analysis, not big data with automated pipelines and databases. You’re right in guessing it’s done automatically in the latter case.