When doing collab filtering, I get better model results with a fixed learning rate of 5e-3 than with the value suggested by lr_find(). Depending on “n_factors” (the embedding size), lr_find() returns a fairly large number, e.g., 0.36.
Do you know why the magic 5e-3 learning rate works better than the value returned by lr_find()? Or is lr_find() broken for collab filtering?
In addition, I tend to use the default “valley” suggestion rather than “slide” or “steep”. Are there any guidelines on which suggestion is more suitable for a particular dataset?
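
To make the question about the suggestion methods concrete, here is a minimal sketch of how two such heuristics can disagree on the same curve. It is loosely modeled on what I understand “steep” and “minimum” to do, and it is not fastai's code. The function names and the synthetic loss curve are made up for illustration, and “valley” and “slide” are not reproduced here.

```python
import math


def steepest_lr(lrs, losses):
    """Return the LR at the start of the interval where the loss falls
    fastest per unit of log(lr). Simplified stand-in for a 'steep' rule."""
    best_i, best_slope = 0, float("inf")
    for i in range(len(lrs) - 1):
        slope = (losses[i + 1] - losses[i]) / (math.log(lrs[i + 1]) - math.log(lrs[i]))
        if slope < best_slope:
            best_i, best_slope = i, slope
    return lrs[best_i]


def min_loss_lr_div10(lrs, losses):
    """Return the LR at the lowest loss, divided by 10.
    Simplified stand-in for a 'minimum' rule."""
    i_min = min(range(len(losses)), key=losses.__getitem__)
    return lrs[i_min] / 10


def synthetic_loss(lr):
    """Hypothetical LR-finder curve: flat, then a descent around 1e-3,
    then a blow-up as the LR approaches 1."""
    x = math.log10(lr)
    descent = 0.8 / (1 + math.exp(-3 * (x + 3)))
    blow_up = math.exp(4 * (x + 0.5))
    return 1.0 - descent + blow_up


# Log-spaced learning rates from 1e-6 to 1e0 (assumed sweep range).
lrs = [10 ** (-6 + 0.1 * k) for k in range(61)]
losses = [synthetic_loss(lr) for lr in lrs]

lr_steep = steepest_lr(lrs, losses)
lr_min_div10 = min_loss_lr_div10(lrs, losses)
print(f"steepest-descent suggestion: {lr_steep:.2e}")
print(f"min-loss / 10 suggestion:    {lr_min_div10:.2e}")
```

Even on this smooth toy curve the two rules pick different learning rates, so on a real collab filtering curve the choice of suggestion method could plausibly matter a lot.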