Learning rate finder

For the graph above, we identify our key values (printed by the notebook as "Minimum/10: {lr_min:.2e}, steepest point: {lr_steep:.2e}") to be 1.00e-02 and 5.25e-03.

Mr Jeremy, however, said that the new lr should be
base_lr=3e-3. How did he get this number? How is it related to either 1.00e-02 or 5.25e-03?

And could someone explain this too?


Hi @shay1309,
There is no exact way to deduce this value. In the video, Jeremy mentions the steepest point or the minimum divided by 10. Generally, you want to pick a point on the downward slope, not too close to the end, so the model has room to learn. In that case there are quite a few possibilities, and 3e-3 sits right in the middle of the slope, so it will work. You could maybe tweak it a bit further, but for illustrative purposes it is close enough.
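
If you want to reproduce the two suggested values yourself, here is a minimal sketch, assuming a fastai v2 version where lr_find accepts suggest_funcs (the PETS dataset and learner below are just placeholders, not the exact notebook code):

from fastai.vision.all import *

# Placeholder setup: any vision learner will do here
path = untar_data(URLs.PETS)/'images'
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=lambda f: f[0].isupper(), item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)

# 'minimum' suggests the lr at the lowest loss, divided by 10;
# 'steep' suggests the lr where the loss curve falls most steeply
lr_min, lr_steep = learn.lr_find(suggest_funcs=(minimum, steep))
print(f"Minimum/10: {lr_min:.2e}, steepest point: {lr_steep:.2e}")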

Hope it helps!
Charles

Hi Charles, you helped once again, I see what you mean now. I thought that 3e-3 was one of the 2 stated values above converted from log to linear form … silly mistake, I forgot that he just picked a value in the middle of the slope between the 2 values …


Also, Mr Charles, would you happen to know why we use

learn.fit_one_cycle(3, 3e-3)

vs learn.fit_one_cycle(6, lr_max=5.5e-5)?

learn.fit_one_cycle(3, 3e-3) was my first round of training, just the last few layers, the round right after we create the learner …

learn.fit_one_cycle(6, lr_max=5.5e-5) was after I found the new learning rate. Is there a reason why we had to use lr_max … if I use lr_max, then what was my initial lr? And isn't lr supposed to be a constant that we multiply by the gradient … so why should I have a max, i.e. why is my lr dynamic and not a constant value?

Hi @shay1309,
learn.fit_one_cycle(6, lr_max=5.5e-5) and learn.fit_one_cycle(6, 5.5e-5) are the same. In general, Jeremy keeps it implicit, as lr_max is not required, but to make it clear that, no, the learning rate is not constant, he writes it explicitly this time.

When the learning rate is taught in tutorials it is constant, yes, but in practice most training setups adapt it over time, reducing it as we get closer to the minimum of the loss function.
The particularity of fit_one_cycle is that it progressively increases the learning rate up to lr_max and then reduces it. The rationale behind it is that initially you start from random weights and 'do not necessarily know where to go' (so the learning rate should start small); after a few epochs you are more confident about what you are learning and can speed up (increase the learning rate). In the final phase you should slow down again so you do not overshoot the minimum and diverge.
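
To make the shape concrete, here is a minimal sketch of a one-cycle schedule in plain Python, assuming fastai's documented defaults (cosine warm-up and anneal, pct_start=0.25, start at lr_max/25, end at lr_max/1e5); the helper one_cycle_lr is hypothetical, written only to illustrate the curve:

import math

def one_cycle_lr(step, total_steps, lr_max, pct_start=0.25, div=25.0, div_final=1e5):
    # Warm up from lr_max/div to lr_max over the first pct_start of training,
    # then anneal down to lr_max/div_final, both with cosine interpolation
    start, end = lr_max / div, lr_max / div_final
    warm = int(total_steps * pct_start)
    if step < warm:
        p = step / warm
        return start + (lr_max - start) * (1 - math.cos(math.pi * p)) / 2
    p = (step - warm) / (total_steps - warm)
    return lr_max + (end - lr_max) * (1 - math.cos(math.pi * p)) / 2

# The lr for a 100-step run with lr_max=5.5e-5: small, up to the max, back down
for s in range(0, 101, 25):
    print(f"step {s:3d}: lr = {one_cycle_lr(s, 100, 5.5e-5):.2e}")

After a real run you can also call learn.recorder.plot_sched() to see the exact schedule fastai used.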

You can also type in your notebook:

doc(Learner.fit_one_cycle)

Hope it helps!
Charles

Hi Charles, thank you once again. Your explanation plus the docs really helped, thank you for your time, sir.

Hi Charles, would you happen to know the difference between get_image_files vs get_x?

For reference, I am at line 18 of this notebook: https://github.com/shaythuram/fastbook/blob/master/06_multicat.ipynb

Hi, get_x is user-defined in that case. It has been defined to return the file names in a given folder. Here it is fine to use because we know the dataset is clean, but if there were a text file in the folder, it would also be selected, and that would be a problem.
get_image_files is a fastai function that you can reuse for other datasets; it basically returns all the image files in the path you give it (based on the file extensions).
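
As a minimal sketch of the difference, assuming the PASCAL 2007 multi-label setup from that chapter (your notebook's get_x may differ slightly):

from fastai.vision.all import *

# Assumed setup: the PASCAL 2007 data used in 06_multicat
path = untar_data(URLs.PASCAL_2007)
df = pd.read_csv(path/'train.csv')

# A user-defined get_x: builds a path from a row of the CSV,
# trusting that every listed file really is an image
def get_x(r): return path/'train'/r['fname']

# get_image_files: scans a folder and keeps only files whose
# extension marks them as images, so a stray .txt would be skipped
imgs = get_image_files(path/'train')
print(len(imgs), imgs[0])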

So technically I could use either get_image_files or get_x here, I'd just have to pass both of them the path to my file?

I did try it, however, and I got the following errors