I’m curious about whether learn.lr_find() should be used before or after unfreezing the layers of the network.
In the first lesson’s notebook, Jeremy runs lr_find() before unfreezing the network and then uses those learning rates to fine-tune the network.
I’ve plotted the lr_find() curves for before and after unfreezing, and they look slightly different.
I’m wondering which would be the right approach for fine-tuning?
Your two plots are evidently different because the weights in your last layer have changed.
I think lr_find() can be used in both cases.
Before unfreezing, run lr_find() and pass a fixed value of lr, because it will affect just the last layer.
After unfreezing, run lr_find() again and pass a slice of lr, because you don’t want to change the earliest layers too much.
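As a rough sketch of what passing a slice does: fastai v1 spreads slice(lo, hi) across the layer groups as geometrically even multiples between lo and hi, one rate per group. The helper below is a standalone reimplementation of that idea for illustration, not imported from the library:

```python
# Standalone sketch of how a slice(lo, hi) of learning rates can be
# spread across layer groups with geometrically even steps, mimicking
# what fastai v1's `even_mults` helper does (reimplemented here, not
# imported from the library).
def even_mults(start: float, stop: float, n: int) -> list:
    """Return n learning rates from start to stop, evenly spaced on a
    log scale (each rate is a constant multiple of the previous one)."""
    if n == 1:
        return [stop]
    mult = (stop / start) ** (1 / (n - 1))
    return [start * mult ** i for i in range(n)]

# With 3 layer groups, slice(1e-6, 1e-3) gives the earliest layers the
# smallest rate and the head the largest.
lrs = even_mults(1e-6, 1e-3, 3)
print(lrs)  # -> [1e-06, ~3.16e-05, ~1e-03]
```

The geometric spacing matches the intuition behind discriminative learning rates: each deeper group trains a constant factor faster than the one before it.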
Hope that helps
I’m sorry if I was not clear. The steps I followed were:
- Initialize a …
- Train for 5 epochs using …
- Run lr_find() and get the plot
- Run lr_find() and get the plot again

Does lr_find() change the weights of the network (cc @sgugger)? If not, then I don’t see how the weights in the last layer would have changed.
I think for fine-tuning it makes sense to unfreeze() and then run lr_find().
Ah, I was misunderstanding your process. But anyway, I think the two plots are different:
When you call lr_find() before unfreezing, only the weights in the last layer are allowed to change, so its loss behaves differently from the after-unfreeze case, where all the weights are allowed to change.
(Actually, on the first batch your two models are the same, but after some batches the weights are totally different.)
Imagine lr_find() as being similar to training your model for one epoch; the difference is that it stops when the loss increases drastically.
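That description can be sketched on a toy problem: sweep exponentially growing learning rates, take one SGD step per rate, and stop once the loss blows past a multiple of the best loss seen. This is a standalone illustration of the LR range test idea (the function name, toy loss, and divergence threshold are my own), not fastai’s implementation:

```python
# Toy sketch of the LR range test: take one training step per candidate
# learning rate, increasing the rate geometrically, and stop as soon as
# the loss explodes past a multiple of the best loss seen so far.
def lr_range_test(start_lr=1e-5, end_lr=10.0, num_it=100, diverge_mult=4.0):
    w = 0.0                      # toy parameter; true minimum is at w = 3
    mult = (end_lr / start_lr) ** (1 / (num_it - 1))
    lrs, losses, best = [], [], float("inf")
    lr = start_lr
    for _ in range(num_it):
        grad = 2 * (w - 3)       # d/dw of the toy loss (w - 3)**2
        w = w - lr * grad        # one SGD step at this learning rate
        loss = (w - 3) ** 2
        lrs.append(lr)
        losses.append(loss)
        best = min(best, loss)
        if loss > diverge_mult * best:
            break                # loss is diverging: stop the sweep early
        lr *= mult
    return lrs, losses

lrs, losses = lr_range_test()
# The sweep stops before reaching end_lr because large rates make the
# toy loss diverge.
print(f"stopped after {len(lrs)} of 100 iterations, last lr = {lrs[-1]:.3g}")
```

For this quadratic, the loss shrinks as long as the rate stays below 1 and explodes above it, so the sweep always cuts out well before end_lr.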
Even if you have the exact same network, you will probably get different curves for lr_find, since training is always random (we shuffle the data into batches).
lr_find doesn’t change the weights: it saves the model before doing anything else, then loads it back at the end.
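That save-then-restore behavior can be mimicked generically with a parameter snapshot; a minimal standalone sketch, with plain Python dicts standing in for model state:

```python
import copy

# Sketch of the save/restore pattern described above: snapshot the
# model state before the sweep, mutate it freely, then load the
# snapshot back so the weights are untouched afterwards.
params = {"w1": [0.1, -0.2], "w2": [0.5]}

snapshot = copy.deepcopy(params)       # "save the model"

# ... the LR sweep would update the weights here ...
params["w1"][0] = 999.0

params = copy.deepcopy(snapshot)       # "load it back at the end"
print(params["w1"][0])  # -> 0.1, weights unchanged after the sweep
```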
In this case, you should probably use lr_find after unfreezing, and you can even use differential learning rates while running it.
When you use lr_find, Jeremy always mentions that we need to look at the chart to find the correct slice for the next step.
Is there any way to obtain that range from the lr_find function, or from somewhere else?
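One heuristic that can automate reading the chart is to pick the rate where the loss curve falls fastest against log(lr); later fastai versions expose a similar “steepest slope” suggestion. A standalone sketch of that idea (the function name and the synthetic curve are my own):

```python
import math

# Standalone sketch of a "steepest descent" suggestion: given the lrs
# and losses recorded by an LR sweep, pick the lr where the loss curve
# drops fastest with respect to log(lr).
def suggest_lr(lrs, losses):
    best_lr, best_slope = None, 0.0
    for i in range(1, len(lrs)):
        slope = (losses[i] - losses[i - 1]) / (math.log(lrs[i]) - math.log(lrs[i - 1]))
        if slope < best_slope:          # most negative slope = steepest drop
            best_slope, best_lr = slope, lrs[i]
    return best_lr

# Synthetic curve: flat, then falling fastest near 1e-3, then blowing up.
lrs = [10 ** e for e in (-5, -4, -3.5, -3, -2.5, -2, -1)]
losses = [1.0, 0.98, 0.9, 0.55, 0.35, 0.6, 3.0]
print(suggest_lr(lrs, losses))  # -> 0.001, the steepest-drop point
```

In practice you would still sanity-check the suggestion against the plot, since noisy curves can put the steepest drop in an odd place.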
You might wanna have a look at this thread: LR finder for fine-tuning
I am wondering whether the lr_find result before unfreezing can tell us the LR for the later layers, and the lr_find result after unfreezing can tell us the LR for the earlier layers.
In other words, say you run the LR finder without unfreezing, and it says things start to get worse at 1e-4. You unfreeze and run the LR finder again, and it says things get worse around 1e-5. Using Jeremy’s “pick an LR well before things get worse” rule for the earlier layers, we would pick something like slice(1e-6, 1e-4).
Just a gut feeling and not based on any experiments. What do you guys think?
So you mean to combine the results from both runs to pick the best learning rates? Sounds interesting.
We should try it out and see how it goes
It would definitely be interesting to see. Do keep us updated on any results that you find.
From what I understand, I don’t think there should be a big difference. Additionally, the graph shown by lr_find() is only for the last layer.
What is really neat is that you can pass discriminative learning rates to the LR finder.
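If I recall the fastai v1 docs correctly, start_lr and end_lr for lr_find accept slices when the model has multiple layer groups, so each group sweeps its own range while sharing one multiplier. A standalone sketch of that idea (the function name is my own, not the library API):

```python
# Standalone sketch of a discriminative LR sweep: every layer group has
# its own start/end rate, and all groups are scaled by the same
# geometric factor at each iteration, so their ratios stay fixed.
def discriminative_schedule(start_lrs, end_lrs, num_it):
    mult = (end_lrs[-1] / start_lrs[-1]) ** (1 / (num_it - 1))
    schedule = []
    for i in range(num_it):
        schedule.append([lr * mult ** i for lr in start_lrs])
    return schedule

# Three layer groups: earliest layers swept over lower rates than the head.
sched = discriminative_schedule([1e-7, 1e-6, 1e-5], [1e-2, 1e-1, 1.0], num_it=100)
print(sched[0])   # -> [1e-07, 1e-06, 1e-05]
print(sched[-1])  # last step reaches roughly [1e-2, 1e-1, 1.0]
```

Because the ratios between groups never change, the resulting plot can be read the same way as a single-rate sweep, just anchored to the head’s rate.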
I am trying to train a classifier on a bunch of images (144 images in 4 categories).
After running fit for 4 epochs, and then (before unfreezing) lr_find, it looked like a good range for the unfrozen network would be 1e-3 to 1e-2, but after trying that I get …
Next attempt: more images.
First of all, I just want to say that I’m very interested to hear how much better the results you can get with these methods are. I believe it shouldn’t make a huge difference. But I also need to ask a question about the graph above: I sometimes get the same kinds of shapes, and I think I have the same problem of a small amount of data. The question is, is it always a too-small training set, or can we get this kind of graph in other cases too?
This is my experience:
I experimented with two sets of learning rates based on lr_find() output: one with a higher learning rate (based on the run without unfreeze()) and another with a lower learning rate. I continued training with .fit_one_cycle() for the same number of epochs. Eventually, both sets of learning rates resulted in very similar error_rate.
I have the same issue.
When I just load the ImageNet weights, the lr_find() plot looks more like the one on the course site (decreasing and then jumping to infinity), but after the first training run I always get plots similar to yours (flat and then jumping to infinity).
I think that means the network is already fine-tuned and won’t easily learn more.
There was an example where the network is a multidimensional function of valleys and hills, and we want to find the lowest valley, so I used two approaches:
- train again with slightly different parameters (restart everything and find a better valley)
- train unfrozen with an lr near the left side of the plot (stay in the correct valley and go to its bottom)

I tried both, and usually it progresses a bit (1-5%).
Beyond that, I think you need to check your dataset, model size, batch size, etc.
Edit: As in lesson 3, for the second step Jeremy chooses max_lr = slice(a/10, b/5), where a is the last value on the not-yet-increasing part of the line and b is a value from the previous lr_find plot.
That seems to work better.
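That heuristic written out as a tiny helper (the function name and the example values are my own, purely illustrative):

```python
# Sketch of the slice(a/10, b/5) heuristic described above: `a` is the
# last lr on the not-yet-increasing part of the new plot, and `b` comes
# from the previous lr_find plot. The helper name is my own.
def second_stage_slice(a, b):
    return (a / 10, b / 5)   # stands in for slice(a/10, b/5)

# e.g. a = 1e-4 from the unfrozen plot, b = 1e-2 from the earlier plot
low, high = second_stage_slice(1e-4, 1e-2)
print(low, high)  # roughly 1e-05 and 0.002
```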
learn.lr_find() yields non-deterministic results, so how should we choose the range for lr? One run shows that 1e-04 gives a low error, while another says that 1e-03 is much better. Is there really a best range for lr?