I was trying the following and was wondering why the accuracy gets worse after further training with an unfrozen model.
I would have thought the accuracy wouldn't fall back to worse levels after unfreezing?
Thanks for your thoughts on this
Your learning rate is too high. Fastbook uses slice(1e-6, 1e-4), while you use 10e-4 (i.e. 1e-3), which is 10 to 1,000 times higher than the learning rates used in the book.
What you are seeing is a great example of using too high a learning rate during fine-tuning. Once we unfreeze and train, that high learning rate causes the previously frozen layers to forget much of what they learned during pretraining. When applying transfer learning to images, we expect the earlier layers to contain useful building blocks that need little change, such as recognising edges, so we want them to update less aggressively than the later layers (i.e. give them a lower learning rate).
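To see why a too-large step size destroys already-good weights, here is a toy sketch (plain Python, nothing to do with the actual model): gradient descent on a one-dimensional quadratic, started near its optimum to mimic useful pretrained weights.

```python
# Toy illustration: minimise f(w) = (w - 2)**2, starting near the optimum
# at w = 2 to mimic pretrained weights that are already good.
def train(w, lr, steps=20):
    for _ in range(steps):
        w -= lr * 2 * (w - 2)   # gradient of (w - 2)**2 is 2 * (w - 2)
    return w

good = train(2.1, lr=0.1)   # small step size: stays near the optimum
bad  = train(2.1, lr=1.5)   # too-large step size: every update overshoots, w blows up
```

With lr=0.1 the weight stays close to 2; with lr=1.5 each step overshoots the minimum by more than it started with, so the "knowledge" encoded in the starting weight is destroyed. The same dynamic, in many dimensions, is what the early layers experience when fine-tuned with too high a rate.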
Fastbook gives the first layer group a learning rate of 1e-6, which is 1,000 times less than the 1e-3 (10e-4) you are using on that layer. This is an assumption on my part, but I suspect this difference is forcing the model to forget much of the knowledge it already has that is useful for your task, and to replace it with knowledge that isn't.
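For illustration, discriminative learning rates like slice(1e-6, 1e-4) amount to spreading rates geometrically across the layer groups, from the smallest for the earliest layers to the largest for the head. Here is a rough sketch of that idea (spread_lrs is a made-up helper, not fastai's actual implementation):

```python
# Geometrically space n_groups learning rates between lo and hi,
# so each successive layer group trains with a larger rate.
def spread_lrs(lo, hi, n_groups):
    ratio = (hi / lo) ** (1 / (n_groups - 1))
    return [lo * ratio**i for i in range(n_groups)]

lrs = spread_lrs(1e-6, 1e-4, 3)   # earliest layers ~1e-6, head ~1e-4
```

With three layer groups this yields roughly 1e-6, 1e-5, and 1e-4, so the edge-detecting early layers barely move while the task-specific head keeps learning.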
Thanks a lot. Yes, that makes a lot of sense. I had used the lr_find() method after unfreezing the model, but it looks like I missed an order of magnitude there.
You can also run lr_find() again after unfreezing the model. You will get a different suggestion this time, since the layers are now unfrozen. Then you can use fit_one_cycle() with the new learning rate.
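A minimal sketch of that workflow, assuming a fastai Learner named learn that has already been trained with its body frozen (the epoch count and the slice values here are placeholders, not recommendations):

```python
learn.unfreeze()             # make every layer group trainable
suggested = learn.lr_find()  # re-run the LR finder on the unfrozen model

# Pick discriminative rates informed by the plot: a small rate for the
# early layers, a larger one for the head.
learn.fit_one_cycle(6, lr_max=slice(1e-6, 1e-4))
```

Reading the rate off the new lr_find() plot, rather than reusing the pre-unfreeze value, is what avoids the order-of-magnitude mistake discussed above.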