Can anyone explain to me what freeze and unfreeze do?

wiineeth · March 26, 2019, 2:31pm

I'm in the third lesson now and I'm confused about what freeze and unfreeze do and when we should use them. Thank you <3

MadeUpMasters · March 26, 2019, 2:55pm

freeze and unfreeze effectively let you decide which specific layers of your model you want to train at a given time (I believe they do this by setting `requires_grad` to False to turn off training for a layer). I believe this is done because we often use transfer learning: the early layers of our model are already well trained at what they do (recognizing basic lines, patterns, gradients, etc.), but the later ones, which are more specific to our exact task, like identifying an animal breed, need more training.
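To make the `requires_grad` idea concrete, here is a toy sketch in plain Python (not PyTorch or fastai code; the `Param` class and `sgd_step` function are hypothetical stand-ins). In real PyTorch, a parameter with `requires_grad=False` gets no gradient from `backward()`, and the optimizer skips parameters whose `.grad` is `None`; the toy collapses that into a simple flag check so you can see that a frozen weight just keeps its value.

```python
# Toy illustration only: plain-Python stand-ins for model parameters.
from dataclasses import dataclass


@dataclass
class Param:
    value: float
    grad: float = 0.0
    requires_grad: bool = True  # borrows the name of PyTorch's flag


def sgd_step(params, lr):
    """Plain SGD update applied only to trainable parameters."""
    for p in params:
        if p.requires_grad:
            p.value -= lr * p.grad  # frozen params are skipped entirely


# One weight from an "early" pretrained layer and one from the "late" head
early = Param(value=0.5, grad=0.2)
late = Param(value=-1.0, grad=0.4)

early.requires_grad = False  # "freeze" the early layer
sgd_step([early, late], lr=0.1)
print(early.value, late.value)  # early is unchanged, late moved by lr * grad
```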

unfreeze makes all layers of your model trainable, so you will be training both the early and later layers, although you may still train the different layer groups at different learning rates. This is called ‘discriminative learning rates’ or ‘discriminative layer training’.
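As far as I know, in fastai v1 you get discriminative learning rates after `unfreeze()` by passing something like `max_lr=slice(lo, hi)` to `fit_one_cycle`: the first layer group gets `lo`, the last gets `hi`, and the groups in between get rates spaced evenly on a log scale. The helper below (`spread_lrs`, a hypothetical name, not a fastai function) sketches that spacing so you can see which rate each group would get, assuming the geometric spacing described above.

```python
from typing import List


def spread_lrs(lo: float, hi: float, n_groups: int) -> List[float]:
    """Hypothetical helper: n_groups learning rates from lo (first, earliest
    group) to hi (last group), evenly spaced on a log scale."""
    if n_groups == 1:
        return [hi]
    step = (hi / lo) ** (1 / (n_groups - 1))  # constant ratio between groups
    return [lo * step**i for i in range(n_groups)]


# Three layer groups: early layers change slowly, the head changes fastest
lrs = spread_lrs(1e-6, 1e-4, 3)
print(lrs)  # roughly [1e-06, 1e-05, 1e-04]
```

Spacing the rates geometrically rather than linearly means each group trains a constant factor more gently than the one after it, which matches the intuition that the earliest, most generic layers need the smallest adjustments.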

freeze makes every layer group except the last one untrainable. From the documentation, this means that for a transfer-learning model the earlier layer group(s), which hold the pretrained weights, are frozen, and only the last group (the part specific to our task) is trained.
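Here is a toy sketch of what freeze does to the layer groups, written in plain Python with a hypothetical `Param` class and `freeze` function (not the fastai source). One simplification to be aware of: I believe fastai v1's `Learner` keeps BatchNorm layers trainable even inside frozen groups by default (its `train_bn` option), which this toy ignores.

```python
class Param:
    """Hypothetical stand-in for a model parameter."""

    def __init__(self, name):
        self.name = name
        self.requires_grad = True


def freeze(layer_groups):
    """Toy freeze: every group except the last becomes untrainable."""
    for group in layer_groups[:-1]:
        for p in group:
            p.requires_grad = False
    for p in layer_groups[-1]:
        p.requires_grad = True


body = [Param("conv1.weight"), Param("conv2.weight")]  # pretrained layers
head = [Param("fc.weight"), Param("fc.bias")]          # new layers for our task
groups = [body, head]

freeze(groups)
trainable = [p.name for g in groups for p in g if p.requires_grad]
print(trainable)  # ['fc.weight', 'fc.bias']
```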

If you know the details of your architecture and want something in between unfreeze and freeze, you can use `freeze_to(n: int)` to choose which layer groups to freeze and which to train: the first n layer groups are frozen and the remaining groups (from index n onward) are trainable.
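A toy sketch of the `freeze_to(n)` indexing, assuming (as I understand the fastai v1 source) that it freezes `layer_groups[:n]` and unfreezes `layer_groups[n:]`. This `freeze_to` is a hypothetical plain-Python version that just returns one "trainable?" flag per group. Because it uses Python slicing, `n=-1` reproduces freeze and `n=0` reproduces unfreeze.

```python
from typing import List


def freeze_to(num_groups: int, n: int) -> List[bool]:
    """Toy version: for each layer group, whether it is trainable after freeze_to(n).

    Groups selected by [:n] are frozen and groups selected by [n:] stay
    trainable, so a negative n counts from the end like list slicing.
    """
    indices = list(range(num_groups))
    frozen = set(indices[:n])
    return [i not in frozen for i in indices]


for n in (0, 1, 2, -1):
    print(n, freeze_to(3, n))
# 0 [True, True, True]    -> same as unfreeze()
# 1 [False, True, True]
# 2 [False, False, True]
# -1 [False, False, True] -> same as freeze()
```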

Maybe someone more experienced than me can help us out and answer these:

What are some best practices/common patterns for using freeze and unfreeze with and without transfer learning?
What are the default settings? When I run a few epochs initially, does it train all layer groups or just the last one? And what about later, when I go back to train some more?

Hope that helps!

selcuk · March 26, 2019, 3:17pm

For a visual explanation and a bit more, Jeremy explains where and when freeze/unfreeze is applied to a typical fine-tuning/transfer-learning CNN model in fast.ai at the end of Lesson 4 and the start of Lesson 5.
The same concept applies to other types of models too.

The following links are lesson notes from Hiromi, and they also link to the relevant parts of the lesson videos.

github.com/hiromis/notes/blob/master/Lesson4.md#overview-of-important-terminology-13124

# Lesson 4

[Video](https://youtu.be/C9UdVPE3ynA) / [Lesson Forum](https://forums.fast.ai/t/lesson-4-official-resources-and-updates/30317)

Welcome to Lesson 4! We are going to finish our journey through these key applications. We've already looked at a range of vision applications: classification, localization, and image regression. We briefly touched on NLP. Today we're going to do a deeper dive into NLP transfer learning. Then we're going to look at tabular data and collaborative filtering, which are both super useful applications.

Then we're going to take a complete U-turn. We're going to take that collaborative filtering example and dive deeply into it to understand exactly what's happening mathematically, and exactly what's happening in the computer. And we're going to use that to gradually go back through the applications in reverse order, in order to understand exactly what's going on behind the scenes in all of them.

### Correction on CamVid result 

Before we do, somebody on the forum was kind enough to point out that when we compared ourselves to what we think might be the state of the art (or was recently the state of the art) for CamVid, it wasn't a fair comparison, because the paper actually used a small subset of the classes and we used all of them. So Jason in our study group was kind enough to rerun the experiments with the correct subset of classes from the paper, and our accuracy went up to 94%, compared to the paper's 91.5%. I think that's a really cool result, and a great example of how pretty much just using the defaults nowadays can get you far beyond what was the best of a year or two ago. It was certainly the best last year when we were doing this course because we started it quite intensely. So that's really exciting.

## Natural Language Processing (NLP) [[2:00](https://youtu.be/C9UdVPE3ynA?t=120)]

What I wanted to start with is going back over NLP a little bit to understand really what was going on there. 

### A quick review

So first of all, a quick review. Remember, NLP is natural language processing. It's about taking text and doing something with it. Text classification is a particularly useful, practical application, and it's what we're going to focus on first, because classifying a text or a document can be used for anything from:

This file has been truncated.

github.com/hiromis/notes/blob/master/Lesson5.md#review-of-last-week-332

# Lesson 5

[Video](https://youtu.be/uQtTwhpv7Ew) / [Lesson Forum](https://forums.fast.ai/t/lesson-5-official-resources-and-updates/30863) 

Welcome, everybody, to lesson 5. We have officially peaked, and everything is downhill from here, as of halfway through the last lesson.

We started with computer vision because it's the most mature, out-of-the-box, ready-to-use deep learning application. It's an area where, if you're not using deep learning, you won't get good results. So the difference, hopefully, between not doing lesson one and doing lesson one is that you've gained a new capability you didn't have before. And you get to see a lot of the tradecraft of training an effective neural net.

So then we moved into NLP because text is another area you really can't do well without deep learning, generally speaking. It has just gotten to the point where it works pretty well now. In fact, the New York Times featured an article yesterday about the latest advances in deep learning for text, and talked quite a lot about the work we've done in that area along with OpenAI, Google, and the Allen Institute for Artificial Intelligence.

Then we finished our application journey with tabular data and collaborative filtering, partly because those are things you can still do pretty well without deep learning. So it's not such a big step; it's not a whole new thing you can do that you couldn't do before. And also because we're going to try to get to a point where we understand pretty much every line of code in the implementations of these things, and those implementations are much less intricate than vision and NLP. So as we come down the other side of the journey and look at how all the stuff we've just done actually works, we'll start where we just ended: collaborative filtering, and then tabular data. By the end of today's lesson, we're going to be able to see what all those lines of code do. That's our goal.

In this lesson particularly, you should not expect to come away knowing how to do applications you couldn't do before. Instead, you should have a better understanding of how we've actually been solving the applications we've seen so far. In particular, we're going to understand a lot more about regularization, which is how we go about managing overfitting versus underfitting. So hopefully you can use some of the tools from this lesson to go back to your previous projects and get a little more performance, or handle models where previously you felt you didn't have enough data, or maybe you were underfitting, and so forth. It's also going to lay the groundwork for understanding convolutional neural networks and recurrent neural networks, which we will deep-dive into in the next two lessons. As we do that, we're also going to look at some new applications: two new vision and NLP applications.

### Review of last week [[3:32](https://youtu.be/uQtTwhpv7Ew?t=212)]

Let's start where we left off last week. Do you remember this picture?

![](lesson4/18.png)

This file has been truncated.

Selçuk

cmvandam · August 1, 2019, 6:05pm

Sometimes I like to step back from the practical explanation of what a function or technique such as freezing and unfreezing does, and how it works technically, to build broader intuition about why it works in principle. While I now understand from the explanations throughout the forum that, once unfrozen, the weights in the earlier, pretrained layers are updated based on the same information used to train the latest layer, my colleagues and I were looking for an intuitive explanation that someone without knowledge of neural nets might be able to relate to. To this end, we came up with the following human-learning analogy that resonates with us, but I’d be interested if anyone wants to critique it or offer a better one. The analogy is as follows:

We accumulate knowledge over time as we observe and learn, and what we learn builds on what we have already learned without our having to relearn it (i.e. transfer learning). However, what we learned in the past isn’t left unmodified by what we are learning now: new observations and training lead to new insights that may cause us to go back and modify or adjust (i.e. unfreeze) some of what we learned earlier, potentially correcting, improving, or deepening what we thought we knew, effectively seeing some of it with new eyes (i.e. fine-tuning some of the weights used to interpret earlier layers’ inputs, thus improving those earlier layers’ outputs). This improved understanding of what I learned earlier informs and improves my ability to learn new things (i.e. it gives the new layer I am training better inputs to start with).

I can relate this to how I learned by rote in my geometry/trig class that the circumference of a circle is 2πr, the area of a circle is πr^2, and the volume of a sphere is 3/4πr^3. However, when I took calculus the next year, I suddenly understood how they are all related, how they build on each other, and thus how they can be derived from the ground up using integral calculus. This deepened my understanding of what I had already learned, moved me beyond rote learning so that it wasn’t so hard to remember the formulas, and maybe even allowed me to correct some formulas I had gotten wrong to begin with.
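For anyone curious about the derivation the analogy refers to, here is a sketch: stacking thin rings of circumference 2πs for s from 0 to r gives the area of a disk, and stacking thin spherical shells of surface area 4πs² (a formula not in the list above, assumed here) gives the volume of a sphere.

$$
C(s) = 2\pi s, \qquad
A(r) = \int_0^r 2\pi s \, ds = \pi r^2, \qquad
V(r) = \int_0^r 4\pi s^2 \, ds = \tfrac{4}{3}\pi r^3
$$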

Is this a reasonable analogy, or does someone have a better one? Perhaps this is obvious, but I do like thinking of ways to relate these things to how my brain works. Thanks for humoring me.

cmvandam · August 1, 2019, 6:34pm

Oops. Just noticed that my “volume of a sphere” formula contains a typo: should be 4/3 * pi * r^3. I suppose this is a case in point - that earlier layer of mine needs a little unfreezing and retraining.

qsa007 · August 6, 2019, 6:58am

@cmvandam That’s a really well-worded write-up. But in my opinion, your analogy and the geometry/trig example miss the reason why we go back to unfreeze and retrain the initial layers. We do that when our later layer group doesn’t perform as well as we want it to (i.e. doesn’t recognize shapes in our particular dataset well enough). So we are just rebuilding our basic understanding of shapes to suit our particular dataset.

If you want to think of this in terms of a real-world analogy, the way we sometimes struggle with a new mobile phone when we switch brands probably comes somewhat close. We might know how phones work in general, but how to change the settings for a particular function could be different on the new phone from how it was on our older phone from a different brand. So we dig deeper to find the setting that is different on the new phone.

Hope I am not confusing you.

cmvandam · August 8, 2019, 5:55pm

Thank you very much, qsa007. Your explanation and alternative analogy are indeed very helpful. The issue I think you are pointing out with the geometry/trig analogy is that it implies the earlier learning was somehow less complete or less correct than after it was unfrozen and retrained with the additional data. You are pointing out that this is not what is really going on. In fact, the unfrozen and retrained layers aren’t necessarily any better per se, just better adapted to the problem at hand; i.e. not everything I learned about my iPhone, especially the most specialized learning that typically happens later in the learning process, will be relevant, and some of it may even be misleading. I now see more clearly why not just unfreezing, but potentially even discarding, some of the more specialized/later layers of the model we are building on is helpful. Thanks for taking the time to respond and help me out.

Alek · November 20, 2020, 3:16pm

I think riding a bike is a good analogy (assuming, of course, you have already learned how to do it). When you buy a new bike, you first need to adapt to riding it (it’s a different bike), and once you have, you can begin serious training: just riding it without thinking about how to steer.