Lesson 1: Head of a Model specifics

mckennabrown · August 24, 2020, 8:31pm

Hi, I’m not quite sure if this should have been posted as a response to the Lesson 1 discussion page or the Lesson 1 questionnaire. Please let me know if I should repost as a reply there.

From my understanding, using transfer learning on a pretrained model using cnn_learner will always remove exactly 1 layer from the end of the old model. This last layer doesn’t have a particular name.

Then it will add 1 or more layers to the end that have random weights assigned, and this set of layers is called the head. From Stack Overflow I understand you can have multiple heads if you want multiple outputs. How do we know how many layers get added, and is that even an important question?

Is my understanding correct? It’s possible I’m getting too bogged down in the specifics early on.

Pomo · August 25, 2020, 12:08am

Hi McKenna,

Many of your questions will be gradually answered as you progress through the lessons and use fastai yourself. That’s the fastai teaching philosophy - use it first and later learn the internals as you need to. At this point, it’s fine to trust that fastai is creating a reasonable model and training process for your task.

That said, it sounds like you want to understand more about what goes on behind the scenes. And I am answering only because I’m the same way and also felt quite lost at the beginning!

These various models are generally structured first to generate a huge mass of features, which are then reduced to class probabilities. We roughly call these parts of the model the body and head. Models that are pretrained classify into a specific number of classes. fastai strips the head off, leaving the body to generate the features it has already learned are relevant. It then attaches a new trainable head that classifies instead into the number of classes your data shows.

The existing head is usually more than one layer. There’s no magic for finding the cut between body and head - it is typed in by a smart human. The new head that gets attached (several layers) was designed to be effective by the authors of fastai, based on years of experience.
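To make the body/head split concrete, here is a minimal sketch in plain PyTorch. It is not fastai's actual code: the tiny `pretrained` network, the `cut` index, and the layer sizes of `new_head` are all made up for illustration. It only shows the idea of cutting at a hand-chosen point and attaching a fresh, randomly initialized head sized for your own number of classes.

```python
import torch
from torch import nn

# Hypothetical stand-in for a pretrained network (a real one would be e.g. a ResNet).
pretrained = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),   # body: feature extractor
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                     # old head starts here (several layers)
    nn.Flatten(),
    nn.Linear(16, 1000),                         # old head ends in 1000 classes
)

# The cut point is chosen by hand for each architecture.
cut = 4
body = nn.Sequential(*list(pretrained.children())[:cut])

# Freeze the body so that only the new head is trained at first.
for p in body.parameters():
    p.requires_grad = False

# New head with random weights, sized for the classes in our own data (assumed: 3).
n_classes = 3
new_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 64),
    nn.ReLU(),
    nn.Linear(64, n_classes),
)

model = nn.Sequential(body, new_head)

# A dummy batch of 2 RGB images, 32x32 pixels.
out = model(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 3])
```

Note that the old head removed here is three layers, not one, and the new head is several layers as well. The exact layers fastai adds are different from this toy version.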

You can look at the new head’s structure with learn.model and see how it processes data with learn.summary(). To see exactly how the new model is built from the pretrained one, look at or trace the source code for create_cnn_model.
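If you want a feel for what that kind of inspection tells you without setting up a Learner, the sketch below traces output shapes layer by layer through a toy two-part model in plain PyTorch. The model and the helper `trace_shapes` are hypothetical and much simpler than what fastai reports. The only thing borrowed from fastai is the layout assumption that index 0 holds the body and index 1 holds the head, as in the models cnn_learner builds.

```python
import torch
from torch import nn

# Toy model laid out as (body, head); the sizes are arbitrary.
model = nn.Sequential(
    nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU()),    # body
    nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2)),  # head
)

def trace_shapes(model, x):
    """Run x through each layer in turn and record the output shape."""
    rows = []
    for part_name, part in zip(("body", "head"), model):
        for layer in part:
            x = layer(x)
            rows.append((part_name, type(layer).__name__, tuple(x.shape)))
    return rows

with torch.no_grad():
    rows = trace_shapes(model, torch.randn(4, 3, 16, 16))

for row in rows:
    print(row)

# Count the trainable parameters in the head only.
head_params = sum(p.numel() for p in model[1].parameters() if p.requires_grad)
print("trainable head parameters:", head_params)
```

Reading the rows top to bottom shows the large feature maps coming out of the body being pooled and flattened, then reduced to one score per class by the head.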

Yes, you are probably too bogged down in specific details early on. And I get it. I hope this helps you get oriented and relax into the process. And please keep asking questions.