Fastai Resnet50 Final Layers

Chris-hughes · April 11, 2019, 5:03pm

Hi All,

I’m not sure if this is in the right place, but please go easy on me as this is my first post. I have completed parts 1 & 2 of the 2017 Keras version of the course and am now working my way through the 2019 version, so if Jeremy speaks about this in a later lesson, please direct me there!

When using the Fastai library to create a Resnet50 architecture for transfer learning, such as in lesson 1, there seems to be a series of additional layers (Batchnorm, Dropout, Linear) appended to the end of the model that do not appear in the Pytorch or Keras versions.

I was wondering, outside of the obvious, what is the purpose of these final layers and where are they defined in the fastai library? Also, how are the weights of these layers initialised? Was the whole fastai model trained on ImageNet? Finally, I am still getting to grips with PyTorch so apologies if this is obvious, but does the final linear layer use a softmax activation?

Does Jeremy ever talk about this in the 2018 or 2019 course, or does anyone know anything about it?

Thanks!

Pomo · April 11, 2019, 5:57pm

Welcome! Here’s a brief orientation:

Pretrained ResNets were trained on ImageNet. They take an image and output a score for each of the 1,000 ImageNet classes. The final layers of the model map the processed image onto those 1,000 classes.

cnn_learner strips off those final layers to get to the pretrained vision “body”. Then it creates and attaches a different “head” that maps the processed image to your classes, for example 33 dog breeds or the 2 classes of dogs vs. cats. Those classes are specified implicitly in the DataBunch. Because the head is newly created, its weights start out freshly initialised rather than pretrained on ImageNet; only the body carries ImageNet weights. The particular structure of the head reflects Jeremy’s experience and insight into best practices.
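To make the body/head split concrete, here is a rough sketch in plain PyTorch of a head built from the kinds of layers mentioned in the question (Batchnorm, Dropout, Linear). It is not copied from the fastai source: the ConcatPool helper, the make_head function, the 512 hidden units, and the dropout rates are all assumptions for illustration, and a random tensor stands in for the output of a ResNet50 body (2048 channels).

```python
import torch
import torch.nn as nn


class ConcatPool(nn.Module):
    """Hypothetical helper: concatenates max- and average-pooled features."""

    def __init__(self):
        super().__init__()
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        # (batch, C, H, W) -> (batch, 2*C, 1, 1)
        return torch.cat([self.max_pool(x), self.avg_pool(x)], dim=1)


def make_head(n_features, n_classes):
    """Illustrative head; layer sizes and dropout rates are assumptions."""
    return nn.Sequential(
        ConcatPool(),
        nn.Flatten(),                      # (batch, 2*C, 1, 1) -> (batch, 2*C)
        nn.BatchNorm1d(2 * n_features),
        nn.Dropout(0.25),
        nn.Linear(2 * n_features, 512),
        nn.ReLU(inplace=True),
        nn.BatchNorm1d(512),
        nn.Dropout(0.5),
        nn.Linear(512, n_classes),         # raw scores (logits), no softmax here
    )


# Stand-in for the feature map produced by a ResNet50 body on 4 images.
features = torch.randn(4, 2048, 7, 7)

head = make_head(n_features=2048, n_classes=2)  # e.g. dogs vs. cats
head.eval()  # disable dropout and use running batchnorm statistics
with torch.no_grad():
    logits = head(features)

print(logits.shape)
```

Note that the layers in this head are created from scratch, so their weights are whatever PyTorch's default initialisation gives them. To see the actual head fastai built for you, printing the last child of learn.model is the quickest check.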

You won’t see the softmax activation in the model itself; the final linear layer outputs raw scores. Take a look at learner.loss_func to see how the softmax is applied as part of calculating the loss. When you make predictions, for example with learner.get_preds(), fastai automatically applies an appropriate activation function and returns probabilities.
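Here is a small pure-Python sketch of why the softmax does not need to live in the model. PyTorch's cross-entropy loss combines a log-softmax with the negative log-likelihood, so it takes raw scores directly. The function names and example numbers below are made up for illustration; this is the underlying arithmetic, not fastai's code.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(logits, target):
    """Cross-entropy computed directly from raw scores.

    Equal to -log(softmax(logits)[target]), which is why the model itself
    can end in a plain linear layer.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


logits = [2.0, 0.5, -1.0]  # hypothetical output of the final linear layer
target = 0                 # index of the true class

probs = softmax(logits)
loss = cross_entropy_from_logits(logits, target)

print(probs)
print(loss, -math.log(probs[target]))  # the two values agree
```

At prediction time only the softmax half is needed, which is the step that turns the raw scores into probabilities.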

To really understand how all this works in detail, you’re going to have to read the fastai documentation and code. The code is accessible via links in the fastai docs. I also highly recommend using a debugger to trace through cnn_learner. VSCode, PyCharm, and pdb are some options.

Good luck with your explorations!