Logic of "_load_pretrained_weights"

Fabian · April 11, 2021, 4:04pm

I want to process medical images and use more than 3 input slices for this.
I was pleased to see that fastai already implements a way to do this quite easily by specifying “n_in”.

I think it would be great if I could use pretrained weights for this, and fastai already supports that as well.
It is done here:

github.com: fastai/fastai/blob/master/fastai/vision/learner.py#L33

# Cell
def _get_first_layer(m):
    "Access first layer of a model"
    c,p,n = m,None,None  # child, parent, name
    for n in next(m.named_parameters())[0].split('.')[:-1]:
        p,c=c,getattr(c,n)
    return c,p,n
# Cell
def _load_pretrained_weights(new_layer, previous_layer):
    "Load pretrained weights based on number of input channels"
    n_in = getattr(new_layer, 'in_channels')
    if n_in==1:
        # we take the sum
        new_layer.weight.data = previous_layer.weight.data.sum(dim=1, keepdim=True)
    elif n_in==2:
        # we take first 2 channels + 50%
        new_layer.weight.data = previous_layer.weight.data[:,:2] * 1.5
    else:
        # keep 3 channels weights and set others to null
        new_layer.weight.data[:,:3] = previous_layer.weight.data
        new_layer.weight.data[:,3:].zero_()

When more than three slices are used, it keeps the pretrained weights for the first three channels and sets the weights of all remaining channels to zero.

Intuitively, I would just copy the pretrained weights over to all channels (e.g. repeating the 3 channels three times to get 9 channels) and divide the result by 3, so the sums stay the same for the next layer.
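
To make the comparison concrete, here is a minimal, framework-free sketch of the two initializations. It is not fastai code: it models a single kernel position, so each filter is just a list with one weight per input channel, and the names `zero_pad`, `tile_and_rescale` and `respond` are hypothetical helpers invented for this illustration. The weight and slice values are made up as well.

```python
import math

def zero_pad(weights, n_in):
    """Keep the pretrained channels and give every extra input channel a zero weight."""
    n_prev = len(weights[0])
    return [row + [0.0] * (n_in - n_prev) for row in weights]

def tile_and_rescale(weights, n_in):
    """Repeat the pretrained channels cyclically and scale by n_prev / n_in.

    The per-filter weight sum is preserved exactly only when n_in is a
    multiple of the pretrained channel count (e.g. 3 -> 9).
    """
    n_prev = len(weights[0])
    scale = n_prev / n_in
    return [[row[i % n_prev] * scale for i in range(n_in)] for row in weights]

def respond(weights, x):
    """Response of each filter at one position: dot product over input channels."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

# Two made-up "pretrained" filters with 3 input channels each.
pretrained = [[0.2, -0.5, 0.1], [0.4, 0.3, -0.6]]
three_slices = [1.0, 2.0, 3.0]

reference = respond(pretrained, three_slices)
padded_w = zero_pad(pretrained, 9)
tiled_w = tile_and_rescale(pretrained, 9)

# Case 1: the 9 slices are the same 3 slices repeated three times.
repeated = three_slices * 3
padded_same = respond(padded_w, repeated)
tiled_same = respond(tiled_w, repeated)

# Case 2: only the 6 extra slices change.
changed = three_slices + [10.0] * 6
padded_changed = respond(padded_w, changed)
tiled_changed = respond(tiled_w, changed)

print("reference     ", reference)
print("padded, case 1", padded_same, "| case 2", padded_changed)
print("tiled,  case 1", tiled_same, "| case 2", tiled_changed)
```

In case 1 both variants reproduce the pretrained response (up to floating-point rounding), so the next layer initially sees activations of the scale it was trained on. In case 2 the zero-padded layer still returns the pretrained response because it ignores the extra slices until training moves their weights away from zero, while the tiled layer responds to them from the first step.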

Can somebody explain to me why it isn’t done that way?
Is there something I am missing?

Thank you for your response in advance.

Best,
Fabian

Pomo · April 20, 2021, 6:37am

Hi Fabian. I was hoping that someone would pick up on your question so that I could learn something new and also not expose a risky opinion. But I did not want to let your good question go without a reply. I have had the same question.

The short answer is that I don’t know why this design decision was made in fastai. I think that your idea of copying the weights and scaling them is entirely reasonable. You can find code examples in these forums that do exactly that. I would encourage you to try your idea and see how well it works vs. initializing with zeros. Maybe your idea of copying the weights will prove to be a better method that can be offered as an improvement.

My main issue with the fastai codebase is that you can read the code to find out what it does but not why. Which other designs were explored or tested to arrive at the many particular choices? Many options may have been evaluated, many years of personal experience drawn from, but you would never learn it from the code. So I do not think there is “something you are missing” - it’s just that you will not often find that “something” explained in the code itself.

Of course these are only my personal observations of the codebase, and I have a deep appreciation for the overall clarity and ease that fastai brings. Good luck with your project!

Fabian · April 21, 2021, 9:40am

Hello Malcolm,

Thank you for your answer.
I agree with you. To use fastai to its full potential, you essentially need to understand most of the library itself. Otherwise, modifications are tough to pull off.

I generally read the source on GitHub (or within a Jupyter notebook) for this.
Is there a better representation of the fastai source code I could read?
I heard that the library itself is developed in Jupyter notebooks. Maybe these notebooks have better comments on the code?

Thanks,
Fabian

Pomo · April 21, 2021, 6:16pm

Hi Fabian. I am really not very familiar with the domain of fastai development. The fastai code is indeed developed in and exported from Jupyter notebooks. I went to a couple of the source notebooks when I wanted to understand the code better. There were unit tests, but no explanations of the reasoning and choices behind the designs. Sorry I can’t help further. Maybe someone else will have better advice.