What to do with batch-normalization layers while fine-tuning the model

Hi everyone, this is my first post, and I'm hoping to get some help.

I’ve been thinking about this for a long time and still can’t get it sorted out…

If I want to freeze the layers of a pre-trained model (layers 0 to t) and only fine-tune layers t+1 to the end, what should I do with the batch-normalization layers while fine-tuning the model?

The problem is that layers 0~t contain batch-normalization layers, which can cause trouble if you don’t handle them deliberately.

I’m currently only using Keras, and it provides two ways:
(1) Set the batch-normalization layers to not trainable, which I have already tested; it actually freezes everything: beta, gamma, moving mean, and moving variance.
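A minimal sketch of option (1), assuming TensorFlow 2.x Keras; the feature size here is just a placeholder:

```python
import tensorflow as tf

# Standalone BN layer so the frozen weights are easy to inspect.
bn = tf.keras.layers.BatchNormalization()
bn.build((None, 4))  # creates gamma, beta, moving mean, moving variance

bn.trainable = False  # option (1): mark the whole layer non-trainable

# With trainable=False, gamma and beta leave trainable_weights too,
# so all four variables are frozen.
print(len(bn.trainable_weights))      # 0
print(len(bn.non_trainable_weights))  # 4
```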

(2) x = BatchNormalization()(y, training=False), which freezes only the moving mean and moving variance; beta and gamma are still updated during training.
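And a sketch of option (2), again assuming TensorFlow 2.x Keras with a placeholder input shape:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(4,))
bn = tf.keras.layers.BatchNormalization()
# training=False pins the layer to inference mode: it always normalizes
# with the stored moving mean/variance, so those statistics stop updating...
x = bn(inputs, training=False)
model = tf.keras.Model(inputs, x)

# ...but gamma and beta remain trainable and still receive gradient updates.
print(len(bn.trainable_weights))  # 2 (gamma and beta)
```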

To sum up:
My own dataset is not big, but it is similar to the one used for the pre-trained model, so I think it’s better to freeze the pre-trained model (the first couple of layers). But what should I do with the batch-normalization layers?

Should I freeze everything, or freeze only the moving mean and variance and let beta and gamma keep changing over training? If an explanation can be given, I would really be thankful for it.
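For what it’s worth, the Keras transfer-learning guide suggests combining both mechanisms for a frozen base: set trainable = False on the base and also call it with training=False. A toy sketch of that recipe (the tiny base model and all layer sizes are made-up placeholders, not a real pre-trained network):

```python
import tensorflow as tf

# Tiny stand-in for a pre-trained base; sizes are arbitrary placeholders.
base = tf.keras.Sequential([
    tf.keras.layers.Dense(8, input_shape=(4,)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
])
base.trainable = False  # freezes gamma/beta (and all other base weights)

inputs = tf.keras.Input(shape=(4,))
# training=False keeps BN in inference mode, so the moving statistics
# are frozen as well, even while fitting the new head.
x = base(inputs, training=False)
outputs = tf.keras.layers.Dense(2)(x)  # new head; 2 classes is a placeholder
model = tf.keras.Model(inputs, outputs)

print(len(model.trainable_weights))  # 2: just the new Dense kernel and bias
```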

Thanks in advance!

I do not know the answer to your question, but some people share their experiences and thoughts on a related question in the following thread: Freezing batch norm
