Interesting approach @marii - I usually recompute the stats for all channels and normalize the training set as usual during databunch creation with .normalize(). I agree with you about the importance of the “nature” of the images.
You can leverage the imagenet normalization stats only if you’re going to use “similar” images (same number of channels, similar luminance, and a wide variety of subjects).
If you’re changing the number of channels or the subject (e.g. training on charts), you need to normalize the data according to your own distribution.
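For a multi-channel dataset, recomputing the per-channel stats is straightforward. A minimal sketch (assuming a batch tensor shaped [N, C, H, W]; channel_stats is a hypothetical helper, not part of fastai):

```python
import torch

def channel_stats(batch):
    # Flatten every dimension except the channel one, then reduce:
    # [N, C, H, W] -> [C, N*H*W] -> one mean/std per channel.
    flat = batch.transpose(0, 1).contiguous().view(batch.shape[1], -1)
    return flat.mean(dim=1), flat.std(dim=1)

imgs = torch.rand(16, 6, 28, 28)  # e.g. a 6-channel dataset
mean, std = channel_stats(imgs)
print(mean.shape, std.shape)  # one value per channel
```

You can then pass these stats to .normalize((mean, std)) instead of the imagenet ones.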
You can think of the pretrained weights as a very good initialization for your network; so, especially when you unfreeze the first layers, it’s important to train your network with normalized images.
Moreover (on resnet architectures), input normalization matters most for the first layer, because right after it there is a BatchNorm2d that rebalances the output of the first convolution:
Sequential(
(0): Sequential(
(0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
(3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
...
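You can verify this rebalancing effect in isolation. A small sketch (not tied to any specific model) showing that, in training mode, BatchNorm2d re-centers and re-scales each channel no matter how badly scaled its input is:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(3)
bn.train()  # use batch statistics, as during training

# Wildly off-scale "activations", as if the input were badly normalized.
x = torch.randn(32, 3, 8, 8) * 50 + 100
y = bn(x)

# Per-channel stats of the output: roughly zero mean, unit std.
per_channel = y.transpose(0, 1).reshape(3, -1)
print(per_channel.mean(dim=1))  # ~0 per channel
print(per_channel.std(dim=1))   # ~1 per channel
```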
Train on MNIST with bad normalization:
from functools import partial
from fastai.vision import *
from colorfuldim import ActivationsHistogram  # callback from colorfuldim.py

bad_stats = ([.01,.3,.9], [.7,.01,.73]) # random values
data = ImageDataBunch.from_folder(untar_data(URLs.MNIST_SAMPLE),bs=128).normalize(bad_stats)
actsh = partial(ActivationsHistogram,liveChart=False,modulesId=range(6),hMin=-10,hMax=10,nBins=200)
learn = cnn_learner(data, models.resnet18, callback_fns=actsh, metrics=[accuracy])
learn.unfreeze()
learn.fit_one_cycle(4)
learn.activations_histogram.plotActsHist(cols=6,figsize=(20,2),showEpochs=False)
Train on MNIST with auto normalization:
data = ImageDataBunch.from_folder(untar_data(URLs.MNIST_SAMPLE),bs=128).normalize()
As you can see, even though the output of the first convolution is very different, after BatchNorm2d the activations go back to normal with or without input normalization.
You can find colorfuldim.py and the documentation here.
NOTE: when you change the model to accept a number of channels different from 3, you reset the weights of that layer (usually the first one); so it’s useful to initialize those weights with something meaningful (e.g. copying from the existing channels: RGB -> RGBRGB) before unfreezing them.
See def adapt_first_layer(src_model, nChannels).
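The idea behind that weight copy can be sketched like this (a hedged illustration, not the actual adapt_first_layer implementation; it assumes the first conv is exposed as .conv1, as in torchvision resnets):

```python
import torch
import torch.nn as nn

def adapt_first_conv(model, n_channels):
    old = model.conv1  # e.g. resnet's Conv2d(3, 64, 7, stride=2, padding=3)
    new = nn.Conv2d(n_channels, old.out_channels,
                    kernel_size=old.kernel_size, stride=old.stride,
                    padding=old.padding, bias=False)
    with torch.no_grad():
        reps = -(-n_channels // 3)  # ceil(n_channels / 3)
        # Tile the pretrained 3-channel filters: RGB -> RGBRGB...
        w = old.weight.repeat(1, reps, 1, 1)[:, :n_channels]
        # Rescale so the activation magnitude stays comparable.
        new.weight.copy_(w * 3 / n_channels)
    model.conv1 = new
    return model

class Stub(nn.Module):  # stand-in for a pretrained resnet stem
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2,
                               padding=3, bias=False)

m = adapt_first_conv(Stub(), 6)
print(m.conv1.weight.shape)  # torch.Size([64, 6, 7, 7])
```

With a real pretrained model you’d load the weights first, adapt the layer, and only then unfreeze.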