CNN input normalization best practice: featurewise or all of ImageNet?


I noticed that keras.preprocessing.image.ImageDataGenerator() has the Boolean arguments featurewise_center and featurewise_std_normalization.

In lesson 3, the inputs were normalized using the RGB averages of the entire ImageNet database, but wouldn’t it be better to normalize based on the data for our particular problem? Is there any reason these arguments are not used in the generator in lieu of the ImageNet RGB averages?

I think the answer is that if we use a pre-trained CNN like VGG, we should use the same normalization it was trained with, since its learned weights expect inputs preprocessed that way. I think that holds even more strongly if you are recognizing things from the 1000 ImageNet classes, like cats and dogs.

On the other hand, I have thought about this same question myself - what if I use a pre-trained VGG to recognize things that it wasn’t designed for (i.e. not things in ImageNet) - like types of cervixes in the Kaggle cervical cancer contest? At least with the NNs I have been trying so far, it doesn’t seem to matter all that much for that kind of data.
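
To make the two options concrete, here is a rough NumPy sketch of both. It is only an illustration under some assumptions: images are arrays of shape (N, H, W, 3) in RGB order with values 0-255, the channel means are the ones commonly quoted for VGG, and the helper names (normalize_imagenet, fit_dataset_stats, normalize_dataset) are made up for this post, not Keras functions. The dataset version computes one mean and std per channel from your own data, which is roughly the idea behind the featurewise arguments.

```python
import numpy as np

# Per-channel RGB means commonly used with VGG (computed on ImageNet).
IMAGENET_RGB_MEAN = np.array([123.68, 116.779, 103.939], dtype=np.float32)


def normalize_imagenet(images):
    """Option 1: subtract the fixed ImageNet channel means."""
    return images.astype(np.float32) - IMAGENET_RGB_MEAN


def fit_dataset_stats(images):
    """Compute per-channel mean and std from our own training images."""
    x = images.astype(np.float32)
    # Reduce over batch, height and width, leaving one value per channel.
    return x.mean(axis=(0, 1, 2)), x.std(axis=(0, 1, 2))


def normalize_dataset(images, mean, std, eps=1e-6):
    """Option 2: center and scale with statistics of our own data."""
    return (images.astype(np.float32) - mean) / (std + eps)


# Stand-in for a real training set: 8 random 32x32 RGB images.
rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(8, 32, 32, 3)).astype(np.uint8)

mean, std = fit_dataset_stats(train)
x_imagenet = normalize_imagenet(train)
x_dataset = normalize_dataset(train, mean, std)
```

With a pre-trained network you would feed it x_imagenet so the inputs match what it saw in training; x_dataset is what you would try if you wanted to test whether problem-specific statistics help. The statistics should be fitted on the training set only and then reused for validation and test images.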

Great question… and I am still thinking about it myself.
