I have a fundamental question about scaling input features. If a Norm layer is used, do we still need to scale the input features, or would that be redundant?
As per my understanding: yes, you should still normalize the input features. A Norm layer normalizes the activations, which would otherwise be computed from the un-normalized inputs.
The objective of the Norm layer is different from normalizing the input features.
Input features are normalized so that no single feature dominates the others purely because of its scale.
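To make that concrete, here is a minimal sketch of input-feature scaling via standardization (zero mean, unit variance per feature). The feature matrix and its units are hypothetical, just to show two features on very different scales:

```python
import numpy as np

# Hypothetical feature matrix: column 0 in metres, column 1 in dollars.
X = np.array([[1.8, 50_000.0],
              [1.6, 90_000.0],
              [1.7, 70_000.0]])

# Standardize each feature to zero mean, unit variance so that no
# feature dominates purely because of its measurement scale.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # ~[1, 1]
```

After this, both columns are comparable in magnitude before they ever reach the first layer's weights.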
Yes, you are right about preventing any particular input feature's dominance over the others. However, the output of a layer (its activations) is what goes into the subsequent layer, and the Norm layer (batch norm, layer norm, etc.) scales it. So wouldn't it automatically take care of the unscaled input? I hope I have framed my question clearly.
An activation is calculated via the following:
a = activation_func(Wx), where Wx = w1*x1 + w2*x2 + ... + wn*xn
The Norm layer will normalize a, but the dominating features will already have contributed to the activation. The two normalization techniques have different effects and contribute in different ways: normalized inputs make sure that no one feature dominates in the calculation of Wx, while the Norm layer (I believe) normalizes a with respect to the other activations.
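A small sketch of this point, using made-up numbers: even if we layer-normalize the pre-activations afterwards, the per-feature contributions to Wx were already fixed by the raw input scales. The weights and feature values below are arbitrary illustrations, not from any real model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two input features; feature 1 is on a much larger scale than feature 0.
x = np.array([0.5, 10_000.0])
W = rng.normal(size=(4, 2))  # a hypothetical 4-unit layer

z = W @ x  # pre-activations: numerically dominated by x[1]

# Layer norm rescales z across the units of this layer...
z_norm = (z - z.mean()) / z.std()

# ...but the relative contribution of each feature to z is unchanged:
contrib = np.abs(W * x).sum(axis=0)  # summed |w_ij * x_j| per feature j
print(contrib)  # feature 1's contribution dwarfs feature 0's
```

So normalization of the activations fixes their statistics, but it cannot retroactively rebalance which input feature drove them; that is what input scaling is for.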
Agreed. So yes, it makes logical sense to scale the input features as well. Thanks for the explanation!