Need a clear explanation of how normalizing inputs helps learning

  1. If we subtract the mean of the test data from the inputs and divide by the standard deviation, do we lose some information?

  2. And let's say I have 10 inputs to train the neural net, and I have normalized the data with the mean and variance of those 10 inputs. There is no guarantee that the inputs I later predict on with this trained network will have the same distribution, right? An example with a limited number of neurons and some actual numbers on how normalizing helps would be helpful.

My understanding is that it is a technique to help the learning algorithm – SGD or the others – find a solution more quickly. It does this by making the feature space more regular. The example I got was from Andrew Ng’s course. In his example, let’s say we are trying to predict housing prices based on just two features: number of rooms and square feet. He uses only two features because they can easily be plotted and visualized. In that case one feature ranges from roughly 1 to 5, and the other ranges from 800 to 3,500.

When the cost function is plotted for this data set, it has a very skewed, elongated shape with a narrow “valley”. This means the learning algorithm ends up jumping back and forth from side to side and descends more slowly than if the features had been normalized and the contour plot were more round.
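To put some actual numbers on this, here is a small sketch using the same two made-up housing features (the data, learning rates, and step counts are all just illustrative). Plain gradient descent on the raw features is forced to use a tiny learning rate to stay stable in the steep square-feet direction, so it barely moves in the rooms direction; after standardizing, a much larger rate works and the fit is far better after the same number of steps:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical housing data: rooms in 1..5, square feet in 800..3500.
rooms = rng.uniform(1, 5, size=100)
sqft = rng.uniform(800, 3500, size=100)
X_raw = np.column_stack([rooms, sqft])
# Made-up "true" prices so we have a target to fit.
y = 50.0 * rooms + 0.1 * sqft + rng.normal(0, 5, size=100)

def mse_after_gd(X, y, lr, steps=500):
    """Run plain full-batch gradient descent on linear regression
    and return the final mean squared error."""
    Xb = np.column_stack([X, np.ones(len(X))])  # add a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        grad = 2 * Xb.T @ (Xb @ w - y) / len(y)
        w -= lr * grad
    return np.mean((Xb @ w - y) ** 2)

# Standardize each feature with its own mean and std.
X_norm = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# The raw features need a tiny learning rate or the loss blows up;
# the normalized features tolerate a much larger one and converge fast.
print("raw:       ", mse_after_gd(X_raw, y, lr=1e-7))
print("normalized:", mse_after_gd(X_norm, y, lr=0.1))
```

The raw run is stuck in exactly the “skewed valley” situation: the step size that keeps the square-feet direction stable is far too small to make progress along the rooms direction.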

With images it might not really be necessary in general to normalize, since the pixels all share a range of 0 to 255. However, in the case of VGG16, we are using pre-trained weights that were trained with normalization. So that is why we need our inputs to resemble that original training set.
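As a sketch of what that preprocessing looks like, here is the per-channel normalization commonly applied before feeding images to ImageNet-pretrained models. The mean/std values below are the widely used ImageNet channel statistics (the ones torchvision applies for its pretrained models); exact preprocessing varies by framework, so treat this as illustrative:

```python
import numpy as np

# Commonly cited ImageNet channel statistics (RGB, images scaled to [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def preprocess(img_uint8):
    """Scale an HxWx3 uint8 image to [0, 1], then normalize each channel
    with the statistics of the original pretraining set."""
    x = img_uint8.astype(np.float32) / 255.0
    return (x - IMAGENET_MEAN) / IMAGENET_STD  # broadcasts over channels

fake_img = np.random.default_rng(0).integers(0, 256, (4, 4, 3), dtype=np.uint8)
out = preprocess(fake_img)
print(out.shape)  # (4, 4, 3)
```

The key point is that these statistics come from the *pretraining* data, not from your own images, so your inputs end up on the same scale the pretrained weights expect.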

As for your example of 10 samples for training your net: I guess that is why most training sets are much larger than 10. :slight_smile: The point is to have enough examples of the things you’re trying to train on to capture the general shape of the “function” that defines that set of things. Stretching and squashing that function shouldn’t affect the ability to model it. At least that is my understanding of it.


Thanks! Any link to Andrew Ng’s talk on normalizing inputs?

It’s part of the Stanford Machine Learning course on Coursera.

It’s also a free online course and focuses mostly on the math behind the learning algorithms. It is a few years old, but the math hasn’t changed all that much.

The other thing I recommend is listening again to where Jeremy talks about Batch Normalization. The same ideas about normalizing inside the neural net to keep any one signal from dominating also apply to the inputs.