I have been training on my custom dataset for months and I just found out that I didn’t do any input normalization (in PyTorch, I did no normalization after ToTensor()). Could this be why my network sometimes does not converge?
The problem is that I want my network to work on different datasets (coming from different distributions), so I don’t know how to set a proper mean and std. What is the best practice here?
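One common way to get a mean and std for a given dataset is to compute per-channel statistics over its training split. The sketch below assumes the loader yields (images, labels) batches with images shaped (N, C, H, W) and already scaled to [0, 1] (as ToTensor() does for uint8/PIL images). The random TensorDataset is only a hypothetical stand-in for a real dataset, and channel_mean_std is a made-up helper name.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def channel_mean_std(loader):
    """Per-channel mean and (population) std over every pixel of every image.

    Assumes each batch is (images, labels) with images shaped (N, C, H, W)
    and already scaled to [0, 1].
    """
    n_pixels = 0
    channel_sum = None
    channel_sq_sum = None
    for images, _ in loader:
        images = images.double()  # accumulate in float64 to limit rounding error
        if channel_sum is None:
            channel_sum = torch.zeros(images.shape[1], dtype=torch.float64)
            channel_sq_sum = torch.zeros(images.shape[1], dtype=torch.float64)
        channel_sum += images.sum(dim=(0, 2, 3))
        channel_sq_sum += (images ** 2).sum(dim=(0, 2, 3))
        n_pixels += images.shape[0] * images.shape[2] * images.shape[3]
    mean = channel_sum / n_pixels
    # Var[x] = E[x^2] - E[x]^2; clamp guards against tiny negative rounding errors
    std = (channel_sq_sum / n_pixels - mean ** 2).clamp_min(0).sqrt()
    return mean.float(), std.float()


# Hypothetical stand-in for a real training split: 100 random RGB 32x32 images.
fake_images = torch.rand(100, 3, 32, 32)
fake_labels = torch.zeros(100, dtype=torch.long)
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=16)
mean, std = channel_mean_std(loader)
```

The resulting values can then be passed as transforms.Normalize(mean.tolist(), std.tolist()). If the network has to handle several datasets, one option is to recompute these statistics on each dataset's own training split rather than reusing one fixed pair.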
Also, my network is a ResNet, which uses Batch Norm. Is input normalization still important even with Batch Norm after every convolutional layer?
This is my first post in this forum, and I hope someone can give me some hints. Thanks in advance!
I don’t know if it really matters when working on pixels. I’ve seen all kinds of normalization schemes, including no normalization.
If you want to try it out, pixel * 2 - 1 is a common one that puts all the pixels in the range [-1, 1].
So how important is it to put input pixels in the range [-1, 1] when batch norm is applied after every conv layer?
It probably isn’t important. But note that no batch norm is applied to the input, only after it has gone through at least one convolution layer. So you can think of normalization as batch norm on the input.
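One way to make "batch norm on the input" literal is to put a BatchNorm2d with affine=False in front of the first conv, so the raw pixels are standardized with batch statistics during training and with running statistics after model.eval(). This is a swapped-in technique for illustration, not something proposed in the thread; InputBNNet and its layer sizes are hypothetical.

```python
import torch
from torch import nn


class InputBNNet(nn.Module):
    """Hypothetical tiny network: BatchNorm2d applied directly to the raw input,
    followed by a conv -> BN -> ReLU block and a linear classifier."""

    def __init__(self, in_channels=3, width=16, num_classes=10):
        super().__init__()
        # affine=False: only standardize with batch / running statistics,
        # no learned scale and shift, so it behaves like data normalization.
        self.input_norm = nn.BatchNorm2d(in_channels, affine=False)
        self.conv = nn.Conv2d(in_channels, width, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(width)
        self.head = nn.Linear(width, num_classes)

    def forward(self, x):
        x = self.input_norm(x)
        x = torch.relu(self.bn(self.conv(x)))
        x = x.mean(dim=(2, 3))  # global average pooling over H and W
        return self.head(x)


model = InputBNNet()
raw = torch.rand(8, 3, 32, 32) * 255  # unnormalized pixels in [0, 255]
logits = model(raw)
```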
If the input is in the range [0, 255], the first conv layer’s outputs will be much larger than those of the other conv layers. This may also result in very large mean and variance values in the batch norm layer following that conv layer. Is that a problem? It could be.
I think this is why we normalize the input, so that all conv and batch norm layers have weights that are roughly of the same magnitude.
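A small sketch of the effect on the batch norm layer: the same random batch goes through one conv layer (bias=False, as is usual when BN follows a conv) and a BatchNorm2d, once scaled to [0, 1] and once left in [0, 255]. The layer sizes, batch shape and seed are arbitrary assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)

conv = nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False)
bn_01 = nn.BatchNorm2d(16)   # sees the conv output for [0, 1] input
bn_255 = nn.BatchNorm2d(16)  # sees the conv output for [0, 255] input

pixels_01 = torch.rand(32, 3, 32, 32)  # stand-in for images scaled to [0, 1]
pixels_255 = pixels_01 * 255           # the same images left in [0, 255]

out_01 = conv(pixels_01)
out_255 = conv(pixels_255)  # linear, no bias: 255 times out_01 up to rounding

# Training-mode forward passes: BN normalizes with the batch statistics
# and updates running_mean / running_var with momentum 0.1.
y_01 = bn_01(out_01)
y_255 = bn_255(out_255)

print("mean running_var, [0, 1] input:  ", bn_01.running_var.mean().item())
print("mean running_var, [0, 255] input:", bn_255.running_var.mean().item())
```

In this toy setup the normalized outputs y_01 and y_255 come out nearly identical, because BN rescales each batch; what differs is the scale of the running statistics stored in the BN layer.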
I guess here you assume that pixels have already been scaled to [0, 1] by dividing by 255.
If there have been no modifications beforehand, pixels are in the range [0, 255].
That depends on the library you use to read the images. Some already put the images into the [0, 1] range.
Yes, exactly! In that case, should I add a final normalization step to make the pixels zero mean?
Try both ways. If you are not doing transfer learning, i.e. your model learns from scratch, nobody can tell which one gives better results without trying.