I am having problems with my BatchNorm layers. My inputs are vectors of length 1024 containing only zeros and ones, and they are highly sparse: maybe 90% of the bits in a vector are 0. Using batchnorm in the early layers seems to make my loss explode. I suspect that, because of the sparseness of the data, batchnorm cannot adequately learn the distribution of the activations.
Would that be a fair conclusion?
I am not sure about your conclusion but the input should definitely be normalized.
But do you normalize categorical variables? And how would one do that?
DataBunch.normalize() should do it for you I think.
If you want to normalise it without the DataBunch, you just do it the same way you would with any other input:
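A minimal sketch of what that would look like by hand, assuming PyTorch tensors (the batch `x` and the epsilon value here are illustrative, not from the thread):

```python
import torch

# Hypothetical sparse binary batch: 32 vectors of length 1024, ~90% zeros
torch.manual_seed(0)
x = (torch.rand(32, 1024) < 0.1).float()

# Compute the statistics on the training set only, then reuse the same
# mean/std for validation and test data
mean = x.mean(dim=0)
std = x.std(dim=0).clamp_min(1e-6)  # guard against all-zero columns

x_norm = (x - mean) / std
```

Note that for a column that is all zeros, `x - mean` is also zero, so the clamp just prevents a 0/0 rather than distorting the result.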
It is easier for neural networks to handle inputs with a mean of zero and a standard deviation of one, although it doesn't matter much for binary inputs. Sometimes one-hot encoding is better when there are many classes.
You should try normalising it and one-hot encoding it, and compare both against your current results.
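For the one-hot option, a quick sketch using PyTorch's built-in helper (the `labels` tensor and class count are made-up examples):

```python
import torch
import torch.nn.functional as F

# Hypothetical integer class labels for four samples
labels = torch.tensor([0, 3, 1, 2])

# Expand each label into a one-hot vector of length num_classes
onehot = F.one_hot(labels, num_classes=4).float()
```

Each row then contains exactly one 1, so the encoded inputs stay binary and sparse, which is often easier for the network than arbitrary integer codes.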