I am having problems with my BatchNorm layers. My inputs are vectors of length 1024 containing either zeros or ones, and they are highly sparse: maybe 90% of the bits within a vector are 0. Using batchnorm in the early layers seems to make my loss explode. I suspect that, because of the sparseness of the data, the batchnorm cannot adequately learn the distribution of activations.
DataBunch.normalize() should do it for you, I think.
If you want to normalise it without the DataBunch, you just do it the same way you would with any other input: (input - mean) / std
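A minimal sketch of that manual standardisation in PyTorch, assuming a batch of sparse binary vectors like the ones described above (the shapes and the eps value are my own illustrative choices, not from the thread):

```python
import torch

# Hypothetical sparse binary batch: 32 vectors of length 1024, ~90% zeros
x = (torch.rand(32, 1024) > 0.9).float()

# Per-feature statistics; in practice compute these once over the training set
mean = x.mean(dim=0)
std = x.std(dim=0)

# Standardise; the small eps guards against features that are all zeros (std == 0)
x_norm = (x - mean) / (std + 1e-6)
```

Note the eps: with 90% sparsity, some of the 1024 positions may be zero for every sample in a batch, and dividing by a zero std would produce NaNs.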
It is easier for neural networks to handle inputs with a mean of zero and a std of one, although it doesn't matter much for binary inputs. When there are many classes, one-hot encoding is sometimes the better option.
You should try normalising it and one-hot encoding it, and compare both against your current results.
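For the one-hot option, a quick sketch using PyTorch's built-in helper (the class count and labels here are made-up examples):

```python
import torch
import torch.nn.functional as F

# Hypothetical integer-encoded categorical feature: 4 samples, 5 possible classes
labels = torch.tensor([0, 2, 4, 1])

# Each row becomes all zeros except a single 1 at the class index
one_hot = F.one_hot(labels, num_classes=5).float()
```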