I've been having trouble getting good results on two different problems: one on a regular tabular dataset and another using collaborative-filtering-style models, both of which end up calling TabularModel. The model uses layers.embedding, which has a default std=0.01 and references a paper based on word-embedding inits.
I've gone through the part 2 (2019) lessons, and the first couple of lectures discuss using inits that keep each layer's activations close to mean 0 / std 1. I'm not sure what applies to the input layer, but we typically normalize continuous features to N(0,1), so shouldn't we do the same for the embedding dimensions? I haven't tested this thoroughly enough yet; has anyone found this to be a problem?
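To illustrate what I mean, here's a minimal sketch (plain PyTorch, not the actual fastai code) comparing an embedding initialized at std=0.01 against one initialized at std=1.0. The layer sizes and seed are made up for the example; the point is just that the embedding output's std roughly matches the init std, so a 0.01 init feeds activations that are far from the mean-0/std-1 regime the lectures aim for:

```python
import torch
import torch.nn as nn

torch.manual_seed(42)

# Hypothetical sizes, just for illustration
n_categories, emb_dim = 1000, 16
idx = torch.randint(0, n_categories, (4096,))  # a batch of category indices

# Small init, similar in scale to the fastai default of std=0.01
emb_small = nn.Embedding(n_categories, emb_dim)
nn.init.normal_(emb_small.weight, mean=0.0, std=0.01)

# N(0,1) init, matching how we normalize continuous inputs
emb_unit = nn.Embedding(n_categories, emb_dim)
nn.init.normal_(emb_unit.weight, mean=0.0, std=1.0)

out_small = emb_small(idx)
out_unit = emb_unit(idx)

# The output std tracks the init std
print(f"std with std=0.01 init: {out_small.std().item():.4f}")
print(f"std with std=1.0  init: {out_unit.std().item():.4f}")
```

So the lookup output from the default init has std around 0.01 rather than 1, which is what makes me wonder whether it should be normalized like the other inputs.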