Good rule of thumb for embedding size for categorical data

Hi everyone, I’ve been using fastai’s tabular application for a few days now, with good results. When you create a Tabular Learner, fastai’s default embedding size for each categorical variable is

min(600, round(1.6 * n_cat**0.56)) where n_cat is the number of values your categorical variable can take.
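To make the numbers concrete, here is a minimal sketch of that rule in plain Python. The helper name `default_emb_size` is made up for illustration and is not fastai's own function; the sketch only reproduces the formula quoted above, so it runs without fastai installed.

```python
def default_emb_size(n_cat: int) -> int:
    """Embedding size from the rule quoted above.

    Hypothetical helper for illustration only: it applies
    min(600, round(1.6 * n_cat**0.56)) and nothing else.
    """
    return min(600, round(1.6 * n_cat ** 0.56))


# Print the suggested size for a few cardinalities.
for n_cat in (10, 100, 1000, 100_000):
    print(n_cat, default_emb_size(n_cat))
```

The size grows sublinearly with the number of categories and is capped at 600, so very high-cardinality variables all end up with the same width.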

Are you familiar with other rules of thumb that can be used to choose a good embedding size?


This might be of interest.