Size of embedding for categorical variables

It looks like the rule has changed since I last looked. Previously, it was:

def emb_sz_rule(n_cat:int)->int: return min(50, (n_cat//2)+1)

Now it looks like it’s the following:

def emb_sz_rule(n_cat:int)->int: return min(600, round(1.6 * n_cat**0.56))
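
To get a feel for how much the two rules differ, here is a rough side-by-side sketch. The names `emb_sz_rule_old` and `emb_sz_rule_new` are made up for this comparison (in the library both versions are just `emb_sz_rule`), and the cardinalities in the loop are arbitrary examples:

```python
def emb_sz_rule_old(n_cat: int) -> int:
    # Earlier rule quoted above: half the cardinality plus one, capped at 50.
    return min(50, (n_cat // 2) + 1)


def emb_sz_rule_new(n_cat: int) -> int:
    # Later rule quoted above: a power law in the cardinality, capped at 600.
    return min(600, round(1.6 * n_cat ** 0.56))


# Arbitrary example cardinalities, chosen only for illustration.
for n_cat in (2, 10, 100, 1000, 100000):
    print(n_cat, emb_sz_rule_old(n_cat), emb_sz_rule_new(n_cat))
```

The old rule grows linearly and reaches its cap of 50 at about 100 categories, while the new one grows sublinearly and keeps growing until it reaches 600, so high-cardinality variables end up with noticeably larger embeddings.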

I’m not sure of the reason why, but you can dig into the source code here:
