Def_emb_sz is wrong in fastai/tabular/data.py

IIRC, Jeremy’s rule of thumb for embedding sizes is min(50, (cardinality of a category + 1)//2).
For example, if a categorical variable has 3 distinct values, then the corresponding embedding size should be min(50, (3 + 1) // 2) = 2. Correct so far?
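To sanity-check that arithmetic, here is a minimal sketch of the rule exactly as quoted above. The helper name `rule_of_thumb_emb_size` and its `max_size` parameter are made up for illustration and are not part of fastai.

```python
def rule_of_thumb_emb_size(cardinality: int, max_size: int = 50) -> int:
    """Embedding size per the rule quoted above: min(50, (cardinality + 1) // 2).

    Hypothetical helper for illustration only, not fastai's own function.
    """
    return min(max_size, (cardinality + 1) // 2)


print(rule_of_thumb_emb_size(3))     # 2, matching the worked example
print(rule_of_thumb_emb_size(1000))  # 50, the cap applies to large cardinalities
```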

However, the current implementation of def_emb_szs will return the cardinality of a category because dict.get returns the second argument only if the key does not exist in the passed dict.

My solution
IMO, sz_dict is not necessary. So I want to reduce the arguments to (df, n) from (df, n, sz_dict).

Additional
I know this works when get_tabular_learner is invoked without an emb_szs dict, though the example code in the docs passes an emb_szs dict.


The trick is to only use sz_dict to pass sizes where you don’t want to use the default. :)
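A rough sketch of that override pattern, assuming the default follows the rule of thumb quoted earlier in the thread. The helper `pick_emb_size` and the column names are hypothetical, and this is not fastai's actual code; it only shows how `dict.get(key, default)` falls back to the default when a column has no entry in the dict.

```python
def pick_emb_size(name, cardinality, sz_dict=None):
    """Return the override from sz_dict if present, else the rule-of-thumb default.

    Hypothetical helper for illustration only, not fastai's own function.
    """
    sz_dict = sz_dict or {}
    default = min(50, (cardinality + 1) // 2)
    # dict.get returns sz_dict[name] when the key exists, otherwise `default`
    return sz_dict.get(name, default)


# Made-up columns and cardinalities
cardinalities = {"workclass": 9, "education": 16}
overrides = {"education": 10}  # only list the columns you want to override

sizes = {n: pick_emb_size(n, c, overrides) for n, c in cardinalities.items()}
print(sizes)  # {'workclass': 5, 'education': 10}
```

Columns missing from the dict get the default, so passing an empty dict (or nothing) uses the rule of thumb everywhere.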

So how about editing the docs?
They use sz_dict, which is confusing to me.

I’m sure it wasn’t intended, but this comes across as rather unpleasant. We’re working very hard to help you, and receive no financial gains in return. So we do appreciate some gratitude, or at least patience and understanding.

The docs are correct. They are using sz_dict in the way I described - to override the defaults.


OK, thank you!