What is the reason behind adding 1 to embedding size while creating cat_sz?


(Yeshwanth Reddy) #1

In the lesson3-rossman notebook, a couple of lines of code create the (category size, embedding size) pairs:

cat_sz = [(c, len(joined_samp[c].cat.categories)+1) for c in cat_vars]  
emb_szs = [(c, min(50, (c+1)//2)) for _,c in cat_sz]

What is the reason behind adding 1 in [(c, len(joined_samp[c].cat.categories)+1) for c in cat_vars]?


(Yeshwanth Reddy) #2

I’m guessing index 0 is reserved for unknown levels, but I’m still not sure.
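
A minimal sketch of that guess in plain Python, not the notebook's actual preprocessing: the level list and the `encode` helper are hypothetical, and it assumes known levels are numbered from 1 so that index 0 is free for anything unseen or missing.

```python
# Hypothetical levels seen in training for one categorical variable.
categories = ["a", "b", "c"]

# Known levels get indices 1..n; index 0 is kept for unseen or missing values.
index_of = {level: i + 1 for i, level in enumerate(categories)}

def encode(level):
    """Return the embedding row index for a level, or 0 if it is unknown."""
    return index_of.get(level, 0)

codes = [encode(x) for x in ["a", "c", "zzz", None]]

# The embedding table needs one row per possible index, 0..n inclusive,
# which is where len(categories) + 1 would come from.
n_rows = len(categories) + 1
```

If the guess is right, every code stays below `n_rows`, so the extra row is exactly the one that index 0 looks up.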


(Kobe430am) #3

I can understand why 1 is added to the category count in cat_sz: it reserves a slot for unknown levels, as you mentioned. But when calculating emb_szs, another 1 is added, and I don’t understand why. I suspect it is just a hack to get half the size for the embedding. I originally thought the notebook contained a small mistake.
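
One way to read the second +1, offered as a sketch and not as the notebook author's stated intent: `(c + 1) // 2` is integer division that rounds half of `c` up instead of down. The `emb_width` helper and the sample sizes below are hypothetical.

```python
import math

def emb_width(n_levels, cap=50):
    # Same expression as the notebook's emb_szs line: (c + 1) // 2, capped at 50.
    return min(cap, (n_levels + 1) // 2)

# (c + 1) // 2 equals ceil(c / 2) for every non-negative integer c.
assert all((c + 1) // 2 == math.ceil(c / 2) for c in range(1000))

# Hypothetical category sizes mapped to their embedding widths.
widths = {c: emb_width(c) for c in (1, 2, 7, 8, 200)}

# Plain floor division would give a zero-width embedding for the smallest size.
floor_width_for_one = 1 // 2
```

Under this reading the +1 is not a second slot for unknown levels. It only affects odd sizes, and it keeps the width from dropping to 0 when `c` is 1.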