Lesson 3 technique on Boolean values


I am trying to apply the lesson 3 technique to the Kaggle Titanic dataset, and I am wondering whether I can use it for Boolean (0 or 1) values, and if so, how? Note that the lesson 3 target is sales, which is a float.

0 and 1 are perfectly reasonable float values :slight_smile:

Ah, thanks. A quick question about the following part:
df, y, nas, mapper = proc_df(new_train_data, 'Survived', do_scale=True)
yl = np.log(y)
I got this warning: RuntimeWarning: divide by zero encountered in log

I am not sure how I should get around that. Should I just use y, which has the values 0 and 1, in place of yl?

You don’t need to take the log. We only did that because in that particular competition the eval metric used log.
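To see where the warning comes from, here is a minimal sketch. The array below is made up and only stands in for the 'Survived' column; it is not the actual Titanic data. The log of 0 is negative infinity, so every 0 in the target triggers the divide-by-zero warning.

```python
import numpy as np

# Hypothetical 0/1 target, standing in for the 'Survived' column.
y = np.array([0, 1, 1, 0, 1], dtype=np.float32)

# log(0) is -inf; NumPy reports it as "divide by zero encountered in log".
# errstate only silences the warning here so the sketch runs quietly.
with np.errstate(divide='ignore'):
    yl = np.log(y)

# Count the -inf entries: one for every 0 in y.
n_inf = int(np.isinf(yl).sum())

# For a 0/1 target, skip the log and keep y as it is.
y_target = y
```

So for Titanic, passing y directly in place of yl avoids the problem.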


Another quick question: what does the following code do?
cat_sz = {c: len(joined_samp[c].cat.categories)+1 for c in cat_vars}
emb_szs = [(c, min(50, (c+1)//2)) for _,c in cat_sz.items()]

Especially emb_szs?

We’ll be covering that tonight! :slight_smile:
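In the meantime, here is a rough sketch of what those two lines compute, run on a tiny made-up DataFrame. The column names and values are hypothetical stand-ins for joined_samp and cat_vars. The first line counts the categories in each categorical column and adds one; the extra slot is commonly reserved for missing or unseen values, though that is an assumption here. The second line pairs each count with an embedding width of roughly half the count, capped at 50.

```python
import pandas as pd

# Hypothetical stand-in for joined_samp, with two categorical columns.
joined_samp = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male'],
    'Embarked': ['S', 'C', 'Q', 'S'],
})
cat_vars = ['Sex', 'Embarked']
for c in cat_vars:
    joined_samp[c] = joined_samp[c].astype('category')

# Number of distinct categories per column, plus one extra slot
# (assumed to be for missing or unseen values).
cat_sz = {c: len(joined_samp[c].cat.categories) + 1 for c in cat_vars}

# One (cardinality, embedding width) pair per column:
# width is about half the cardinality, rounded up, and never more than 50.
emb_szs = [(c, min(50, (c + 1) // 2)) for _, c in cat_sz.items()]

print(cat_sz)   # {'Sex': 3, 'Embarked': 4}
print(emb_szs)  # [(3, 2), (4, 2)]
```

Note that inside the list comprehension, c is the category count rather than the column name, so a column with 1,000 categories would get the pair (1000, 50).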

Ah, can't wait! Thanks.