OK, I finally had a chance to apply your alternative approach to binary classification to a real dataset, @PranY.
I have a very basic Titanic Kaggle notebook.
I used a fixed seed to make sure I'm comparing apples to apples:
torch.manual_seed(40)
random.seed(40)
Once the data is ready, I run it through two different approaches; the only differences are the few arguments listed in each approach's title:
Approach 1: is_reg=False, is_multi=False, out_sz=2 (crit: nll_loss)
md = ColumnarModelData.from_data_frame(PATH, valid_idx, train_proc_df, y.astype('int64'), cat_flds=cat_vars, bs=32,
is_reg=False, is_multi=False, test_df=test_proc_df)
m = md.get_learner(emb_szs=emb_szs, n_cont=(len(train_proc_df.columns)-len(cat_vars)), emb_drop=0.04, out_sz=2,
szs=[1000,500], drops=[0.001,0.01], y_range=y_range, use_bn=False)
lr = 1e-3
m.fit(lr, 1, metrics=[accuracy, f1, precision, recall])
m.fit(lr, 2, cycle_len=2, cycle_mult=3, metrics=[accuracy, f1, precision, recall])
preds = np.argmax(m.predict(True), axis=1)
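For reference, here is a minimal sketch (plain NumPy, with hypothetical values) of what that argmax step does with the two-column output that nll_loss pairs with:

```python
import numpy as np

# Hypothetical (n_samples, 2) log-probabilities from a 2-class head
log_probs = np.array([[-0.2, -1.7],   # class 0 more likely
                      [-2.3, -0.1]])  # class 1 more likely

# argmax along axis=1 picks the more likely class per row
preds = np.argmax(log_probs, axis=1)
print(preds)  # [0 1]
```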
Approach 2: is_reg=False, is_multi=True, out_sz=1 (crit: binary_cross_entropy)
y = y.reshape(len(y),1)
md = ColumnarModelData.from_data_frame(PATH, valid_idx, train_proc_df, y.astype(np.float32), cat_flds=cat_vars, bs=32,
is_reg=False, is_multi=True, test_df=test_proc_df)
m = md.get_learner(emb_szs=emb_szs, n_cont=(len(train_proc_df.columns)-len(cat_vars)), emb_drop=0.04, out_sz=1,
szs=[1000,500], drops=[0.001,0.01])
lr = 1e-3
m.fit(lr, 1, metrics=[accuracy_thresh(0.5)])
m.fit(lr, 2, cycle_len=2, cycle_mult=3, metrics=[accuracy_thresh(0.5)])
preds2 = m.predict(True)
preds2 = np.concatenate((preds2>0.5)*1)
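As an aside, that last line can be written a bit more idiomatically. A sketch with hypothetical sigmoid outputs, showing both forms give the same flat integer array:

```python
import numpy as np

# Hypothetical (n_samples, 1) sigmoid outputs from a single-unit head
probs = np.array([[0.9], [0.3], [0.7]])

# Original form: multiply the boolean mask by 1, flatten via concatenate
preds_a = np.concatenate((probs > 0.5) * 1)

# Equivalent and clearer: cast to int, then flatten with ravel
preds_b = (probs > 0.5).astype(int).ravel()

print(preds_a)  # [1 0 1]
assert (preds_a == preds_b).all()
```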
To get this to work, I had to comment out the following assertion in column_data.py:
#if is_reg==False: assert out_sz >= 2, "arg is_reg==False (classification) requires out_sz>=2"
Results:
Comparing approaches 1 with 2:
(preds==preds2).mean()
0.9521531100478469
Pretty close! And when I submitted both to Kaggle, surprisingly they received the same score of 0.77511.
So it looks to me like your approach works similarly to the other method, @PranY.
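The close agreement isn't too surprising: for two classes, a softmax over logits (z0, z1) and a sigmoid over the single logit z1 − z0 define the same decision boundary, so the two heads are in principle learning the same function. A quick numerical check of that identity (plain NumPy, hypothetical logits):

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical 2-class logits (z0, z1) for a few samples
logits = np.array([[0.5, 2.0], [1.2, -0.3], [-1.0, -1.0]])

# P(class 1) via a 2-way softmax equals sigmoid of the logit difference
p_softmax = softmax(logits)[:, 1]
p_sigmoid = sigmoid(logits[:, 1] - logits[:, 0])
assert np.allclose(p_softmax, p_sigmoid)
```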
It'd be a good idea to try it on another, perhaps much bigger, dataset. Does anybody have a binary classification notebook with a bigger structured dataset we could try this on, ideally one where we know the correct predictions?
Also, could you please check that my prediction code in the 2nd case is correct?