Structured Learner For Kaggle Titanic

Hi guys,

Has anyone attempted to use the Structured Learner for Kaggle Titanic?
Any tips on how to do it?

I am stuck trying to find the learning rate because lr_find() didn't produce a plot. Do I need a lot of data to use the Structured Learner?


I attempted that as well, but like you said, lr_find() did not give me a plot. I guessed that might be due to the size of the data, so I just set the learning rate to 1e-3 and started training.

Did you manage to get a good prediction score? Before this I was trying Keras + TensorFlow.

The best accuracy I've managed to get so far using the fastai library is 0.78947. To improve on that, I believe I need to do more work on feature engineering.

This is a really small dataset (<1000 rows with 10 or so features). You will probably get good performance with a tree-based model like Random Forest. It's also a reminder that deep learning is not the hammer for every problem :)
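As a rough sketch of that suggestion, a Random Forest baseline with scikit-learn might look like this. It assumes scikit-learn is installed and uses a synthetic stand-in (`make_classification` with 891 rows and 10 features) in place of the cleaned Titanic features, so substitute your own `X` and `y`. The hyperparameters are illustrative, not tuned.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in roughly the size of the Titanic training set
# (891 rows, 10 features); replace X, y with your cleaned features and labels.
X, y = make_classification(n_samples=891, n_features=10, random_state=0)

# Illustrative settings; min_samples_leaf limits overfitting on small data.
rf = RandomForestClassifier(n_estimators=200, min_samples_leaf=3, random_state=0)

# 5-fold cross-validated accuracy is a steadier estimate than a single
# train/validation split on a dataset this small.
scores = cross_val_score(rf, X, y, cv=5, scoring="accuracy")
mean_acc = scores.mean()
print(f"CV accuracy: {mean_acc:.3f} +/- {scores.std():.3f}")
```

Cross-validation is used here because with under 1000 rows a single validation split can move the accuracy by a couple of points on its own.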


I got that accuracy (~0.80) by doing data cleansing and then applying Random Forest…
(I didn't know about fastai at that time.)

warning: noob alert

I gave Titanic a try with ColumnarModelData too, and in addition to lr_find() not working (presumably due to the small dataset?), I ran into another question that I hope someone can help with.

That is, what's the most idiomatic way to coerce the predictions into bools? I passed y_range=[0, 1] to get_learner and got float predictions. But this means that when I run fit, I don't get an accuracy number even when I pass metrics=[accuracy], and my submission file contains floats instead of bools.

Of course I could hack around the latter issue, but it feels like I might be holding it wrong. Here's a gist of my notebook:

Any insight is greatly appreciated.


After further study, I think I’ve figured it out. metrics=[accuracy_thresh(.5)] does what I want during fit() and sub[dep] = sub[dep].round().astype('int') coerces the results for submission.
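Both fixes amount to thresholding sigmoid-range outputs at 0.5. Here is a minimal NumPy/pandas sketch of the same idea outside fastai. `accuracy_at_threshold`, `preds`, `targs` and the small submission frame are illustrative stand-ins, and the strict `>` comparison is my assumption about how fastai 0.7's `accuracy_thresh` compares. Note that pandas' `round()` rounds an exact 0.5 to 0 (round half to even), so the two agree at the boundary.

```python
import numpy as np
import pandas as pd

def accuracy_at_threshold(preds, targs, thresh=0.5):
    """Fraction of thresholded predictions that match the 0/1 targets."""
    preds = np.asarray(preds, dtype=float)
    targs = np.asarray(targs)
    return float(((preds > thresh).astype(int) == targs).mean())

# Illustrative float outputs from a model trained with y_range=[0, 1]
preds = np.array([0.91, 0.12, 0.55, 0.49, 0.50])
targs = np.array([1, 0, 1, 1, 0])
acc = accuracy_at_threshold(preds, targs)  # 4 of 5 correct -> 0.8

# Coerce the floats to 0/1 integers for the submission file
# (hypothetical PassengerId values).
sub = pd.DataFrame({"PassengerId": [892, 893, 894, 895, 896], "Survived": preds})
sub["Survived"] = sub["Survived"].round().astype("int")
print(sub["Survived"].tolist())  # [1, 0, 1, 0, 0]
```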

FWIW, I got an initial public score of 0.78947 by naively applying the same technique as in the Rossmann lesson.

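As for the LR finder not plotting, this is caused by too high a batch size. With only a few hundred training rows, a large bs means very few iterations per epoch, so the finder records too few points to plot. Reduce bs and you'll get a plot.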
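To put numbers on that: the learning-rate finder records roughly one point per mini-batch, so the number of points is about ceil(rows / bs). The sketch below assumes the plot then drops the first 10 and last 5 points, which I believe were the `n_skip`/`n_skip_end` defaults of fastai 0.7's `sched.plot()`; check this against your version. The row count 712 (about 80% of Titanic's 891 training rows) is only an example, and `lr_find_points` is a hypothetical helper.

```python
import math

def lr_find_points(n_rows, bs, n_skip=10, n_skip_end=5):
    """Estimate how many points an LR-finder plot shows after one epoch.

    Assumes one recorded point per mini-batch and that the plot drops the
    first `n_skip` and last `n_skip_end` points (assumed defaults).
    """
    n_batches = math.ceil(n_rows / bs)
    return max(n_batches - n_skip - n_skip_end, 0)

for bs in (128, 64, 32, 16):
    print(bs, lr_find_points(712, bs))
# 128 0
# 64 0
# 32 8
# 16 30
```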


From what I have seen in the kernels for this Kaggle competition, most traditional machine learning methods achieve similar accuracy (~0.8).
Hence it seems that the embeddings/NN approach is competitive.

Hi, I have been working on some classification datasets from Kaggle, but my Structured Learner model is not performing as well as I expected. Since I was focused on the model itself, I borrowed the feature engineering from existing Kaggle kernels and worked from there.

But Random Forest and XGBoost always come out ahead of the columnar NN module by a slight margin. For example:
1. Titanic: XGBoost 0.82, Random Forest 0.81, Structured Learner 0.80
2. West Nile Virus Prediction: XGBoost 0.73, Random Forest 0.72, Structured Learner 0.69

Is there anything I can do to make my model perform better? I can't tell whether this is caused by a mistake in the input parameters I gave, or by the relative predictive power of the continuous versus categorical features.
Here is the code I used:

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.init import kaiming_normal
from fastai.column_data import *  # fastai 0.7: ColumnarModelData, BasicModel, Learner, emb_init

class MixedInputModel(nn.Module):
    def __init__(self, emb_szs, n_cont, emb_drop, out_sz, szs, drops,
                 y_range=None, use_bn=False):
        super().__init__()
        # One embedding per categorical column: (cardinality, embedding size)
        self.embs = nn.ModuleList([nn.Embedding(c, s) for c,s in emb_szs])
        for emb in self.embs: emb_init(emb)
        n_emb = sum(e.embedding_dim for e in self.embs)
        self.n_emb, self.n_cont = n_emb, n_cont

        # Fully connected layers; the input is embeddings + continuous columns
        szs = [n_emb+n_cont] + szs
        self.lins = nn.ModuleList([
            nn.Linear(szs[i], szs[i+1]) for i in range(len(szs)-1)])
        self.bns = nn.ModuleList([
            nn.BatchNorm1d(sz) for sz in szs[1:]])
        for o in self.lins: kaiming_normal(o.weight.data)
        self.outp = nn.Linear(szs[-1], out_sz)

        self.emb_drop = nn.Dropout(emb_drop)
        self.drops = nn.ModuleList([nn.Dropout(drop) for drop in drops])
        # BatchNorm over the continuous inputs, sized from the training data
        self.bn = nn.BatchNorm1d(n_cont)
        self.use_bn, self.y_range = use_bn, y_range

    def forward(self, x_cat, x_cont):
        if self.n_emb != 0:
            x = [e(x_cat[:,i]) for i,e in enumerate(self.embs)]
            x = torch.cat(x, 1)
            x = self.emb_drop(x)
        if self.n_cont != 0:
            x2 = self.bn(x_cont)
            x = torch.cat([x, x2], 1) if self.n_emb != 0 else x2
        for l,d,b in zip(self.lins, self.drops, self.bns):
            x = F.relu(l(x))
            if self.use_bn: x = b(x)
            x = d(x)
        x = self.outp(x)
        if self.y_range:
            # Squash the output into [y_range[0], y_range[1]]
            x = F.sigmoid(x)
            x = x*(self.y_range[1] - self.y_range[0])
            x = x+self.y_range[0]
        return x

md = ColumnarModelData.from_data_frames('/tmp', trn_df, val_df, trn_y[0].astype(np.int64), val_y[0].astype(np.int64), cats, 64, test_df=df_test)
model = MixedInputModel(emb_szs, n_cont=len(df.columns)-len(cats), emb_drop=0, out_sz=2, szs=[500], drops=[0.5], use_bn=True).cuda()
bm = BasicModel(model, 'binary_classifier')

class StructuredLearner(Learner):
    def __init__(self, data, models, **kwargs):
        super().__init__(data, models, **kwargs)
        self.crit = F.mse_loss

learn = StructuredLearner(md, bm)
# Caution: F.binary_cross_entropy expects probabilities and float targets with
# the same shape as the output. With out_sz=2, no final activation and int64
# labels, log_softmax + F.nll_loss (or F.cross_entropy) is the usual pairing.
learn.crit = F.binary_cross_entropy
```
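On the loss in that snippet: a two-unit output with integer labels and a one-unit sigmoid output with float labels each have their own matching loss. Here is a minimal sketch of the two consistent pairings in plain PyTorch (no fastai), written against a recent PyTorch rather than the 0.3-era `Variable` API; random tensors stand in for a batch of logits.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch = 8
labels = torch.randint(0, 2, (batch,))  # int64 class labels: 0 = died, 1 = survived

# Option A: two output units (out_sz=2) -> log_softmax + nll_loss.
# F.cross_entropy computes the same value directly from the raw logits.
logits2 = torch.randn(batch, 2)
loss_a = F.nll_loss(F.log_softmax(logits2, dim=1), labels)
loss_a_ce = F.cross_entropy(logits2, labels)

# Option B: one output unit (out_sz=1) -> sigmoid + binary_cross_entropy,
# with float targets shaped like the output, i.e. (batch, 1).
logit1 = torch.randn(batch, 1)
targets1 = labels.float().unsqueeze(1)
loss_b = F.binary_cross_entropy(torch.sigmoid(logit1), targets1)
# Numerically safer equivalent that takes the raw logit:
loss_b_logits = F.binary_cross_entropy_with_logits(logit1, targets1)

print(loss_a.item(), loss_a_ce.item(), loss_b.item(), loss_b_logits.item())
```

Option B with `y_range=[0, 1]` is close to what the earlier posts in this thread did; option A matches `out_sz=2` with integer targets.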

My observations in Kaggle competitions are similar. All other things being equal (sampling, feature extraction, etc.), my Structured Learner models are almost never able to beat the tree-based models. I guess this indicates that NNs are not the solution to every problem, or that "there is no free lunch". I noticed this particularly in imbalanced-class classification problems: my tree-based models (XGBoost, RF) get roughly 4-5% higher AUC than neural nets using embeddings. Has anyone seen neural nets outperform tree-based algorithms on classification problems?

I've also been working on the Titanic Kaggle dataset. However, when I try to run a prediction on the test dataset, I get an error:


```
RuntimeError Traceback (most recent call last)
in ()
----> 1 pred_test=m.predict(True)

~/fastai/courses/dl1/fastai/ in predict(self, is_test, use_swa)
355 dl = if is_test else
356 m = self.swa_model if use_swa else self.model
--> 357 return predict(m, dl)
359 def predict_with_targs(self, is_test=False, use_swa=False):

~/fastai/courses/dl1/fastai/ in predict(m, dl)
220 def predict(m, dl):
--> 221 preda,_ = predict_with_targs_(m, dl)
222 return to_np(

~/fastai/courses/dl1/fastai/ in predict_with_targs_(m, dl)
231 if hasattr(m, 'reset'): m.reset()
232 res = []
--> 233 for *x,y in iter(dl): res.append([get_prediction(m(*VV(x))),y])
234 return zip(*res)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/ in __call__(self, *input, **kwargs)
355 result = self._slow_forward(*input, **kwargs)
356 else:
--> 357 result = self.forward(*input, **kwargs)
358 for hook in self._forward_hooks.values():
359 hook_result = hook(self, input, result)

~/fastai/courses/dl1/fastai/ in forward(self, x_cat, x_cont)
116 x = self.emb_drop(x)
117 if self.n_cont != 0:
--> 118 x2 =
119 x =[x, x2], 1) if self.n_emb != 0 else x2
120 for l,d,b in zip(self.lins, self.drops, self.bns):

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/ in __call__(self, *input, **kwargs)
355 result = self._slow_forward(*input, **kwargs)
356 else:
--> 357 result = self.forward(*input, **kwargs)
358 for hook in self._forward_hooks.values():
359 hook_result = hook(self, input, result)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/ in forward(self, input)
35 return F.batch_norm(
36 input, self.running_mean, self.running_var, self.weight, self.bias,
---> 37, self.momentum, self.eps)
39 def __repr__(self):

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/ in batch_norm(input, running_mean, running_var, weight, bias, training, momentum, eps)
1011 raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
1012 f = torch._C._functions.BatchNorm(running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled)
-> 1013 return f(input, weight, bias)

RuntimeError: running_mean should contain 2 elements not 1
```

Does anyone have any ideas? For a bit of background, I just adapted the Rossmann example from the dl1 course. I've been able to build a model, so I'm not sure what I'm doing wrong.

If it's helpful, here is my notebook on GitHub.

Are your md.trn_ds.conts and md.test_ds.conts the same shape?
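That check targets the BatchNorm on the continuous inputs (`self.bn = nn.BatchNorm1d(n_cont)` in `MixedInputModel`): it is sized by the number of continuous columns seen at training time, so a test set with a different number of continuous columns fails inside `batch_norm`, which is where the traceback ends. Here is a small PyTorch reproduction. The arrays are random stand-ins for `md.trn_ds.conts` and `md.test_ds.conts`, with 1 and 2 columns chosen to mirror the "2 elements not 1" message; the exact error wording may differ between PyTorch versions.

```python
import numpy as np
import torch
import torch.nn as nn

# Stand-ins for md.trn_ds.conts and md.test_ds.conts: training ended up with
# 1 continuous column, test with 2 (e.g. an extra "_na" indicator column).
trn_conts = np.random.randn(10, 1).astype(np.float32)
test_conts = np.random.randn(4, 2).astype(np.float32)

# Cheap check before predicting: the column counts must agree.
shapes_match = trn_conts.shape[1] == test_conts.shape[1]
print("continuous columns:", trn_conts.shape[1], "vs", test_conts.shape[1])

# The model's BatchNorm on continuous inputs is sized from the training data...
bn = nn.BatchNorm1d(trn_conts.shape[1])
bn.eval()  # prediction runs in eval mode, using the running statistics

# ...so a test batch with a different column count fails inside batch_norm.
error = None
try:
    bn(torch.from_numpy(test_conts))
except RuntimeError as e:
    error = str(e)
print(error)
```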


@devale I also had this issue, and following the suggestion to check the shapes of md.trn_ds.conts and md.test_ds.conts (thanks @Jan), I realised that they were indeed not the same shape.

I found that my training set and test set had a different number of columns containing NA values, so once I had sorted that out, the problem was resolved.
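Here is a sketch of how that mismatch can arise, in plain pandas. NA simply means a missing value (NaN). My understanding is that fastai 0.7's `proc_df` fills missing numeric values with the median and adds a `<col>_na` indicator column for each such column, and that (in later 0.7 versions) passing the training `na_dict` when processing the test set keeps the columns aligned. `fill_missing` below is a hypothetical, simplified stand-in for that behaviour, not the library's code, and the tiny frames are illustrative: training has missing `Age` only, while test has missing `Age` and `Fare`.

```python
import numpy as np
import pandas as pd

def fill_missing(df, na_dict=None):
    """Hypothetical simplified stand-in for fastai-style NA handling.

    Without na_dict: every numeric column with NAs gets a "<col>_na" indicator
    and is filled with its own median; the medians are returned in na_dict.
    With na_dict (from training): only those columns get indicators and are
    filled with the training medians; other NAs are filled without adding
    columns, so the layout matches the training frame.
    """
    df = df.copy()
    fitting = na_dict is None
    na_dict = {} if fitting else dict(na_dict)
    for col in df.select_dtypes(include="number").columns:
        has_na = df[col].isna().any()
        if col in na_dict or (fitting and has_na):
            fill = na_dict.setdefault(col, df[col].median())
            df[col + "_na"] = df[col].isna()
            df[col] = df[col].fillna(fill)
        elif has_na:
            df[col] = df[col].fillna(df[col].median())
    return df, na_dict

train = pd.DataFrame({"Age": [22.0, np.nan, 38.0, 26.0], "Fare": [7.25, 71.3, 8.05, 53.1]})
test = pd.DataFrame({"Age": [34.5, np.nan], "Fare": [7.83, np.nan]})

trn, nas = fill_missing(train)
naive_test, _ = fill_missing(test)         # own NA handling: extra Fare_na column
aligned_test, _ = fill_missing(test, nas)  # reuses training na_dict

print(list(trn.columns))           # ['Age', 'Fare', 'Age_na']
print(list(naive_test.columns))    # ['Age', 'Fare', 'Age_na', 'Fare_na']
print(list(aligned_test.columns))  # ['Age', 'Fare', 'Age_na']
```

The naive test frame has one more continuous column than training, which is exactly the kind of shape mismatch that breaks the continuous-input BatchNorm at prediction time.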

I noticed the same thing and tried (unsuccessfully) to fix it by using the na_dict from test_df in my proper df. I worry that my problem is that I don't actually understand what the NA values are, and I haven't been able to find an explanation by googling. Could you explain?

Glad I could help.

The best I’ve achieved to date is to get my NN’s performance equal to my tree-based performance.


Yeah! Add to that the cost of a GPU-based ecosystem and of migrating to a different production environment, and it sometimes makes you wonder how much value it will have in current business scenarios (non-vision and non-NLP).

@Jan and @mfosker I’ll give that a roll when I get home later. Thanks for the tip!