Structured Learner For Kaggle Titanic


(Deborah Tylor) #22

Here’s my attempt using a modification of Jeremy’s lesson3-Rossman: https://github.com/dtylor/dtylor.github.io/blob/master/kaggle/titanic/titanic_nn.ipynb. My submission score was pretty average at 77%. I also tried using the generated embeddings in a random forest regressor and achieved the same score.


(Dan Goldner) #23

@dtylor - Thank you for posting this. Beginner question, if you’re willing:

You got

Accuracy of 87% using the embeddings calculated by the nn in the Random Forest Regressor.

But I couldn’t see in your gist how the nn-calculated embeddings got into df. df is passed to ColumnarModelData.from_data_frame() and therefore is available in md. Then m is created by md.get_learner(), and the m.fit() is called. You then use df directly (well, converted to numpy) as input to the random forest.

The embeddings must be added to df and their values set during training … does all that happen in place in the df dataframe?

Thanks!


(Dan Goldner) #24

@shub.chat I think there’s potential in trained embeddings that you can’t (?) get from trees. Am I missing something?


(Deborah Tylor) #25

Thanks for reading and for your feedback. I am a beginner at this as well. The 87% accuracy of the random forest was based on the validation set, but the test-prediction submission to Kaggle produced the exact same score as the neural-net submission: 77.033%. The validation set wasn’t randomly selected but consisted of the last 90 rows of the training set (a carryover from the time-based selection in the Rossmann example), which may explain why it wasn’t representative of the test set.

You are correct; the code wasn’t properly using the embeddings from the nn for the random forest (which I still would like to try if possible). I’ll correct the comments.
Thanks again!
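For anyone who wants to try that step, here is a rough sketch of what feeding the nn-learned embeddings to a random forest could look like. Everything below (matrix sizes, column names, values) is made up for illustration; in practice each matrix would be pulled from a trained nn.Embedding layer rather than generated randomly.

```python
import numpy as np

# Hypothetical learned embedding matrices, one per categorical column.
# In practice these would come from the trained net (the weight of each
# nn.Embedding layer); here they are random stand-ins.
rng = np.random.default_rng(0)
emb_sex = rng.normal(size=(2, 2))     # 2 categories -> 2-dim embedding
emb_pclass = rng.normal(size=(3, 2))  # 3 categories -> 2-dim embedding

# Integer category codes for a few rows of the training frame
sex_codes = np.array([0, 1, 1, 0])
pclass_codes = np.array([2, 0, 1, 2])

# Look up each row's embedding vector and concatenate with the
# continuous columns to build the random-forest feature matrix
age = np.array([[22.0], [38.0], [26.0], [35.0]])
X = np.hstack([emb_sex[sex_codes], emb_pclass[pclass_codes], age])
print(X.shape)  # (4, 5): 2 + 2 embedding dims + 1 continuous column
```

X can then be passed to a random forest in place of the raw category codes.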


#26

There is certainly potential, but from what I have observed so far, that potential is limited. The overall incremental benefit I observed on tabular-data classification problems was almost negligible. This could actually be a great research area: pick up all the old classification problems on Kaggle and check whether an ANN using embeddings provides a benefit, and if so, how much. I still feel deep neural nets are not a panacea for every problem on tabular data.


#27

How did you do the cleaning?

Right now I’m working on this contest; my score using a random forest is 0.74.

What hyperparameters did you use in the random forest?


(Karl) #28

I’ve been playing around with different methods for the Home Credit Default Risk Kaggle competition. With everything I’ve tried, boosted tree models have about 3-4% improved performance over fastai neural net models. I’ve tried playing around with different levels of dropout, adding layers, changing embedding matrix sizes, processing data in different ways and different training strategies. Optimizing these factors gets around 0.2-0.5% improvements, which isn’t going to close the performance gap much. To your point about unbalanced classes, this competition has a severe imbalance in training data, which may hurt neural net performance.
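On the imbalance point, one mitigation worth trying (I haven’t verified that it closes the gap here) is weighting the loss by inverse class frequency. A toy sketch of computing such weights, which could then be passed as the `weight` argument of PyTorch’s `cross_entropy`/`nll_loss`:

```python
import numpy as np

# Hypothetical severely imbalanced binary target (~92% negatives)
y = np.array([0] * 92 + [1] * 8)

# Inverse-frequency class weights: each class's weight is proportional
# to 1 / count, scaled so a perfectly balanced problem would give 1.0
counts = np.bincount(y)
weights = len(y) / (len(counts) * counts)
print(weights)  # the rare class gets the larger weight
```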

That said, my fastai structured data model outperforms other posted neural net solutions implemented in Keras/Tensorflow.


#29

That’s interesting. Do you have a minimum number of records below which a NN shouldn’t be expected to work? I was trying it for the Santander value prediction, which has ~4,500 training data points. The results are quite bad.


(Karl) #30

I don’t really know, but I would guess a lot. For example, the Rossmann challenge, where NNs worked well, had over a million rows in the final processed data set. The Rossmann data also contained nonlinear features, like time/seasonal relations to sales, which a NN should be better at capturing.

I think the Santander competition is particularly poorly suited to deep learning because all you have to go on is a tiny amount of sparse data.

Something I have been interested in trying for the Santander challenge is training an embedding matrix on the data (similar to lesson 5), then using the learned matrix to transform the data before passing it to a random forest/GBM. My hope is that the embedding matrix will learn latent features in the data, then pass them on to models better suited for small data sets, but there’s still the problem of having only a tiny amount of data to go on.
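A toy sketch of that idea, with everything below made up rather than taken from the competition data: train a small embedding on a synthetic categorical feature against a toy target, then use the learned matrix to turn category codes into dense features for a tree model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# One categorical feature with 10 levels mapped to a 4-dim embedding,
# trained against a toy binary target through a tiny linear head
emb = nn.Embedding(10, 4)
head = nn.Linear(4, 2)
opt = torch.optim.SGD(list(emb.parameters()) + list(head.parameters()), lr=0.1)

codes = torch.randint(0, 10, (64,))
targs = (codes % 2).long()  # toy target correlated with the category

for _ in range(50):
    opt.zero_grad()
    loss = F.cross_entropy(head(emb(codes)), targs)
    loss.backward()
    opt.step()

# The learned matrix now maps category codes to dense vectors that can
# be fed to a random forest / GBM in place of the raw codes
X = emb.weight.data[codes].numpy()
print(X.shape)  # (64, 4)
```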


(Sagar) #31

I attempted gradient boosting (GBoost), a random forest, and logistic regression (LR).
I got the best score, 0.78, with the random forest.


#32

Sounds really cool! I’d really like to know how your experiment on this goes.


#33

Hi Karl,

Would you mind sharing how you modified the Rossmann code to work with a classification problem?
Thanks a ton!


(Karl) #34

So this is what I have going right now.

I wouldn’t call the model working. It doesn’t really train, and I think it’s just converging to zero, given that 92% of the test set is a single value. To use the structured data model for classification, I just used what was done in this notebook:
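The gist of the change, as a toy sketch rather than the notebook’s actual code: the last linear layer emits one score per class instead of a single continuous output (as in the Rossmann regression setup), the forward pass ends in log_softmax, and the criterion becomes F.nll_loss instead of MSE. Hidden sizes below are invented.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy classification head for a structured-data model: n_classes output
# units, log_softmax at the end, nll_loss as the criterion
class TabularClassifier(nn.Module):
    def __init__(self, n_cont, n_classes=2):
        super().__init__()
        self.lin1 = nn.Linear(n_cont, 50)
        self.out = nn.Linear(50, n_classes)  # n_classes, not 1

    def forward(self, x_cont):
        x = F.relu(self.lin1(x_cont))
        return F.log_softmax(self.out(x), dim=1)  # log-probabilities

model = TabularClassifier(n_cont=5)
preds = model(torch.randn(8, 5))
loss = F.nll_loss(preds, torch.zeros(8, dtype=torch.long))
print(preds.shape)  # torch.Size([8, 2])
```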


#35

thanks Karl


(Pierre Guillou) #36

Hi @dtylor,

I’m trying to use the f1 metric in m.fit(lr, 3, metrics=[f1]), but it gives an error.
Did you try it in your notebook?


(Stas Bekman) #37

The problem you were getting came from the different shapes of targs (1) and preds (2), plus the preds were log()'ed.

In metrics.py, add:

def recall_torch(preds, targs, thresh=0.5):
    # predicted class index after thresholding the probabilities
    pred_pos = torch.max(preds > thresh, dim=1)[1]
    # true positives: prediction matches the target and the target is positive
    tpos = torch.mul((targs.byte() == pred_pos.byte()), targs.byte())
    return tpos.sum()/targs.sum()

def precision_torch(preds, targs, thresh=0.5):
    pred_pos = torch.max(preds > thresh, dim=1)[1]
    tpos = torch.mul((targs.byte() == pred_pos.byte()), targs.byte())
    return tpos.sum()/pred_pos.sum()

def log_fbeta_torch(log_preds, targs, beta, thresh=0.5):
    assert beta > 0, 'beta needs to be greater than 0'
    beta2 = beta ** 2
    # the model outputs log-probabilities, so undo the log first
    preds = torch.exp(log_preds)
    rec = recall_torch(preds, targs, thresh)
    prec = precision_torch(preds, targs, thresh)
    return (1 + beta2) * prec * rec / (beta2 * prec + rec)

def log_f1_torch(log_preds, targs, thresh=0.5): return log_fbeta_torch(log_preds, targs, 1, thresh)

The output looks promising (but I could be wrong - so please validate):

m.fit(lr, 5, cycle_len=1, metrics=[accuracy, log_f1_torch])
Epoch
100% 5/5 [00:00<00:00, 16.44it/s]
epoch      trn_loss   val_loss   accuracy   log_f1_torch
    0      0.530434   0.489619   0.733333   0.555556  
    1      0.518038   0.481288   0.766667   0.588235  
    2      0.50331    0.462756   0.788889   0.677966  
    3      0.491052   0.456119   0.766667   0.655738  
    4      0.47819    0.456757   0.788889   0.698413

edit: replaced with a cleaner version - just need to figure out better naming, see: https://github.com/fastai/fastai/issues/658


(Pierre Guillou) #38

Great @stas :slight_smile: Many thanks!

One remark: your log_f1_torch did work in m.fit(lr, 3, metrics=[log_f1_torch]), but the functions recall_torch and precision_torch did not.

I made the following small changes to your definitions to make them work.
Any chance of implementing them in the fastai library?

def recall_torch(log_preds, targs, thresh=0.5):
    preds = torch.exp(log_preds)
    pred_pos = torch.max(preds > thresh, dim=1)[1]
    tpos = torch.mul((targs.byte() == pred_pos.byte()), targs.byte())
    return tpos.sum()/targs.sum()

def precision_torch(log_preds, targs, thresh=0.5):
    preds = torch.exp(log_preds)
    pred_pos = torch.max(preds > thresh, dim=1)[1]
    tpos = torch.mul((targs.byte() == pred_pos.byte()), targs.byte())
    return tpos.sum()/pred_pos.sum()

def fbeta_torch(log_preds, targs, beta, thresh=0.5):
    assert beta > 0, 'beta needs to be greater than 0'
    beta2 = beta ** 2
    #preds = torch.exp(log_preds)
    rec = recall_torch(log_preds, targs, thresh)
    prec = precision_torch(log_preds, targs, thresh)
    return (1 + beta2) * prec * rec / (beta2 * prec + rec)

def f1_score_torch(log_preds, targs, thresh=0.5): return fbeta_torch(log_preds, targs, 1, thresh)

(Stas Bekman) #39

I was trying to avoid doing preds = torch.exp(log_preds) twice. But why did you need that change? Did you call recall_torch and precision_torch directly? If so, then yes, it’s probably best to use your version. I thought they were just internal helper functions, but you’re saying they are used directly.

And, yes, it’ll be in fastai soon. See https://github.com/fastai/fastai/issues/658. I will update this thread when this is done.


(Pierre Guillou) #40

Yes, I need to display them with m.fit, as follows:
m.fit(lr, 3, metrics=[precision_torch, recall_torch, log_f1_torch])

Great :slight_smile:


(Stas Bekman) #41

It’s in the codebase now: https://github.com/fastai/fastai/pull/661
The previous non-torch functions with the same names were renamed to have an _np suffix, so use f1, recall, and precision.