Kaggle 2019 DS bowl - Tabular model doesn't predict with half the targets

much_learner · December 9, 2019, 6:43pm

Hi there. I am trying my skills in the https://www.kaggle.com/c/data-science-bowl-2019 competition.

I am kind of stuck. One thing I noticed is that the model never predicts class 1 or 2 for the test dataset. I have been adding features, but that hasn’t improved the score much.

I see that in the train data targets 1 and 2 are about half as frequent as the others, but I doubt the effect should be that severe. My intuition is that it’s something to do with the loss function, and I want to try label smoothing or mixup next.
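A quick way to confirm the symptom is to compare the class shares of the training targets with those of the predicted labels. This is only a minimal sketch: `class_shares` is a hypothetical helper, and it assumes the targets are the integer accuracy groups 0 to 3.

```python
import numpy as np

def class_shares(labels, n_classes=4):
    """Return the fraction of samples in each class 0..n_classes-1.

    Assumes `labels` holds non-negative integer class ids.
    """
    counts = np.bincount(np.asarray(labels, dtype=int), minlength=n_classes)
    return counts / counts.sum()

# Toy data, not competition results: classes 1 and 2 are never predicted.
train_targets = [0, 0, 1, 2, 3, 3, 3, 3]
test_preds = [0, 0, 3, 3]
print("train:", class_shares(train_targets))
print("preds:", class_shares(test_preds))
```

If the prediction shares for classes 1 and 2 are exactly zero while the training shares are not, the gap is larger than class imbalance alone would explain.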

https://www.kaggle.com/manyregression/fastai-2019-data-science-bowl?scriptVersionId=24808940

much_learner · December 12, 2019, 12:18pm

I think that’s because it’s a regression problem, not classification.

much_learner · December 12, 2019, 7:32pm

So I improved my score by switching to Regression.

Now I want to track the Kappa metric, and my idea is to calculate it by passing the predictions through a rounder:
https://www.kaggle.com/manyregression/fastai-2019-data-science-bowl?scriptVersionId=25008584#KappaScoreRegression

But I don’t understand the implementation completely. While I am going through it, could anyone please point me to the key thing: where does it convert a predicted number to a class?
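For reference, the usual pattern is to cut the continuous prediction at a few thresholds and then score the resulting classes. The sketch below is not the linked notebook's implementation: the function names, the fixed thresholds (0.5, 1.5, 2.5), and the assumption of four classes 0 to 3 are all illustrative, and rounders of this kind often tune the thresholds on validation data instead.

```python
import numpy as np

def to_classes(preds, thresholds=(0.5, 1.5, 2.5)):
    """Map continuous predictions to integer classes by thresholding.

    A value below the first threshold becomes class 0, a value at or above
    the last threshold becomes the highest class.
    """
    return np.digitize(np.asarray(preds, dtype=float), bins=np.asarray(thresholds))

def quadratic_weighted_kappa(y_true, y_pred, n_classes=4):
    """Quadratic weighted kappa computed from the confusion matrix.

    Undefined (division by zero) when all targets and all predictions
    fall into one and the same class.
    """
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    observed = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1
    idx = np.arange(n_classes)
    # Penalty grows with the squared distance between true and predicted class.
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    # Confusion matrix expected if predictions were independent of targets.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

def kappa_from_regression(preds, targets, thresholds=(0.5, 1.5, 2.5)):
    """Threshold regression outputs, then score them against integer targets."""
    return quadratic_weighted_kappa(targets, to_classes(preds, thresholds))
```

The conversion from number to class happens entirely in `to_classes`; everything after that is an ordinary classification metric.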

much_learner · January 26, 2020, 5:27pm

The competition ended recently and I finished in the top 32%, jumping up 1k places.

My solution is very simple: https://www.kaggle.com/manyregression/private-fastai-2019-data-science-bowl Most solutions use ensembles of models, but I am not interested in that. I aim for practicality and simplicity. I think if it’s in the top 32%, then it works.

I spent most of the time figuring out what works and what doesn’t: features, parameters, additional data. What helped the most was a good validation set.

At the end I was removing features whose removal left the score the same or better.
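That pruning step can be sketched as a greedy backward elimination. This is an illustration under assumptions, not the code from the notebook: `prune_features` and `score_fn` are hypothetical names, and `score_fn` stands in for whatever trains a model on the given feature list and returns its validation score (higher is better).

```python
def prune_features(features, score_fn):
    """Greedily drop features whose removal does not lower the score.

    `score_fn(feature_list)` is assumed to return a validation score
    where higher is better. Each feature is tried once, in order.
    """
    kept = list(features)
    best = score_fn(kept)
    for feat in list(kept):
        if len(kept) == 1:
            break  # always keep at least one feature
        candidate = [f for f in kept if f != feat]
        score = score_fn(candidate)
        if score >= best:
            # Same or better without this feature, so drop it.
            kept, best = candidate, score
    return kept, best

# Toy scoring function, not a real model: "a" and "b" help, anything else hurts slightly.
useful = {"a", "b"}

def toy_score(feats):
    return len(set(feats) & useful) - 0.01 * len(set(feats) - useful)

print(prune_features(["a", "b", "noise"], toy_score))
```

With a real model each call to `score_fn` means a retraining run, so the result is only as trustworthy as the validation set behind it.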

Interestingly, if I recall correctly, the winner used a transformer on text created from features.