Porto Seguro random forest starting point

daschumacher · November 1, 2017, 2:31am

I found this to be a very good starting point for using a random forest on the Porto Seguro car insurance Kaggle competition. You can practice tuning the parameters from here.

https://www.kaggle.com/sidbecoolyo/random-forest-classification-0-24?scriptVersionId=1574286

fryanpan · November 1, 2017, 2:31am

I’m getting a 404 when I try that link.

ekim22 · November 1, 2017, 2:32am

#fakenews

daschumacher · November 1, 2017, 2:32am

yeah thanks I know why, hold on

daschumacher · November 1, 2017, 2:34am

Ok please try now

shik1470 · November 15, 2017, 10:01pm

@jeremy I’m trying to apply the different feature selection methods taught in class to the insurance competition. The evaluation metric for this competition is the normalized Gini index. I have a few questions about which metric to use for comparing models on the validation set:

Can I expect that an RF classifier with good log-loss/RMSE/accuracy will also have a good Gini index, so that I can use those as proxy evaluation metrics, or should I compute the Gini index directly on the validation set to compare models?
Are there other recommended evaluation metrics that penalize the classifier more heavily for false negatives?
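
For reference, here is a minimal sketch of computing the normalized Gini directly on a validation set. It relies on the identity that, for a binary target, normalized Gini equals 2 * AUC - 1 (exact when there are no tied scores; this version gives tied scores half credit). The function names `roc_auc` and `normalized_gini` are made up for illustration and are not the competition's official scoring code. If scikit-learn is available, `sklearn.metrics.roc_auc_score` could replace the hand-written AUC.

```python
def roc_auc(y_true, y_score):
    """AUC via the Mann-Whitney U statistic; tied scores share an average rank."""
    pairs = sorted(zip(y_score, y_true))  # ascending by predicted score
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the block [i, j) of rows that share the same score.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        positives_in_block = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_block
        i = j

    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def normalized_gini(y_true, y_score):
    """Normalized Gini for a binary target, computed as 2 * AUC - 1."""
    return 2.0 * roc_auc(y_true, y_score) - 1.0


# Toy validation labels and predicted probabilities (made-up numbers).
y_valid = [0, 0, 1, 1]
p_valid = [0.1, 0.4, 0.35, 0.8]
print(normalized_gini(y_valid, p_valid))  # 0.5
```

Because Gini depends only on how the predictions are ranked, it should be fed predicted probabilities (for example the positive-class column of `predict_proba`), not hard 0/1 labels.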

jeremy · November 15, 2017, 11:10pm

All those metrics should be highly correlated with Gini. To penalize false negatives, try fbeta (mentioned in this week’s DL lesson for the Planet competition).
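
A minimal sketch of how F-beta weights false negatives, assuming hard 0/1 predictions obtained by thresholding the classifier's probabilities. With beta greater than 1 (beta = 2 here), recall counts for more than precision, so a missed positive costs more than a false alarm. The helpers `fbeta` and `to_labels` and the toy data are made up for illustration; scikit-learn ships an equivalent `sklearn.metrics.fbeta_score`.

```python
def fbeta(y_true, y_pred, beta=2.0):
    """F-beta = (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # Convention assumed here: score 0.0 when there are no positives at all.
    return (1 + b2) * tp / denom if denom else 0.0


def to_labels(probs, threshold=0.5):
    """Turn predicted probabilities into hard 0/1 labels at a threshold."""
    return [1 if p >= threshold else 0 for p in probs]


# Toy example (made-up numbers): four positives, two negatives.
y_valid = [1, 1, 1, 1, 0, 0]
one_false_negative = [1, 1, 1, 0, 0, 0]  # misses one positive
one_false_positive = [1, 1, 1, 1, 1, 0]  # flags one negative

for name, preds in [("one FN", one_false_negative), ("one FP", one_false_positive)]:
    print(name, "F1 =", round(fbeta(y_valid, preds, beta=1.0), 3),
          "F2 =", round(fbeta(y_valid, preds, beta=2.0), 3))
```

In this toy case the single false negative lowers F2 (about 0.79) much more than the single false positive does (about 0.95), while the two F1 scores sit closer together. Unlike Gini, F-beta depends on the threshold passed to `to_labels`, so the threshold would need tuning on the validation set as well.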

shik1470 · November 15, 2017, 11:26pm

Thanks I’ll try that