Porto Seguro random forest starting point

I found this to be a very good starting point for using the random forest algorithm on the car insurance Kaggle competition. You can practice tuning the parameters from here.

https://www.kaggle.com/sidbecoolyo/random-forest-classification-0-24?scriptVersionId=1574286


I’m getting a 404 when I try that link.

#fakenews

yeah thanks I know why, hold on

Ok please try now

@jeremy I’m trying to apply the different feature selection methods taught in class to the insurance competition. The evaluation metric for this competition is the normalized Gini index. I have a few questions about which metric to use for comparing models on the validation set:

  1. Can I expect that an RF classifier with good log-loss/RMSE/accuracy will also have a good Gini index, so that I can use those as proxy evaluation metrics? Or should I compute the Gini index directly on the validation set to compare models?
  2. Are there other recommended evaluation metrics that penalize the classifier more for false negatives?

All those metrics should be very correlated with Gini. To penalize false negatives, try fbeta with beta greater than 1, which weights recall more heavily than precision (fbeta was mentioned in this week’s DL lesson for the Planet competition).
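
If it helps, here is a rough sketch of both metrics in plain Python so they can be computed directly on a validation set. It assumes a binary 0/1 target and uses the standard identity that normalized Gini equals 2 * AUC - 1 in that case. The function names, the toy labels and scores, and the 0.5 threshold are all made up for illustration and are not from the linked kernel. The pairwise AUC loop is quadratic, so it is only meant for small samples.

```python
def auc(y_true, y_score):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Quadratic in the number of rows, so this is
    only a sketch for small validation samples.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def normalized_gini(y_true, y_score):
    """Normalized Gini for a binary target, via Gini = 2 * AUC - 1."""
    return 2.0 * auc(y_true, y_score) - 1.0


def fbeta(y_true, y_pred, beta=2.0):
    """F-beta from hard 0/1 predictions; beta > 1 weights recall more."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Toy validation labels and predicted probabilities (illustrative only).
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_score = [0.1, 0.3, 0.8, 0.2, 0.4, 0.6, 0.05, 0.9]
gini = normalized_gini(y_true, y_score)

# Same labels, one mistake each: a missed positive vs. a false alarm.
labels = [1, 1, 0, 0]
f2_one_false_negative = fbeta(labels, [1, 0, 0, 0], beta=2.0)
f2_one_false_positive = fbeta(labels, [1, 1, 1, 0], beta=2.0)

print("normalized gini:", gini)
print("F2 with one false negative:", f2_one_false_negative)
print("F2 with one false positive:", f2_one_false_positive)
```

Note that Gini is computed from the predicted probabilities (it only cares about ranking), while fbeta needs hard predictions, so the threshold you pick matters for fbeta but not for Gini.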

Thanks I’ll try that