I found this to be a very good starting point for using the random forest algorithm on the car insurance Kaggle competition. You can practice tuning the parameters from here.
@jeremy I’m trying to apply the different feature selection methods taught in class to the insurance competition. The evaluation metric for this competition is the normalized Gini index. I have a few questions about which metric to use for comparing models on the validation set:
Can I expect that an RF classifier with good log-loss/RMSE/accuracy will also have a good Gini index, so that I can use those as proxy evaluation metrics? Or should I compute the Gini index directly on the validation set to compare models?
Are there other recommended evaluation metrics that penalize the classifier more heavily for false negatives?
All of those metrics should be strongly correlated with Gini. To penalize false negatives, try F-beta with beta > 1, which weights recall more heavily than precision (fbeta was mentioned in this week’s DL lesson for the Planet competition).
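In case it helps anyone comparing models on the validation set, here is a minimal pure-Python sketch of both metrics. It is not the competition's official scoring code: the function names `gini`, `normalized_gini`, and `fbeta` are my own, and it assumes binary 0/1 labels with at least one positive example. For binary labels with no tied scores, the normalized Gini should work out to 2 * AUC - 1, so ROC AUC is another reasonable proxy.

```python
def gini(actual, pred):
    """Unnormalized Gini: rank rows by predicted score (highest first)
    and measure how early the positives accumulate."""
    n = len(actual)
    # Stable sort, so tied scores keep their original order.
    order = sorted(range(n), key=lambda i: -pred[i])
    total = sum(actual)  # assumes at least one positive label
    cumulative = 0.0
    gini_sum = 0.0
    for i in order:
        cumulative += actual[i]
        gini_sum += cumulative / total
    gini_sum -= (n + 1) / 2.0
    return gini_sum / n


def normalized_gini(actual, pred):
    """Gini of the predictions divided by the Gini of a perfect ranking."""
    return gini(actual, pred) / gini(actual, actual)


def fbeta(actual, pred_labels, beta=2.0):
    """F-beta on hard 0/1 predictions. beta > 1 weights recall more than
    precision, so false negatives cost more than false positives."""
    tp = sum(1 for a, p in zip(actual, pred_labels) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, pred_labels) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, pred_labels) if a == 1 and p == 0)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy validation labels and scores, made up for illustration only.
    y_valid = [0, 0, 1, 1]
    scores = [0.1, 0.2, 0.8, 0.9]
    print("normalized gini:", normalized_gini(y_valid, scores))

    y_true = [1, 1, 1, 0]
    one_false_negative = [1, 1, 0, 0]
    one_false_positive = [1, 1, 1, 1]
    print("F2 with one FN:", fbeta(y_true, one_false_negative))
    print("F2 with one FP:", fbeta(y_true, one_false_positive))
```

Note that `normalized_gini` takes predicted scores (e.g. the positive-class probability from the RF), while `fbeta` takes thresholded 0/1 predictions, so the F-beta value will also depend on the threshold you pick.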