A few questions of mine popped up after going through lesson 6 of the course.
I’d appreciate any answers to them, and feel free to answer only one question or a few!
Decision Trees:
- When using decision trees, should one use gini or the metric provided in the competition to judge the performance of the tree?
- Should one typically limit the number of leaf nodes or the number of samples per leaf?
- What’s typically a good amount of samples to have per leaf node in a decision tree?
- When should one use OOB (out-of-bag) error? Or rather, when should one use OOB error versus validation error?
- Can you use decision trees for NLP tasks? If so, I suppose one would have to create their own features? Or are the tokens themselves enough (e.g., does a document contain the term “delicious”)?
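To make the OOB-error and samples-per-leaf questions concrete, here's a minimal sketch (not from the lesson — the dataset is synthetic, and `min_samples_leaf=5` is just an illustrative value, not a recommendation). It shows how scikit-learn exposes both knobs and lets you compare OOB error against a held-out validation set:

```python
# Sketch: comparing OOB accuracy to held-out validation accuracy.
# Dataset is synthetic; min_samples_leaf=5 is an arbitrary example value.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

# min_samples_leaf limits how small leaves can get; oob_score=True reuses
# each tree's left-out bootstrap rows as a "free" validation set.
rf = RandomForestClassifier(
    n_estimators=100, min_samples_leaf=5, oob_score=True, random_state=0
)
rf.fit(X_train, y_train)

print(f"OOB accuracy:        {rf.oob_score_:.3f}")
print(f"Validation accuracy: {rf.score(X_valid, y_valid):.3f}")
```

If the two numbers track each other closely, OOB error can stand in for a validation set when data is scarce.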
Questions below have been answered. Feel free to add more to them!
Data:
- When viewing the values for feature importances, what is a good cut-off point (e.g., any feature with an importance less than 0.05 will not be used)?
- Even if a dataset is not time series, shouldn't one still take a random subset?
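On the cut-off question, one common heuristic (my framing, not the lesson's) is to pick the threshold empirically: drop everything below it, re-fit, and check that the validation score doesn't degrade. A sketch on synthetic data, with an arbitrary threshold of 0.005:

```python
# Sketch: choosing a feature-importance cut-off empirically.
# The 0.005 threshold is arbitrary, not a recommendation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=30,
                           n_informative=8, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Keep only features above the cut-off, then re-fit and compare scores.
keep = rf.feature_importances_ > 0.005
rf2 = RandomForestClassifier(n_estimators=100, random_state=0)
rf2.fit(X_train[:, keep], y_train)

print(f"All 30 features:    {rf.score(X_valid, y_valid):.3f}")
print(f"Kept {keep.sum():2d} features: {rf2.score(X_valid, y_valid):.3f}")
```

If the score holds up after dropping features, the cut-off was safe; if it drops, lower the threshold.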
Ensembling:
- When ensembling a random forest and a neural network together, is a single neural network enough, or should you add more?
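For reference, here's a minimal sketch of the kind of ensemble the question describes: a random forest and a single neural network, combined by averaging their predicted probabilities. Scikit-learn's `MLPClassifier` stands in for "a neural network" here, and the dataset is synthetic:

```python
# Sketch: ensembling a random forest with one neural net by averaging
# class probabilities. MLPClassifier is a stand-in for any neural net.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
nn = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)
nn.fit(X_train, y_train)

# Simple unweighted average of the two models' class probabilities.
avg_proba = (rf.predict_proba(X_valid) + nn.predict_proba(X_valid)) / 2
preds = avg_proba.argmax(axis=1)
acc = (preds == y_valid).mean()
print(f"Ensemble accuracy: {acc:.3f}")
```

Whether one network is "enough" presumably depends on whether extra networks (with different seeds or architectures) are diverse enough to add signal beyond this simple average.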