I am creating this topic to collect papers (links) that were mentioned during the ML1 course. For example, if anyone finds the paper about bagging, please add it through replies. Thanks!
Just leaving this here as an external source of info
It’s not a link suggested during class, but I guess we could use this thread for sharing general useful links.
So it says that sklearn's random forest does not give good results with categorical variables. It claims categorical variables have to be one-hot encoded, which I think is not 100% right. Like we did in class, can't we just treat them as discrete numeric codes (ordering the levels first if needed)?
The article shows a huge difference in scores when the dependent variable depends on some categorical independent variable: H2O's tree implementation outperforms sklearn's.
Is this true? Any comments?
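To make the two options in the question concrete, here is a minimal sketch (toy data and hand-rolled encoders of my own, not from the article or any library) of the two ways to feed a categorical column to a tree model:

```python
# Two encodings of a categorical column for tree models (toy illustration).

colors = ["red", "green", "blue", "green", "red"]

# 1) Ordinal ("discrete numeric") encoding: map each level to an integer.
#    A tree then splits on thresholds like code <= 1, so it can only group
#    levels according to the chosen (possibly arbitrary) order.
levels = sorted(set(colors))                 # ['blue', 'green', 'red']
code = {lvl: i for i, lvl in enumerate(levels)}
ordinal = [code[c] for c in colors]

# 2) One-hot encoding: one binary column per level.
#    Each split can isolate a single level, at the cost of many sparse
#    columns (and weaker individual splits at high cardinality).
one_hot = [[int(c == lvl) for lvl in levels] for c in colors]

print(ordinal)   # [2, 1, 0, 1, 2]
print(one_hot)   # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

With deep trees, the ordinal version can often recover level groupings through multiple threshold splits, which is why the "just treat them as numbers" approach from class frequently works in practice.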
Nice article on ensembling and why it works
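The core intuition behind why ensembling works can be sketched in a few lines (my own toy illustration, not taken from the article): averaging many noisy, roughly independent predictors shrinks variance by about a factor of the ensemble size.

```python
# Variance reduction by averaging independent noisy predictors (toy sketch).
import random

random.seed(0)
TRUE_VALUE = 5.0

def noisy_predictor():
    # Stand-in for a single model's prediction: truth plus independent noise.
    return TRUE_VALUE + random.gauss(0, 1.0)

# 1000 predictions from a single model vs. 1000 averages of 25 models each.
single = [noisy_predictor() for _ in range(1000)]
ensembles = [sum(noisy_predictor() for _ in range(25)) / 25 for _ in range(1000)]

def variance(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

print(variance(single))     # close to 1.0
print(variance(ensembles))  # close to 1/25 of that
```

Real bagged trees are only partially independent (they share training data), so the reduction is smaller in practice, but the direction of the effect is the same.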
This paper was not mentioned in class, but I think it is very helpful when it comes to understanding how a single decision tree, i.e., a building block of a random forest, works.
Since this paper is related to DNA and proteins, here is some biology background you may need when reading it:
DNA: double-stranded 'ATCG' codes that encode your proteins
Protein-coding region: region of DNA that leads to proteins
Non-coding region: region of DNA that does not lead to proteins
Hope you all find this helpful
decision_tree_gene.pdf (1.6 MB)
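For anyone who wants to see the mechanics before reading the paper, here is a tiny sketch (entirely my own toy example, not from the paper) of how a single decision-tree node picks a threshold split by minimizing weighted Gini impurity:

```python
# How one decision-tree node chooses a split (toy sketch, Gini impurity).

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Try each candidate threshold; return (threshold, score) with the
    lowest weighted impurity of the two child nodes."""
    best = (None, float("inf"))
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

# Toy feature: small values belong to class 0, large values to class 1.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(xs, ys))   # (3.0, 0.0): x <= 3.0 separates the classes perfectly
```

A full tree just applies this greedily and recursively to each child node, and a random forest grows many such trees on bootstrapped samples with random feature subsets.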