I was reading through the notebook “How random forests really work?” (How random forests really work | Kaggle) when I came across the Gini-measure introduced in Lecture 6. Two questions came up while reading:
- In the notebook it says the following: “What this calculates is the probability that, if you pick two rows from a group, you’ll get the same `Survived` result each time. If the group is all the same, the probability is `1.0`, and `0.0`
if they’re all different.” The formula used is 1 - p^2 - (1-p)^2 (with p being the estimated probability of survival in that subgroup). But in my opinion this computes the complement of the probability described in the text: P(Survived and Survived) = p^2 and P(Not Survived and Not Survived) = (1-p)^2, and since these events are disjoint, the probability of getting the same result twice is p^2 + (1-p)^2. The formula is 1 minus that, i.e. the probability of getting two different results. Am I wrong? What have I missed?
- Is there any connection to the Gini coefficient, which is used similarly to measure how equal/uniform a distribution is? In our case the Gini-measure is 0 if we have a homogeneous basket and 0.5 if we have a uniform distribution (p=0.5). In general, a Gini coefficient of 1 is assigned to a perfectly homogeneous distribution (under a few assumptions). Still, it seems to me that they should be connected.
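
To make the reasoning in my first question concrete, here is a minimal sketch. The helper names are my own and not code from the notebook, and I am assuming the two rows are drawn independently (with replacement), so each one is a survivor with probability p. It compares the formula with the probability of getting the same result twice, and with a simulated probability of getting two different results.

```python
import random


def gini_measure(p):
    """The formula as I read it from the notebook: 1 - p^2 - (1-p)^2."""
    return 1 - p**2 - (1 - p)**2


def prob_same(p):
    """Probability that two independent draws agree (both survived or both not)."""
    return p**2 + (1 - p)**2


def simulate_prob_different(p, n_pairs=100_000, seed=0):
    """Estimate by simulation how often two independent draws disagree.

    Assumption: each draw is a survivor with probability p, independently.
    """
    rng = random.Random(seed)
    different = 0
    for _ in range(n_pairs):
        a = rng.random() < p
        b = rng.random() < p
        different += a != b  # adds 1 when the two results differ
    return different / n_pairs


# Columns: p, formula, P(same), simulated P(different)
for p in (0.0, 0.25, 0.5, 1.0):
    print(p, gini_measure(p), prob_same(p), simulate_prob_different(p))
```

If my reading is right, the second column should track the simulated probability of two different results, and the second and third columns should add up to 1 for every p.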