I am a little confused about the section on finding out-of-domain data in chapter 9 of the book. There, a random forest is trained to predict whether each row belongs to the training set or the validation set, as a way of checking whether the two sets have the same distribution. Can anyone elaborate on why a feature with higher importance in that model indicates a larger difference between the training and validation sets?
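To make the question concrete, here is a minimal sketch of the procedure as I understand it. It assumes scikit-learn's `RandomForestClassifier` and is not the book's exact code; the function name `domain_feature_importance` and the toy columns `drifting` and `stable` are hypothetical names made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def domain_feature_importance(train_df, valid_df, random_state=0):
    """Fit a forest to tell training rows from validation rows.

    Returns the feature importances of that forest, sorted from
    most to least important.
    """
    # Stack both sets and label each row by where it came from:
    # 0 = training set, 1 = validation set.
    combined = pd.concat([train_df, valid_df], ignore_index=True)
    is_valid = np.array([0] * len(train_df) + [1] * len(valid_df))

    # Hyperparameters here are arbitrary choices for the sketch.
    rf = RandomForestClassifier(
        n_estimators=50, min_samples_leaf=5, random_state=random_state
    )
    rf.fit(combined, is_valid)

    return pd.Series(
        rf.feature_importances_, index=combined.columns
    ).sort_values(ascending=False)


# Toy data (assumption): "drifting" has a shifted mean in the
# validation set, while "stable" has the same distribution in both.
rng = np.random.default_rng(0)
train = pd.DataFrame({
    "drifting": rng.normal(0, 1, 500),
    "stable": rng.normal(0, 1, 500),
})
valid = pd.DataFrame({
    "drifting": rng.normal(3, 1, 500),
    "stable": rng.normal(0, 1, 500),
})

importances = domain_feature_importance(train, valid)
print(importances)
```

In this toy data the `drifting` column comes out on top, since it is the only column whose distribution differs between the two sets, so it is the only one the forest can use to tell them apart.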
Can anyone help me?
I really appreciate it.
Thank you.