Conceptual Question about Random Forest and Decision Trees

Is every decision in a decision tree taken at a leaf node?

if(yes):
Then why do we have to calculate the mean at every node?

else:
How do we know at which node to stop? Because if we stop at some non-leaf node, there are still some splits left that we could make use of.

Can anyone please help me

Hi @sai_krishna!

I think you are misunderstanding it.
In a decision tree, the decision is taken at the very last node, the leaf where the tree finally stops. And in a random forest there are lots of different trees, each with a different distribution of features across its splits. So, actually, all the trees will be different from each other.
And while making a prediction, our model actually goes through all the trees, takes a decision from each of them, and then averages them so that the final decision is more generalized. That directly means: the more different trees, the better the model.
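
Here is a rough scikit-learn sketch of the averaging I mean (the toy dataset and the number of trees are just made up for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy regression data, purely for illustration.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

rf = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# The forest's prediction for a row is just the mean of its trees' predictions.
row = X[:1]
per_tree = [tree.predict(row)[0] for tree in rf.estimators_]
print(np.mean(per_tree))     # average over the 10 individual trees
print(rf.predict(row)[0])    # the forest returns (almost) exactly this average
```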


Thank you for responding,

What I meant was: let us consider just one decision tree for now.

Since every decision goes all the way down to a leaf node, and (with the default settings) each leaf node contains one data point/observation from the training set, and every prediction is simply the mean of the samples in that leaf, every prediction of a decision tree corresponds to an actual value in the training set. So can a single decision tree be interpreted as simply a function which makes a prediction by finding the nearest row in the training set, and which therefore cannot predict any value other than the values present in the training set, right?
In other words, for each row in the test set it is finding the best (closest, most representative) possible row in the training set? (Sorry, a bit long.)
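
If it helps, here is a small scikit-learn sketch of what I am imagining (the data is just a toy example):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Toy data, purely for illustration.
X, y = make_regression(n_samples=100, n_features=4, noise=5.0, random_state=0)

# With the default min_samples_leaf=1 the tree keeps splitting until every
# training row sits alone in its own leaf, so training rows are predicted exactly.
tree = DecisionTreeRegressor(random_state=0).fit(X, y)
print(np.allclose(tree.predict(X), y))                    # True

# A new row falls into one of those single-sample leaves, so its prediction
# is the target of some training row rather than a brand-new value.
new_row = X[:1] + 0.1
print(np.any(np.isclose(tree.predict(new_row)[0], y)))    # True
```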

Thank you

Hi @sai_krishna!

Yes exactly.
But the nearest row will be different for every decision tree, even for the same row in the test set and even though the training data is the same. That is because when we split a node there is some randomness, and when we predict, the result only depends on the features that were used for the splits in that particular tree.
That is why the decision of every tree will be different, and that is why we average all the decisions. So the final decision is not necessarily going to be a value from the training data.
But yes, for a single tree we can consider it as the kind of function you said.
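
A quick sketch of that point (again with a made-up dataset): each tree typically returns a value that exists somewhere in the training targets, but their average usually does not.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
rf = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

test_row = X[:1] + 0.5                          # pretend this is a test-set row
per_tree = np.array([t.predict(test_row)[0] for t in rf.estimators_])

print(per_tree)                                 # the individual trees typically disagree
print(np.any(np.isclose(per_tree.mean(), y)))   # usually False: the average is a new value
```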


Got it,
Thank you @RushabhVasani24

Why would this be true? I don’t think each leaf node would only contain one training sample

(as each of the leaf nodes contains one training sample)

Splitting of the nodes happens until each node consists of one item (sample), unless you pass an argument called

min_samples_leaf 
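For example, in scikit-learn (toy data again, just to show the effect of the argument):

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=100, n_features=4, noise=5.0, random_state=0)

# Default behaviour: splitting continues until every leaf holds a single sample.
full = DecisionTreeRegressor(random_state=0).fit(X, y)
print(full.get_n_leaves())      # close to 100, roughly one leaf per training row

# min_samples_leaf stops any split that would create a smaller leaf.
capped = DecisionTreeRegressor(min_samples_leaf=5, random_state=0).fit(X, y)
print(capped.get_n_leaves())    # far fewer leaves, each holding at least 5 rows
```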

Hi @Kobe430am!
By leaf node we mean the last nodes of each decision tree. There can be more than one sample too, but we can decide the number of samples in the last node.
I don't remember the exact parameter, but I think it's min_samples_leaf.

Yeah, by default it is one sample per leaf node, a.k.a. the last node.

Could there be situations where you have not yet reached min_samples_leaf but are unable to split further?

Maybe. For example, if there is only one feature left to work with and the split leads to a situation where one node gets all the samples and the other would be empty, then in that case splitting stops, I suppose. (This is a theoretical hypothesis I'm presenting, I'm not sure.)

@Kobe430am and @sai_krishna!

No, there is no such situation, because in the same tree it can split multiple times on the same feature. And every time it splits, it will try to make a good split so that a good number of rows get separated out (not the most rows, but a good number of rows).
So, as it can split multiple times on the same feature, it will definitely go down to min_samples_leaf.
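
You can see the same feature being reused along one path with a tiny scikit-learn example (one made-up feature, and a shallow tree just so the printout stays short):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))   # a single feature
y = np.sin(X[:, 0])                    # target depends only on that feature

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# Every split in the printed tree uses feature_0, including several splits
# on the same root-to-leaf path, so a feature can be reused as often as needed.
print(export_text(tree, feature_names=["feature_0"]))
```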

Actually, I read in a textbook that when a particular feature is used for splitting, the same feature can't be used again on the same path of that tree (it can be used on a different path though), so as we pass the feature set down recursively, we exclude that feature. (Again, this is not what was taught in the fast.ai course; I read it in a textbook describing ID3, one of the algorithms for building decision trees. And since the question was whether there could be such a situation, I thought it was worth mentioning.)

Okay!