To understand this, first you need to know that the simplest model in ML uses the average as its prediction. So if I have 10 rows, I can take the average of their target values and use it as the prediction for every row.
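As a rough sketch of that "average as prediction" baseline, here it is in plain Python. The `prices` values and the function names are made up for illustration and do not come from any real dataset or library.

```python
def mean_prediction(targets):
    """Predict the same value, the average target, for every row."""
    return sum(targets) / len(targets)


def mse(targets, prediction):
    """Mean squared error when every row gets the same prediction."""
    return sum((t - prediction) ** 2 for t in targets) / len(targets)


# Hypothetical target values for 10 rows (illustrative only).
prices = [10.0, 12.0, 9.0, 11.0, 10.5, 9.5, 12.5, 8.5, 11.5, 10.5]

baseline = mean_prediction(prices)     # one prediction shared by all rows
baseline_error = mse(prices, baseline)  # training error of that model
```

Every split discussed here is judged by whether it beats this kind of single-average model on each side of the split.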

RF works by trying every column in the dataset, and within each column it tries every possible split. For each split it uses the average of each side as the prediction and calculates the training error. The split with the least error is taken as the best split.

Let's take a dataset with only one column - Coupler System. RF will try to split on each value in that column: Coupler System > 1, Coupler System > 2, and so on. The split Coupler System > 1 divides the data into two groups, which let us assume have 15 and 20 rows respectively. For the 15 rows, it will use the average of those 15 rows as the prediction, and for the other 20 rows it will use the average of those 20 rows. Both of these are models in their own right and each has its own MSE.

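RF will take the weighted average error, which is calculated as ((MSE of the 20 row model * 20) + (MSE of the 15 row model * 15)) / 35. Dividing by the total number of rows does not change which split wins, because that total is the same for every split.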
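To make the weighting concrete, here is a small sketch in Python. The MSE values 4.0 and 2.0 are invented for illustration, and `weighted_split_error` is a hypothetical helper name. The sketch divides the weighted sum by the total row count to get a true weighted average; since that divisor is identical for every candidate split, the ranking of splits is the same with or without it.

```python
def weighted_split_error(mse_a, n_a, mse_b, n_b):
    """Average the two groups' MSEs, weighting each by its row count."""
    total_rows = n_a + n_b
    return (mse_a * n_a + mse_b * n_b) / total_rows


# Hypothetical errors for the 15 row and 20 row groups (illustrative only).
score = weighted_split_error(mse_a=4.0, n_a=15, mse_b=2.0, n_b=20)
```

The larger group pulls the score toward its own MSE, so a split cannot look good just by isolating a few rows that are easy to predict.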

RF will then repeat the process for the next split - Coupler System > 2. Let’s say this split gives us 10 rows and 25 rows. The weighted average error is calculated again, and so on for every remaining split.

Out of all the possible splits, the one with the least weighted average error is taken as the best split.
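Putting the steps together, the search over one column could be sketched like this. It is a simplified illustration for a single column and a single split, not how any particular library implements it; `best_split`, `coupler_system`, and `sale_price` are hypothetical names and the numbers are made up.

```python
def mse(values):
    """MSE of predicting the mean of `values` for every row in the group."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(xs, ys):
    """Try every threshold `x > t` and keep the lowest weighted error."""
    best_threshold, best_score = None, float("inf")
    for threshold in sorted(set(xs)):
        low = [y for x, y in zip(xs, ys) if x <= threshold]
        high = [y for x, y in zip(xs, ys) if x > threshold]
        if not low or not high:
            continue  # a split must leave rows on both sides
        # Weighted average of the two groups' MSEs.
        score = (mse(low) * len(low) + mse(high) * len(high)) / len(ys)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical column and target values (illustrative only).
coupler_system = [1, 1, 2, 2, 3, 3]
sale_price = [10.0, 12.0, 20.0, 22.0, 21.0, 23.0]

threshold, score = best_split(coupler_system, sale_price)
```

With more than one column, the same loop would run once per column and the overall winner would be the column and threshold pair with the lowest score.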

Jeremy does talk about the weight calculation here: