Lesson 6 // Bagging and Jelly Beans

Lesson 6 mentions a technique for improving predictions known as bagging.

In a nutshell, by training multiple models, each on a different random subset of the data, and then averaging their predictions, you get a combined prediction that tends to be closer to the actual value.

This works because the individual models' errors tend to cancel each other out: some models overestimate, while others underestimate.

The errors only cancel out if the models' errors are uncorrelated, and training each model on a different random subset of the data, as mentioned above, is a simple way to achieve that.
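To make this concrete, here's a minimal sketch of the idea using NumPy. It's a hypothetical toy example (not from the lesson): each "model" is just a least-squares slope fit through the origin, trained on a bootstrap sample (a random subset drawn with replacement), and the bagged prediction averages the individual models' outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x + noise (hypothetical example, true slope is 3)
x = rng.uniform(0, 10, 200)
y = 3 * x + rng.normal(0, 2, 200)

def fit_slope(xs, ys):
    # A deliberately simple "model": least-squares slope through the origin
    return (xs @ ys) / (xs @ xs)

# Bagging: train each model on a random bootstrap sample of the data
n_models = 50
slopes = []
for _ in range(n_models):
    idx = rng.integers(0, len(x), len(x))  # sample indices with replacement
    slopes.append(fit_slope(x[idx], y[idx]))

# Average the models' predictions (here, equivalent to averaging the slopes)
bagged_slope = np.mean(slopes)
print(f"Bagged slope estimate: {bagged_slope:.3f} (true value: 3)")
```

The individual slopes scatter around 3 because each bootstrap sample sees different noise; averaging them pulls the estimate back toward the true value, which is the whole point of bagging.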

As I was reading through this section of the lesson, it reminded me of a short video I watched some years back where people were tasked with guessing the number of jelly beans in a jar.

160 people were asked, and only 4 got anywhere near the actual amount. But when the guesses were averaged, the result was only 5 jelly beans off the actual value.
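You can simulate the same effect in a few lines. The numbers here are made up (a hypothetical jar of 1000 beans and guesses with unbiased random error), but they show why the average beats nearly every individual guess:

```python
import numpy as np

rng = np.random.default_rng(1)

true_count = 1000                                  # hypothetical jar size
# 160 guessers, each off by a lot on average, but with no systematic bias
guesses = true_count + rng.normal(0, 300, 160)

typical_individual_error = np.abs(guesses - true_count).mean()
crowd_error = abs(guesses.mean() - true_count)

print(f"Typical individual error: {typical_individual_error:.0f} beans")
print(f"Error of the average:     {crowd_error:.0f} beans")
```

As long as the over- and underestimates are roughly balanced, the error of the average shrinks as more guesses are included, just like the errors of uncorrelated models in bagging.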

Here’s the video.

Just something interesting :smile: