Unbalanced classes and overfitting?

In lesson 2, at around 1:40 or a little before, Jeremy responds to a student question about unbalanced classes by suggesting that duplicating the instances of the rare classes can be a good strategy.

My question is: why doesn’t that tend to promote overfitting? Intuitively, if you only have a handful of instances of a rare class, and instead of getting more instances you just repeat the instances you have, then any idiosyncrasies in the original dataset will be magnified. Right?
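
To make sure we're talking about the same thing, here's a toy sketch of the kind of duplication I mean (the `oversample` helper and the 3x factor are just for illustration):

```python
from collections import Counter

def oversample(pairs, rare_label, factor=3):
    """Repeat every (sample, label) instance of rare_label `factor` times."""
    extra = [(x, y) for (x, y) in pairs if y == rare_label] * (factor - 1)
    return pairs + extra

data = [("a", 0), ("b", 0), ("c", 0), ("d", 1)]   # class 1 is rare
balanced = oversample(data, rare_label=1)
print(Counter(y for _, y in balanced))             # Counter({0: 3, 1: 3})
```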

Thanks!


Since most of the training is done in mini-batches, up-sampling gives the rare class a chance to have at least one instance in each mini-batch. If you up-sample, then yes, the idiosyncrasies of those few instances will be magnified, but sometimes that's exactly what you want. I don't think it directly promotes either overfitting or underfitting.
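
In practice you rarely duplicate rows by hand; a weighted sampler gives the same per-batch effect. A minimal sketch using PyTorch's `WeightedRandomSampler` (the dataset and tensors here are dummies, not anything from the lesson):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

labels = torch.tensor([0] * 95 + [1] * 5)   # imbalanced labels: class 1 is rare
features = torch.randn(100, 10)             # dummy inputs
dataset = TensorDataset(features, labels)

# Weight each sample by the inverse frequency of its class,
# so rare-class instances are drawn proportionally more often.
class_counts = torch.bincount(labels)
sample_weights = 1.0 / class_counts[labels].float()

sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(dataset),
                                replacement=True)
loader = DataLoader(dataset, batch_size=16, sampler=sampler)

# Each mini-batch now draws both classes with roughly equal probability.
xb, yb = next(iter(loader))
print(yb.float().mean())  # ~0.5 on average instead of ~0.05
```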
