Can adding more training data cause overfitting?

I was getting good results with my existing training data, so I extracted some additional training data.

When I added the additional training data, the model started to overfit.

I am basically using the dog/cat notebook from lesson 1.

Why would more training data cause overfitting?

I don’t think adding more data can cause overfitting by itself. Overfitting comes from weak generalization: the model’s low training loss no longer translates into low validation/test loss, so it performs badly on unseen data.
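On a plot of loss per epoch this shows up as divergence: training loss keeps falling while validation loss flattens out or starts climbing. Here is a minimal matplotlib sketch of that shape; the loss values are made up purely for illustration, so substitute whatever your own training loop or framework records:

```python
import matplotlib.pyplot as plt

# Made-up per-epoch losses purely to illustrate the overfitting shape;
# replace with the values your training loop actually records.
train_losses = [0.70, 0.45, 0.30, 0.20, 0.12, 0.07, 0.04]
valid_losses = [0.72, 0.50, 0.40, 0.38, 0.41, 0.47, 0.55]

epochs = range(1, len(train_losses) + 1)
plt.plot(epochs, train_losses, label="train loss")
plt.plot(epochs, valid_losses, label="valid loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
# Overfitting is the region where the curves diverge: train loss
# keeps dropping while valid loss stops improving or rises.
```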

I think you need to analyze your training/learning curves for exactly that divergence, and verify how you split your data into training and validation subsets to make sure you don’t have any data snooping; one way to check for leaked images is sketched below.
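A common form of snooping in an image task like dogs vs. cats is the same image ending up in both the training and validation sets. A minimal standard-library sketch of that check, assuming a train/valid folder layout of .jpg files (the `data/dogscats/...` paths are hypothetical; adjust them to wherever your data actually lives):

```python
import hashlib
from pathlib import Path

def file_hashes(folder):
    """Map content hash -> path for every .jpg under `folder`."""
    return {
        hashlib.md5(p.read_bytes()).hexdigest(): p
        for p in Path(folder).rglob("*.jpg")
    }

# Hypothetical paths; point these at your actual train/valid folders.
train_hashes = file_hashes("data/dogscats/train")
valid_hashes = file_hashes("data/dogscats/valid")

# Any image whose bytes appear in both sets is leakage: part of the
# validation score is then measured on images the model trained on.
leaks = set(train_hashes) & set(valid_hashes)
print(f"{len(leaks)} duplicate images shared between train and valid")
for h in list(leaks)[:10]:
    print(train_hashes[h], "<->", valid_hashes[h])
```

If this reports duplicates in your *original* split, one possible explanation is that your earlier "good results" were inflated by leakage, and the new, cleanly split data is simply exposing overfitting that was already there.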