Adding more data leads to worse training

(Johannes Laute) #1

Hi, I have a strange problem with my model. When I train it on a subset of the training set, for example 10% (`use_partial_data` in the data block API), it seems to learn really well and the loss goes down a lot (both training and validation loss, though it is overfitting a bit).
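For context, this is roughly what I mean by training on a subset — a minimal sketch in plain Python of picking a random 10% sample of the training items (the helper name `partial_data` is mine; fastai's `use_partial_data(sample_pct)` does something similar internally):

```python
import random

def partial_data(items, sample_pct=0.1, seed=42):
    """Return a random sample_pct fraction of items (illustrative helper, not the fastai API)."""
    rng = random.Random(seed)
    n = max(1, int(len(items) * sample_pct))
    return rng.sample(items, n)

full_train = list(range(1000))     # stand-in for the full training set
subset = partial_data(full_train)  # the 10% subset that trains well
print(len(full_train), len(subset))  # 1000 100
```

So the only difference between the two runs is whether the learner sees `subset` or `full_train`.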

But if I use the full dataset, the loss doesn’t go down at all and stays stuck for the whole training run. Might this be a case of the model underfitting the full dataset?

I will post some more model specifics when I am back home.

Thanks for any help or ideas!