Baidu has answered this question empirically, but I don’t have a good background in math so I don’t understand the answer:

Many studies theoretically predict that generalization error "learning curves" take a power-law form, ε(m) ∝ αm^{β_g}. Here, ε is generalization error, m is the number of samples in the training set, α is a constant property of the problem, and β_g = −0.5 or −1 is the scaling exponent that defines the steepness of the learning curve—how quickly a model family can learn from adding more training samples^{1}. Unfortunately, in real applications, we find empirically that β_g usually settles between −0.07 and −0.35, exponents that are unexplained by prior theoretical work.
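To make the role of the exponent concrete, dividing the power law at two dataset sizes shows how β_g controls the payoff from more data (a direct consequence of the formula above, not a separate result from the paper):

```latex
\frac{\varepsilon(km)}{\varepsilon(m)}
  = \frac{\alpha (km)^{\beta_g}}{\alpha\, m^{\beta_g}}
  = k^{\beta_g}
```

So multiplying the training set by k multiplies the error by k^{β_g}. With the theoretical value β_g = −0.5, a 10× increase in data shrinks error to 10^{−0.5} ≈ 32% of its former value; with the empirically common β_g = −0.1, it only shrinks to 10^{−0.1} ≈ 79%.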


For example, for image classification on ImageNet:

The top-1 classification error exponent is β_g = −0.309. On the other hand, the exponent for top-5 classification error is β_g = −0.488.
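These exponents can be plugged straight into the ratio k^{β_g} to see what a 10× increase in training data buys. A minimal sketch (the function name is my own; the exponents are the ImageNet values quoted above):

```python
def error_ratio(data_factor: float, beta_g: float) -> float:
    """Factor by which generalization error changes when the training
    set is multiplied by `data_factor`, assuming eps(m) ~ alpha * m**beta_g.
    The constant alpha cancels in the ratio, so only beta_g matters."""
    return data_factor ** beta_g

# Top-1 exponent: 10x more data leaves ~49% of the original error.
print(f"top-1: error shrinks to {error_ratio(10, -0.309):.1%} of original")

# Top-5 exponent: 10x more data leaves ~33% of the original error.
print(f"top-5: error shrinks to {error_ratio(10, -0.488):.1%} of original")
```

In layperson terms: with the top-1 exponent, each 10× increase in data roughly halves the remaining error, while a theoretical exponent of −0.5 would cut it to about a third.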

How can this be expressed for a non-mathy layperson? For example, how much improvement in accuracy results from a 10x increase in training data?