Do I model my training set on real-world classification percentages?

Latency · August 5, 2017, 6:33pm

I’m trying to classify images into 1 of 3 distinct categories:

normal
blurry
highly sharpened

In a real-world scenario there will be many more ‘normal’ images than ‘blurry’ images, and there will be more ‘blurry’ images than ‘highly sharpened’ images. Part of the problem I’m trying to solve is figuring out these percentages as they occur in real-world data, but I should populate each data set (training, validation, test) to model the real world, correct?
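One way to populate the three sets so that each mirrors the class breakdown of the full labeled pool is a stratified split. Below is a minimal stdlib-only Python sketch of that idea. The helper name `stratified_split`, the 70/15/15 split and the example pool of 1,000 labels are illustrative assumptions, not anything from the post.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, train_pct=70, val_pct=15, seed=0):
    """Hypothetical helper: split example indices into train/val/test so that
    each split keeps roughly the class proportions found in `labels`.
    Percentages are integers to avoid float rounding surprises."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random order within each class
        n = len(indices)
        n_train = n * train_pct // 100
        n_val = n * val_pct // 100
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        test.extend(indices[n_train + n_val:])  # remainder goes to test
    return train, val, test

# Illustrative pool using the 85% / 10% / 5% breakdown assumed in the post.
labels = ["normal"] * 850 + ["blurry"] * 100 + ["highly sharpened"] * 50
train, val, test = stratified_split(labels)

for name, split in (("train", train), ("val", val), ("test", test)):
    counts = Counter(labels[i] for i in split)
    print(name, len(split), dict(counts))
```

With integer division the training split comes out at exactly 595 / 70 / 35 images (85% / 10% / 5%). The smaller validation and test splits land close to, but not exactly on, those ratios because of rounding.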

Let’s assume that 85% of the images will be ‘normal’, 10% will be ‘blurry’ and 5% will be ‘highly sharpened’. Does that mean my training set should also contain this same breakdown of examples? I don’t think it makes sense to just fill the training set with ~33% of each class.
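To make the two options concrete, here is a small Python sketch that converts each breakdown into per-class image counts. It also computes the accuracy a model would get by always answering ‘normal’ on data with the real-world mix. The total of 3,000 training images and the name `counts_for` are arbitrary, hypothetical choices for illustration.

```python
def counts_for(total, shares_pct):
    """Convert integer percentage shares into per-class image counts."""
    return {label: total * pct // 100 for label, pct in shares_pct.items()}

real_world_pct = {"normal": 85, "blurry": 10, "highly sharpened": 5}
total = 3000  # hypothetical training-set size

# Option 1: training set mirrors the assumed real-world breakdown.
mirrored = counts_for(total, real_world_pct)
# Option 2: training set split evenly (~33% per class).
balanced = {label: total // len(real_world_pct) for label in real_world_pct}

# Accuracy of a trivial model that always predicts the most common class,
# measured on data that follows the real-world breakdown.
majority_baseline = max(real_world_pct.values()) / 100

print("mirrored:", mirrored)
print("balanced:", balanced)
print("always-'normal' accuracy:", majority_baseline)
```

The mirrored set contains 17 ‘normal’ images for every ‘highly sharpened’ one, and the 85% baseline is one reason overall accuracy can look good on a real-world-mix test set even when the rarer classes are mostly missed.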

Any clarifications/advice would be much appreciated.