Image Dataset Diversity

negodfre649 · July 5, 2019, 6:27pm

I have a background in statistics and the design and analysis of experiments.
In designing experiments, great care is taken to ensure that the variables being changed are uncorrelated and independent.

I know that image pixel data is highly correlated, so orthogonality within an image is not possible.

However, are there any strategies for selecting data, or for measuring the diversity of the data collected, to ensure the dataset does not cover too narrow a design space?

For example, in unstructured datasets with correlated variables, I use Hotelling's T-squared statistic to measure how far each observation lies from the center of the data used to build the model, and I then compute the same statistic for any new data. If the T-squared of a new observation is much larger than the values seen in the data used to build the model, one could be extrapolating.
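To make the check concrete, here is a minimal sketch of how I compute it on tabular data. The synthetic data, the function names, and the 99th-percentile empirical limit are all just assumptions for illustration; an F-distribution-based limit could be used instead.

```python
import numpy as np

def fit_reference(X):
    """Estimate the mean and (pseudo-)inverse covariance of the reference data."""
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)      # sample covariance, variables in columns
    cov_inv = np.linalg.pinv(cov)      # pinv tolerates a near-singular covariance
    return mean, cov_inv

def t_squared(X, mean, cov_inv):
    """Hotelling's T-squared (squared Mahalanobis distance) for each row of X."""
    diff = np.atleast_2d(X) - mean
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

# Hypothetical reference data: 200 observations of 3 variables.
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 3))

mean, cov_inv = fit_reference(train)
train_t2 = t_squared(train, mean, cov_inv)

# Assumed rule: flag anything above the 99th percentile of the reference T-squared.
limit = np.percentile(train_t2, 99)

new_point = np.array([8.0, 8.0, 8.0])  # deliberately far from the reference data
new_t2 = t_squared(new_point, mean, cov_inv)[0]
is_extrapolating = new_t2 > limit
print(f"limit={limit:.2f}, new T2={new_t2:.2f}, extrapolating={is_extrapolating}")
```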

Could this be applied to images, comparing the images used to train the model against a new image that you are trying to predict?

Possibly, if the Hotelling's T-squared of a new image is above a chosen confidence limit, you could flag it as an extrapolation from the original dataset.
Or could the same statistic be used to select images for training?
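For the flagging idea, this is roughly what I have in mind. It is only a sketch under my own assumptions: the images are flattened and reduced with PCA on raw pixels (features from a trained network might be a better choice), the toy images and the number of components are made up, and the limit is again an empirical percentile. Note that T-squared computed this way only sees deviations inside the retained components.

```python
import numpy as np

def fit_pca(images, n_components):
    """PCA on flattened images via SVD; returns mean, components, score variances."""
    X = images.reshape(len(images), -1).astype(float)
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:n_components]
    variances = s[:n_components] ** 2 / (len(X) - 1)
    return mean, components, variances

def image_t_squared(images, mean, components, variances):
    """T-squared of each image in the PCA score space (scores are uncorrelated)."""
    X = images.reshape(len(images), -1).astype(float)
    scores = (X - mean) @ components.T
    return (scores ** 2 / variances).sum(axis=1)

# Hypothetical training set: 100 8x8 images that differ mainly in brightness.
rng = np.random.default_rng(0)
pattern = np.linspace(0.0, 1.0, 64).reshape(8, 8)
brightness = rng.normal(1.0, 0.1, size=100)
train_images = brightness[:, None, None] * pattern + rng.normal(0, 0.01, (100, 8, 8))

mean, components, variances = fit_pca(train_images, n_components=3)
train_t2 = image_t_squared(train_images, mean, components, variances)
limit = np.percentile(train_t2, 99)    # assumed empirical limit

# A new image three times brighter than anything in the training set.
new_image = (3.0 * pattern)[None, :, :]
new_t2 = image_t_squared(new_image, mean, components, variances)[0]
flagged = new_t2 > limit
print(f"limit={limit:.2f}, new T2={new_t2:.2f}, flagged={flagged}")
```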

Essentially, one would select images that are quite different from one another, as indicated by very different Hotelling's T-squared values.
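One wrinkle is that T-squared is a single distance from the center, so two images with the same value can still be very different from each other. A variation I could imagine, sketched below, is to pick images that are far apart in Mahalanobis distance using a greedy farthest-point rule. This is a swapped-in technique rather than selection on T-squared values alone, and the feature matrix and function name are hypothetical.

```python
import numpy as np

def select_diverse(features, k):
    """Greedily pick k rows that are mutually far apart in Mahalanobis distance."""
    X = np.asarray(features, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)

    # Whiten the data: Euclidean distance in Z equals Mahalanobis distance in X.
    evals, evecs = np.linalg.eigh(cov)
    keep = evals > 1e-12 * evals.max()           # drop degenerate directions
    Z = (X - mean) @ evecs[:, keep] / np.sqrt(evals[keep])

    # Start from the observation with the largest T-squared.
    t2 = (Z ** 2).sum(axis=1)
    chosen = [int(np.argmax(t2))]
    min_dist = np.linalg.norm(Z - Z[chosen[0]], axis=1)

    # Repeatedly add the point farthest from everything chosen so far.
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(Z - Z[nxt], axis=1))
    return chosen

# Hypothetical per-image feature vectors (e.g. PCA scores): 50 images, 4 features.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 4))
picked = select_diverse(features, k=5)
print(picked)
```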

Any thoughts or advice are greatly appreciated.

Thank you,
Nate