ImageDataBunch histogram of class distribution

joerg · June 3, 2019, 7:16am

Hi there,

Does the ImageDataBunch class, or any other fastai component, provide a method to find out how many samples per class it contains? Something like a histogram over classes?
Maybe even separately for the training, validation, and test sets?

Also, does the data augmentation option via get_transforms() actually increase the number of training samples in the ImageDataBunch or are the original samples merely replaced by their transformed results?

Thanks a lot in advance,
Joerg

dreambeats · June 3, 2019, 9:02am

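Whenever we go through a mini-batch, a random transformation is applied to each image after it's loaded from disk. So each time we go through the mini-batches, you can expect a different transformation to be applied to the same image. The number of training samples stays the same; the transforms just change what each sample looks like on that pass.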
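To make that concrete, here is a toy sketch in plain Python. It is not fastai code: the class `AugmentedDataset` and the function `random_shift` are made-up names, and a number stands in for an image. It only illustrates the idea that the transform runs at access time, so the dataset length never grows while the values you get back differ between passes.

```python
import random


class AugmentedDataset:
    """Toy stand-in for a dataset that augments on access (not fastai code)."""

    def __init__(self, items, transform):
        self.items = list(items)
        self.transform = transform

    def __len__(self):
        # The size is the number of original items; augmentation never adds to it.
        return len(self.items)

    def __getitem__(self, idx):
        # A fresh random transform is applied every time an item is fetched.
        return self.transform(self.items[idx])


_rng = random.Random(0)  # seeded so the sketch is reproducible


def random_shift(x):
    """Hypothetical 'augmentation': nudge a number by random noise in [-1, 1]."""
    return x + _rng.uniform(-1.0, 1.0)


ds = AugmentedDataset([10.0, 20.0, 30.0], random_shift)

# Two passes ("epochs") over the same three items.
first_epoch = [ds[i] for i in range(len(ds))]
second_epoch = [ds[i] for i in range(len(ds))]

print(len(ds))        # still the original number of items
print(first_epoch)
print(second_epoch)   # same items, different random variations
```

Both passes see three items, but the values differ because the random transform is re-drawn on every access.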

To look at the distribution of classes, off the top of my head, I suppose you can grab the class of each image from data.train_ds.y and plot a histogram of those labels. You can repeat the same for valid_ds and test_ds.
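A rough sketch of the counting part, using only the standard library. The helper names `class_histogram` and `print_histogram` are made up for illustration, and the label list here is dummy data. With fastai v1 I'd assume you could build the label list with something like `[data.classes[i] for i in data.train_ds.y.items]`, but I haven't verified that against every version, so treat it as an assumption.

```python
from collections import Counter


def class_histogram(labels):
    """Return {class_label: count}, ordered from most to least frequent."""
    return dict(Counter(labels).most_common())


def print_histogram(hist, width=40):
    """Print a simple text bar chart, scaled so the largest class fills `width`."""
    top = max(hist.values(), default=0)
    for label, n in hist.items():
        bar = "#" * (round(width * n / top) if top else 0)
        print(f"{label:>10} {n:6d} {bar}")


# Dummy labels standing in for the per-image classes of a training set.
# Assumption: in fastai v1 these might come from
#   [data.classes[i] for i in data.train_ds.y.items]
train_labels = ["cat", "dog", "cat", "bird", "cat", "dog"]

hist = class_histogram(train_labels)
print_histogram(hist)
```

The same function can be called on the labels of each split to compare the distributions side by side.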

joerg · June 4, 2019, 8:01am

Thanks a lot for your answer, James. That really helped me.