Automatic image transformations


Sorry if this is obvious, but even after searching I found no definite answer on how this is handled exactly.

Does cnn_learner() automatically apply image transformations before classifying an image, i.e., before feeding it to the input layer? Does it do any automatic image transformations before using an image for training? Or does ImageDataBunch do any automatic transformations?

This assumes I use a simple code like this:

data = ImageDataBunch.from_folder(path, bs=16)
learn = cnn_learner(data, models.resnet18, pretrained=True, metrics=accuracy)

The image that can be retrieved as follows appears to be unmodified in size and other properties, but this simple check of course does not mean the images are fed into the neural net unmodified.

img,label = data.train_ds[0]

Training would be invoked like such (or with similar parameters):

learn.fit_one_cycle(20, 0.01)

And inference like this:

learn.predict(img)
The documentation remains a little vague regarding the above questions, or maybe I missed where it’s explained. Could someone point out a definite answer to these questions?

Judging from the PyTorch documentation, all pretrained models expect as training input:

3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224.

I wonder how the example here can work, given that the images from MNIST are 1x28x28 (8 bit PNGs in the folder). There must be some automatic conversion, but I can’t find out anything about what it is in the documentation of fastai.

The relevant code for loading the data is:

data = ImageDataBunch.from_folder(path, ds_tfms=(rand_pad(2, 28), []), bs=64)

That means the transformation rand_pad(2, 28) is applied to the training set, and no transformation is applied to the validation set. As far as I can tell from the code, normalization does not affect image size at all. Since the validation set is obtained directly from the 1x28x28 PNG images, the scaling required by the documentation has to happen somewhere:
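As a sanity check for myself: my understanding is that rand_pad(2, 28) pads each side by 2 pixels and then takes a random 28x28 crop, so the output size matches the input size. Here is a rough NumPy sketch of that idea (my own reconstruction to check my understanding, not fastai's actual implementation):

```python
import numpy as np

def rand_pad_crop(img, padding=2, size=28, rng=None):
    """Pad a (H, W) image by `padding` on each side, then take a
    random `size` x `size` crop -- mimicking what I understand
    fastai's rand_pad(2, 28) to do."""
    rng = rng or np.random.default_rng()
    padded = np.pad(img, padding, mode="reflect")
    y = rng.integers(0, padded.shape[0] - size + 1)
    x = rng.integers(0, padded.shape[1] - size + 1)
    return padded[y:y + size, x:x + size]

img = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)
out = rand_pad_crop(img)
print(out.shape)  # (28, 28)
```

If that is right, rand_pad explains the training-set augmentation but still does not explain any resizing or channel conversion.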

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224.

The PNG images are too small to meet the required size of at least 224x224, and they are single-channel grayscale instead of RGB. Where does this transformation happen? I couldn’t find it.

You shouldn’t believe the PyTorch documentation :wink:

I edited my post above your reply with more detail, while investigating the code. Could you maybe comment on its accuracy?

Does this mean PyTorch can accept any size image? More details would be appreciated :slight_smile:

Not sure about PyTorch’s models, but the fastai ones can accept any size, with a minimum that depends on the model (sometimes 16, sometimes 32).

Are the images somehow adapted before they are fed into fastai’s models, or is it because of the models’ architecture?

@maelh: If I understood you correctly, the answer is that fastai adapts the models; see the “Transfer learning” section in the fastai documentation.
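My understanding is that the size-independence comes from the adaptive pooling in the custom head fastai attaches: it always reduces the convolutional feature map to a fixed spatial size, no matter the input resolution, so the final linear layers see the same shape either way. A toy NumPy sketch of adaptive average pooling to 1x1 (my own illustration of the mechanism, analogous to PyTorch’s nn.AdaptiveAvgPool2d(1), assuming this is indeed what happens):

```python
import numpy as np

def adaptive_avg_pool_1x1(fmap):
    """Average each channel's feature map down to a single value,
    like nn.AdaptiveAvgPool2d(1): the output shape is (C, 1, 1)
    regardless of the input's H and W."""
    return fmap.mean(axis=(1, 2), keepdims=True)

# Different image sizes produce different feature-map sizes, but the
# head input shape stays the same:
for h, w in [(7, 7), (4, 4), (13, 9)]:
    fmap = np.random.rand(512, h, w)          # e.g. resnet18 features
    print(adaptive_avg_pool_1x1(fmap).shape)  # (512, 1, 1) each time
```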