Best way to resize pictures for model training

What is the best way to resize pictures for the ConvLearner model? In the old version, there was a way to resize the images and save them to a new folder, but I haven’t been able to figure out whether that exists in v1. I ask because the classification model I’m trying to train takes a long time to run, and I think that is because it resizes the images every time instead of resizing them once at the beginning and reusing the small resized images from then on.


I think you can use imagemagick on the downloaded data. This is a script I use:


rm -rf $DEST
mkdir -p $DEST

cd $SRC
find . -maxdepth 1 -mindepth 1 -type d -exec convert {}/*.jpg -resize 224x224^ ../$DEST/{}.jpg \; 

This is for a dataset where the images are grouped into one folder per category.
Customize the script as needed.


If you want to do a square crop use this imagemagick script:

find . -maxdepth 1 -mindepth 1 -type d -exec convert {}/*.jpg -resize 224x224^ -gravity center -extent 224x224 ../$DEST/{}.jpg \;
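
For anyone who wants to see what that resize-then-crop does to the dimensions, here is a minimal sketch of the geometry in Python. It only computes sizes and does not touch image files. The function name `fill_and_center_crop` is made up for illustration, and the rounding is an assumption that may differ by a pixel from what ImageMagick produces.

```python
def fill_and_center_crop(width, height, target=224):
    """Return the resized dimensions and crop box for a centered square crop.

    Step 1 scales the image so its shorter side equals `target`
    (the image then covers the whole target square).
    Step 2 takes a centered target x target window from the result.
    """
    scale = max(target / width, target / height)
    # max() guards against float rounding leaving a side one pixel short.
    new_w = max(target, round(width * scale))
    new_h = max(target, round(height * scale))
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)


# A 640x480 photo: resized to fill the square, then the sides are trimmed.
resized, box = fill_and_center_crop(640, 480)
print(resized, box)  # (299, 224) (37, 0, 261, 224)
```

If you use Pillow, the two results have the shapes expected by `Image.resize` (a size tuple) and `Image.crop` (a left, top, right, bottom box).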

Here’s another hack:

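# Image is overridden by fastai, so import PIL's Image under another name
import PIL.Image as pil_image
size = 900, 600  # maximum width, height; thumbnail() keeps the aspect ratio

# SRC_PATH is a placeholder for the folder holding the original images
# (the original post did not show where the source files were opened from)
for infile in file_names:
    outfile = RESIZED_PATH/infile
    try:
        im = pil_image.open(SRC_PATH/infile)
        # LANCZOS is the filter that used to be available as ANTIALIAS
        im.thumbnail(size, pil_image.LANCZOS)
        im.save(outfile, "JPEG")
    except IOError:
        print(f"Error generating {infile}")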

verify_images now has a parameter (max_size) that lets you resize.


What if the image is not square? For example, I have images of 10-second spectrograms (with some manatee calls), and they all have this dimension (image attached). Since it is time versus frequency, I cannot just make it square. What would be the solution in such a case?

Also, what do you guys think about the argument that CNNs do not work well for spectrograms, i.e. “two-dimensional representations of audio frequency spectra over time”?

As long as all your samples are the same length in time, you could just stretch the vertical dimension to make it square. One thing to be wary of is that you want to make sure that the network input resolution isn’t so small that you end up discarding a ton of information in the spectrogram which would degrade performance. Might be worth considering performing some type of dimensionality reduction in that case since there is typically a lot of dead space in spectrograms.


The problem is, I cannot stretch it vertically, because it will distort the call and its properties, i.e. frequency data. Also, spectrograms vary a lot, some are quite “busy” with much less “dead space”. I have seen in another thread someone is working with square spectrograms, where sound types are isolated and all represented in square image form. In my case, I am trying to take a raw 10 second file spectrogram and do the identification of what it contains.

Sorry for a stupid question, but why do images have to be square?

It shouldn’t matter if you stretch vertically as long as all samples have the same number of frequency bins in the Y axis. The CNN isn’t really going to care that it’s stretched since the relative distribution of energy in the frequency domain should still be the same across samples.

The article you linked makes very good points, and I would definitely agree that CNNs aren’t necessarily very well suited for spectrogram analysis, though they might still work well enough if you are doing simpler classifications where each class has a distinct “fingerprint” in the spectrogram.

verify_image keeps the width/height ratio of the original image.

See the function:


verify_image(img, delete=False, max_size=320, dest=DEST_PATH)
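
To make "keeps the w/h ratio" concrete, here is a small sketch of an aspect-preserving resize where the longer side is capped at max_size. This is not fastai's implementation. `fit_within` is a hypothetical helper, and it assumes that images already within the limit are left alone.

```python
def fit_within(width, height, max_size):
    """Scale (width, height) so the longer side is at most max_size, keeping the ratio."""
    longest = max(width, height)
    if longest <= max_size:
        return width, height  # already small enough, leave untouched
    scale = max_size / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(1000, 500, 320))  # (320, 160)
print(fit_within(200, 100, 320))   # (200, 100)
```

Note that the result is still rectangular: a 2:1 spectrogram stays 2:1, it just gets smaller.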


Thank you! This would be more desirable, since I do not want to mess with my raw spectrograms. But they all have to be the same size (i.e. all rectangular or all square), correct?


That’s not a stupid question at all, and honestly, I’m not entirely sure why it is still the de facto standard in modern architectures. I suspect there are a few historical reasons for this:

  1. Square dimensions make designing networks a lot easier. In the early days, a lot of thought and effort had to be put into choosing dimensions correctly so that the stride/padding of all filters in the network worked out and resulted in a desirable output feature map size. Heck, you still have to do that now, but things are a bit easier thanks to best practices and learnings from the past couple of years. Keeping images square meant that you only had to worry about getting these calculations right once.

  2. I believe that some of the CUDA/cuDNN stuff could have been originally optimized to only handle square images, but I could be wrong on this one.

  3. Most benchmark image datasets are close-cropped and square/squarish. MNIST is square; ImageNet isn’t square, but pretty much all the images have the item of interest in the central region of the image. So it didn’t tend to hurt these models too much to simply use square input image crops (though I believe some of the improvements Jeremy has achieved in a couple of the state-of-the-art ImageNet results involve improving on image cropping approaches with ResNet models).

  4. Older CNN architectures had a few fully connected layers at the head of the network, which meant that you needed exactly the right input image dimensions so that the number of features in the final layer connected to the FC layers matched. Since you had to force the input image size exactly anyway, specifying that images be square seemed reasonable enough. Now, with fully convolutional networks, you don’t have this limitation. In some architectures (like YOLO), you can input higher-resolution images, and the output of the network is essentially equivalent to having run YOLO at its default resolution on a bunch of translated and cropped views of the higher-resolution input image.
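
The stride/padding bookkeeping from point 1 comes down to one standard formula per axis, which can be sketched as follows. The helper names and the 128x431 spectrogram size are made up for illustration.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one axis: floor((n + 2*padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def conv_output_shape(height, width, kernel, stride=1, padding=0):
    """Apply the same formula to each axis independently."""
    return (conv_output_size(height, kernel, stride, padding),
            conv_output_size(width, kernel, stride, padding))


# Square input: one calculation covers both axes.
print(conv_output_shape(224, 224, kernel=7, stride=2, padding=3))  # (112, 112)
# Rectangular input (hypothetical 128x431 spectrogram): each axis is computed separately.
print(conv_output_shape(128, 431, kernel=7, stride=2, padding=3))  # (64, 216)
```

With a square input you check the arithmetic once per layer. With a rectangular one you check it twice, which is more work but nothing the convolution itself objects to.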

I’d be curious if there are any more obvious reasons for why square images are still commonly used in the latest gen models.


Thank you for your reply, this is very interesting. It makes sense, but there are some instances where the central region of the image is not the only area of interest. I imagine self-driving cars, for example, would want a much broader “visual field”. Also, non-traditional images such as satellite imagery and spectrograms would have features of interest beyond the center of the image.

You want a square image for input into your CNN, as CNN architectures typically require squares (there may be some bespoke non-square architectures). One reason is that with squares you don’t have to deal with widely varying aspect ratios, and moving the kernel across the image in a regular way is easier to do.

You can add padding to the top and bottom of your image if you don’t want to change the aspect ratio, but stretching may be a better option. Try both and see which gives a better result.
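
To compare the two options side by side, here is a toy sketch that treats an image as a plain 2-D list of pixel values (rows of equal length). Both helper names are hypothetical, the stretch uses simple nearest-neighbour row repetition, and a real pipeline would use an image library instead.

```python
def pad_to_square(rows, fill=0):
    """Add rows of `fill` above and below a wide grid until height equals width."""
    height, width = len(rows), len(rows[0])
    if height >= width:
        return [list(r) for r in rows]  # nothing to pad vertically
    missing = width - height
    top = missing // 2
    bottom = missing - top
    padded = [[fill] * width for _ in range(top)]
    padded += [list(r) for r in rows]
    padded += [[fill] * width for _ in range(bottom)]
    return padded


def stretch_to_square(rows):
    """Resample rows (nearest neighbour) so that height equals width."""
    height, width = len(rows), len(rows[0])
    return [list(rows[i * height // width]) for i in range(width)]


grid = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
print(pad_to_square(grid))      # original rows sit between two rows of zeros
print(stretch_to_square(grid))  # each original row appears twice
```

Padding keeps every pixel where it was and adds empty space. Stretching fills the square with real data but changes the vertical scale, which is the concern raised above for frequency axes.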

Something I tend to do with 4:3 aspect ratio images is simply squish them to 3:3 images and train the CNN on that. You lose very little visual information and that way you at least don’t risk missing an important part of the image by cropping. It gets a bit more dicey with 16:9 images due to the larger amount of loss of information you would have in one dimension.

Note - This only works if you apply this same horizontal compression transform to all your images, otherwise you would be training your model with images with very differing aspect ratios for the classes of interest.

Now I am confused again. If “verify_image keeps the w/h ratio of the original image”, why do I still need to add padding (which is essentially meaningless space) to make my spectrogram square?

check out and

Ultimately we want a square image. From what I can see (just looking through the code now), if you create an ImageDataBunch from a directory of non-square images and you don’t specify a resize_method, then the images are cropped.

There may be more pre-processing steps you can do in fastai_v1 (still very new to me, and no-doubt there probably are), but what you can do for now is pre-process the images to be square using the method you want, then feed the square images into ImageDataBunch.

OK, but I still do not understand why they have to be square, apart from the historical influence of always using square images for CNNs.

has some good background on CNNs. Also has a good overview.

For the convolutional step, as @HamsterHuey mentions, things are easier to calculate with squares.


Thank you for these great resources! In your second resource, the image in Figure 6 is not square, and it also cites a paper that has images of a wide variety of shapes for object detection. So can a CNN handle all these shapes, or am I misunderstanding again?
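
On whether the convolution operation itself can process non-square inputs, here is a toy pure-Python sketch: a "valid" sliding-window convolution followed by global average pooling, run on a wide grid and on a square one. The function names and the all-ones data are made up for illustration. This shows the mechanics only and says nothing about how well a real model would train on such inputs.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both 2-D lists) with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def global_average_pool(feature_map):
    """Collapse a feature map of any height and width into a single number."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


wide = [[1] * 10 for _ in range(4)]    # 4 rows x 10 columns
square = [[1] * 6 for _ in range(6)]   # 6 x 6
kernel = [[1] * 3 for _ in range(3)]

wide_map = conv2d_valid(wide, kernel)
square_map = conv2d_valid(square, kernel)
print(len(wide_map), len(wide_map[0]))      # 2 8
print(len(square_map), len(square_map[0]))  # 4 4
print(global_average_pool(wide_map), global_average_pool(square_map))  # 9.0 9.0
```

The same kernel runs on both shapes, and only the feature map size differs. Pooling down to one value per feature map is what lets a fully convolutional network feed a fixed-size classifier regardless of the input shape, as described in point 4 earlier in the thread.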