Are there any articles about the best strategy to make images square? Padding seems intuitively the best strategy because there is no data loss. But I think in the first lesson it was suggested that the images should be cropped.
If your model takes 224x224 images, then resize the image so that the smallest side is 224 (or slightly larger, 256) and then take a random 224x224 crop. The advantage is that this approach acts as a kind of data augmentation. There isn’t really any data loss since if you train for enough epochs, the model will still see your entire image.
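The resize-then-random-crop idea can be sketched in plain NumPy (a real pipeline would use something like torchvision or fastai transforms; the nearest-neighbour resize here is just a stand-in):

```python
import numpy as np

def resize_shortest_side(img, target=256):
    """Nearest-neighbour resize so the shortest side equals `target`.
    A library resize (PIL, torchvision) would interpolate properly;
    this is only a sketch of the geometry."""
    h, w = img.shape[:2]
    scale = target / min(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return img[rows][:, cols]

def random_crop(img, size=224, rng=None):
    """Take a random size x size crop from the (already resized) image."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

img = np.zeros((480, 640, 3), dtype=np.uint8)   # dummy landscape image
resized = resize_shortest_side(img, 256)        # shortest side -> 256
crop = random_crop(resized, 224)
print(resized.shape, crop.shape)                # (256, 341, 3) (224, 224, 3)
```

Because the crop position changes every epoch, each pass over the data shows the network a slightly different view of the same image.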
Thanks. For completeness: When validation/predicting, could it be a good idea to use padding to make sure you don’t remove any important features that lie at the edges of the image?
It’s probably a good idea to use the same strategy in training as in testing. If you don’t pad during training, but you do during testing, then suddenly the neural net sees black bars around the image that it has never seen before. This may cause issues – or not, since the network does use a bit of padding for its “same” convolutions anyway.
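For reference, padding an image to a square without discarding any pixels is straightforward; this NumPy sketch pads the shorter side symmetrically (the black fill value is an assumption, not a recommendation):

```python
import numpy as np

def pad_to_square(img, fill=0):
    """Pad the shorter side symmetrically so the image becomes square.
    No pixels are discarded; `fill` (black here) is an arbitrary choice."""
    h, w = img.shape[:2]
    diff = abs(h - w)
    before, after = diff // 2, diff - diff // 2
    if h < w:
        pad = ((before, after), (0, 0), (0, 0))
    else:
        pad = ((0, 0), (before, after), (0, 0))
    return np.pad(img, pad, constant_values=fill)

img = np.ones((480, 640, 3), dtype=np.uint8)
square = pad_to_square(img)
print(square.shape)   # (640, 640, 3)
```

If you do pad at test time, the point above suggests padding the same way during training so the bars are not a surprise to the network.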
The best results are obtained by sampling many random crops from the test image, making a prediction for each of them, and then averaging the predictions across these crops. But this typically isn’t feasible for real-world inference since it takes too long.
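The multi-crop averaging idea looks roughly like this (the `predict_fn` below is a dummy stand-in for a real model, used only to make the sketch runnable):

```python
import numpy as np

def predict_with_crops(img, predict_fn, size=224, n_crops=10, seed=0):
    """Test-time augmentation sketch: sample n random crops, run the
    model on each, and average the predictions."""
    rng = np.random.default_rng(seed)
    h, w = img.shape[:2]
    preds = []
    for _ in range(n_crops):
        top = rng.integers(0, h - size + 1)
        left = rng.integers(0, w - size + 1)
        crop = img[top:top + size, left:left + size]
        preds.append(predict_fn(crop))
    return np.mean(preds, axis=0)

# Dummy "model": two-class probabilities from mean brightness.
def dummy_model(crop):
    p = crop.mean() / 255.0
    return np.array([p, 1.0 - p])

img = np.full((256, 341, 3), 128, dtype=np.uint8)
avg = predict_with_crops(img, dummy_model)
print(avg)   # averaged probabilities across the 10 crops
```

The cost is linear in the number of crops, which is why this tends to be reserved for benchmarks rather than latency-sensitive inference.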
I would like to take a step back and ask:
What are the advantages of using square images? Is there a way to use images in non-square form?
I help companies build machine learning models into their iOS apps and it’s not unusual to use a model that accepts images with a 9:16 aspect ratio (for 720x1280 video in portrait mode).
If the model is fully convolutional (i.e. it has no fully connected layers that fix the input size), then it can accept images of any size, so use whatever makes sense for your application.
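The reason a fully convolutional model is size-agnostic is that a convolution kernel slides over whatever input it is given; only the output resolution changes. A minimal "valid" 2-D convolution in NumPy illustrates this (a toy loop, not how frameworks implement it):

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Minimal 'valid' 2-D convolution: the same kernel slides over any
    input size, so nothing in a conv layer fixes the input resolution."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

kernel = np.random.default_rng(0).standard_normal((3, 3))
# Same kernel, two different input sizes, including a non-square one:
print(conv2d_valid(np.zeros((32, 32)), kernel).shape)   # (30, 30)
print(conv2d_valid(np.zeros((24, 40)), kernel).shape)   # (22, 38)
```

Fixed-size inputs only become mandatory once a flatten-plus-dense head is attached; replacing it with global pooling (e.g. PyTorch's `nn.AdaptiveAvgPool2d`) is the usual way to keep a classifier size-agnostic.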