Increase Image Size Intuition (Lesson 2)

What is the intuition behind using larger image sizes as you progress in training, as mentioned in 01:32:45?

I can see how this would allow greater generalization for images of varying (in this case, larger) sizes, or for zoomed-in images where some objects appear bigger. But is it also expected to improve predictions for “normal” images that don’t fit those criteria?

Is this effectively just another form of data augmentation? Would starting with the larger sizes, but running additional training epochs with the zoom augmentation (`tfms`) set higher (> 1.1), effectively do something similar, or is there an additional advantage to running these epochs later in the training workflow?
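For concreteness, here is a minimal, library-free sketch of what a zoom augmentation does (the `resize_nearest` and `random_zoom` helpers are my own illustrative names, not fastai API): crop a random sub-window whose side is 1/zoom of the original, then resize it back to the original size, so the network sees the same objects at slightly different scales.

```python
import random

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list-of-lists 'image'."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]

def random_zoom(img, max_zoom=1.1, rng=random):
    """Zoom augmentation sketch: crop a random sub-window, then
    resize it back up to the original dimensions."""
    h, w = len(img), len(img[0])
    zoom = rng.uniform(1.0, max_zoom)
    crop_h, crop_w = int(h / zoom), int(w / zoom)
    top = rng.randrange(h - crop_h + 1)
    left = rng.randrange(w - crop_w + 1)
    crop = [row[left:left + crop_w] for row in img[top:top + crop_h]]
    return resize_nearest(crop, h, w)
```

The key difference from progressive resizing is that this only re-samples the *content* at a fixed input resolution, whereas increasing the image size actually changes the spatial resolution the network trains on.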

Wanted to put my question up top, but also wanted to say how grateful I am for this resource and the classes here. I haven’t coded in about a decade, but my interest in DL/AI led me here and I’m re-learning the basics so I can be a part of this community. Thank you so much, Jeremy and Rachel!

As a follow-on, if this is essentially data augmentation, then why only increase the image sizes? Why not also decrease them to allow for even greater generalization?

How does increasing an image’s size actually work? If an image is 200x200, making it 400x400 will just make each pixel larger (I’m assuming). How does that help the training algorithm? The image is still the same, just stretched out.

Another question: what actually happens to the images when we set an image size for training? Does each image physically get smaller or larger in the temp folder and then get run through the model?

Here’s an article that addresses this question:

It sounds like resizing helps the network learn scale-dependent patterns.

I have experimented with this (64x64 to 128x128 to 256x256 to 512x512) and found better results than training at 224x224 the entire time!
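For anyone wanting to try the same thing, the workflow above can be sketched roughly like this. Note that `make_learner` and `train_at_size` are hypothetical stand-ins for your own setup and training code, not fastai functions; the point is just the doubling size schedule and that the *same* model keeps training as the data resolution increases.

```python
def size_schedule(start=64, final=512):
    """Doubling schedule like the 64 -> 128 -> 256 -> 512 run above."""
    sizes = []
    s = start
    while s <= final:
        sizes.append(s)
        s *= 2
    return sizes

def train_progressively(make_learner, train_at_size, start=64, final=512):
    """Sketch of progressive resizing: rebuild the data at each size and
    keep training the same model, so later epochs see higher-resolution
    images. Both callables are placeholders for your own code."""
    learner = make_learner()
    for sz in size_schedule(start, final):
        train_at_size(learner, sz)  # e.g. recreate data at size sz, fit a few epochs
    return learner
```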