It’s unclear to me how fastai changes the size of its model to accept different sized input images. Intuitively, changing a model to accept different sizes seems like it shouldn’t work, on a few levels:
- The semantic meaning of activations changes at different resolutions. Let’s say some filter of your first convolution finds a vertical edge. In a high resolution image a bike’s spokes might activate it very strongly, whereas in a low resolution image they won’t activate it at all.
- The depth required to identify a shape changes depending on the number of pixels the object takes up. If your network learns to identify a square in the 4th layer at a low resolution, the receptive field at that depth will be too small to cover the square when the image is scaled up.
- What happens if you have enough pooling layers that your feature map shrinks to a single pixel in the middle of the model?
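To make that last concern concrete, here’s a toy PyTorch body (my own example, not fastai’s code) where each `MaxPool2d` halves the spatial size, so the feature map degenerates at small input sizes:

```python
import torch
import torch.nn as nn

# Toy stack of conv + pool blocks; each MaxPool2d halves the spatial size.
body = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
)

for size in (224, 64, 32):
    out = body(torch.randn(1, 3, size, size))
    print(size, "->", tuple(out.shape[2:]))
# 224 -> (7, 7); 64 -> (2, 2); 32 -> (1, 1): at 32px the feature map
# is already a single pixel, so any further pooling would be degenerate.
```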
How does changing the size get around these issues in practice? Is there a paper that fastai’s implementation is based on?
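For what it’s worth, the only mechanism I can think of is an adaptive pooling head. Here’s a minimal sketch of that idea (this is my guess at the technique, not fastai’s actual implementation):

```python
import torch
import torch.nn as nn

# Guess at the mechanism: a fully convolutional body followed by
# AdaptiveAvgPool2d, which squeezes ANY spatial size down to 1x1,
# so the linear head sees a fixed-length vector regardless of input size.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1),  # output is always (batch, 32, 1, 1)
    nn.Flatten(),
    nn.Linear(32, 10),
)

for size in (64, 128, 224):
    print(size, "->", model(torch.randn(1, 3, size, size)).shape)
# every input size yields torch.Size([1, 10])
```

But even if this explains why the model mechanically runs at any size, it doesn’t address the semantic issues above, which is what I’d really like to understand.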