Asking because I think I've misunderstood something, so there are likely a few inaccuracies in my description below.
A common diagram for explaining a CNN filter shows it being applied over a sub-grid of pixels in an image, and convolutions are defined with a width and a height.
However, images are flattened before they are processed, fully-convolutional networks can work on any image size and aspect ratio, and we don't tell models the original width of an image. Given all that, how would a 2x2 filter be applied to 2 pixels from the first row and 2 pixels from the second row of the original image?
For example, if the input image is 240x480 (width x height), the first grid of pixels would sit at offsets 0, 1, 240, 241 in the flat array, whereas for a 640x480 image the offsets would be 0, 1, 640, 641.
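To make the question concrete, here is a small sketch of the arithmetic I have in mind (the helper name `patch_offsets` is just something I made up): in a row-major flattened image, the row stride equals the image width, so the flat-array offsets for a 2x2 patch depend entirely on that width.

```python
def patch_offsets(width, kh=2, kw=2):
    # Offsets of a kh x kw patch anchored at the top-left corner
    # of a row-major flattened image; the row stride is the width.
    return [r * width + c for r in range(kh) for c in range(kw)]

print(patch_offsets(240))  # -> [0, 1, 240, 241]
print(patch_offsets(640))  # -> [0, 1, 640, 641]
```

So unless the model somehow knows the width, the same four flat indices would refer to different spatial neighbourhoods in different images, which is the part I don't understand.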