I saw something about a new SOTA class of models for image classification, object detection, and segmentation: FocalNets from Microsoft. It looks very interesting. I had been thinking about how our own vision system has very limited resolution outside the 3 degrees of central vision. Rather than processing large high-resolution images in full, it might be more efficient to spot potentially interesting objects and movement at a very low resolution, as our peripheral vision does, and then focus in on them to get a better look. I haven’t digested the paper yet, but I’m guessing it does something along those lines.
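
To make the peripheral-then-focus idea concrete, here is a minimal sketch of it in Python with NumPy. It illustrates only the idea described above, not how FocalNets work. The function names (`downsample`, `focus_window`), the parameters (`factor`, `window`), and the crude "deviation from the mean" saliency measure are all assumptions made up for illustration; a real system would presumably use a learned detector at the coarse stage.

```python
import numpy as np


def downsample(image, factor):
    """Block-average a 2D grayscale image by an integer factor (the 'peripheral' view)."""
    h, w = image.shape
    h2, w2 = h // factor, w // factor
    # Drop any rows/columns that do not fill a complete block.
    trimmed = image[:h2 * factor, :w2 * factor]
    return trimmed.reshape(h2, factor, w2, factor).mean(axis=(1, 3))


def focus_window(image, factor=8, window=32):
    """Pick the most salient coarse cell, then crop a full-resolution window around it.

    Assumes the image is at least `window` pixels in each dimension.
    Saliency here is a hypothetical stand-in: absolute deviation from the mean brightness.
    """
    coarse = downsample(image, factor)
    saliency = np.abs(coarse - coarse.mean())
    cy, cx = np.unravel_index(np.argmax(saliency), saliency.shape)

    # Map the centre of the chosen coarse cell back to full-resolution coordinates.
    y = cy * factor + factor // 2
    x = cx * factor + factor // 2

    # Clamp the window so it stays inside the image.
    half = window // 2
    top = int(np.clip(y - half, 0, image.shape[0] - window))
    left = int(np.clip(x - half, 0, image.shape[1] - window))
    return top, left, image[top:top + window, left:left + window]
```

Calling `focus_window(image)` on a 2D array returns the top-left corner of the chosen window plus the high-resolution crop, so only that crop would need to go through an expensive model. The coarse pass looks at 1/64 of the pixels when `factor=8`, which is where the hoped-for savings would come from.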