Object detection in large images

I am working on an object detection project that deals with relatively large images, e.g. 10000x10000 pixels, so they are similar in size to the images in the DOTA dataset.

I have not seen this use case discussed in many places. A few topics on the fast.ai forums mention it, but not in detail. I am therefore opening this post in the hope of finding answers, and that it might help other users.

The problem I am facing with such large images is that they cannot be loaded at their original size, as memory quickly becomes an issue. I therefore see two solutions:

  • Resizing the images aggressively, e.g. to 640x640. But I am wondering whether this impacts training: with such an aggressive rescale, the human eye can no longer distinguish the details, but maybe the network still can? One can then reconstruct the bounding box positions in the original-sized image by applying a simple transform to the architecture's output (see the rescaling sketch after this list).
  • Dividing the image into tiles (see the tiling sketch further below). But then I don't know:
    • How to handle the cases where the tiling splits a bounding box?
    • How these tiles should be processed by the architecture? Randomized? Batch-normed? …
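
For the first option, mapping predictions back to the original image is just a per-axis scale (note that 10000 → 640 is a factor of ~15.6, so any object narrower than ~16 original pixels collapses into a single pixel). A minimal sketch of that transform, assuming predictions come out as (x_min, y_min, x_max, y_max) boxes in resized-image pixels; the function name and signature here are mine, not from any library:

```python
def rescale_boxes(boxes, resized_size, original_size):
    """Map boxes from resized-image pixels back to original-image pixels.

    boxes:         list of (x_min, y_min, x_max, y_max) tuples
    resized_size:  (width, height) of the network input, e.g. (640, 640)
    original_size: (width, height) of the source image, e.g. (10000, 10000)
    """
    sx = original_size[0] / resized_size[0]  # horizontal scale factor
    sy = original_size[1] / resized_size[1]  # vertical scale factor
    return [(x0 * sx, y0 * sy, x1 * sx, y1 * sy) for x0, y0, x1, y1 in boxes]

# Example: a box predicted on the 640x640 input, mapped back to 10000x10000.
print(rescale_boxes([(100, 200, 150, 260)], (640, 640), (10000, 10000)))
# -> [(1562.5, 3125.0, 2343.75, 4062.5)]
```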

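For the tiling option, a common approach (used, e.g., by the SAHI library and the YOLT work on satellite imagery) is to cut overlapping tiles, clip each ground-truth box to the tile, and drop boxes whose visible fraction falls below some threshold, so that an object cut by one tile boundary still appears (nearly) whole in a neighbouring tile. A minimal sketch, assuming boxes are (x_min, y_min, x_max, y_max) in original-image pixels; the tile size, overlap, and visibility threshold are illustrative values, not established defaults:

```python
def tile_image_with_boxes(image_size, boxes, tile=1024, overlap=256,
                          min_visibility=0.3):
    """Yield (x0, y0, x1, y1) tile windows and the boxes visible inside each."""
    w, h = image_size
    stride = tile - overlap  # step between tile origins; guarantees full coverage
    for ty in range(0, max(h - overlap, 1), stride):
        for tx in range(0, max(w - overlap, 1), stride):
            x1, y1 = min(tx + tile, w), min(ty + tile, h)
            kept = []
            for bx0, by0, bx1, by1 in boxes:
                # Clip the box to the tile window.
                cx0, cy0 = max(bx0, tx), max(by0, ty)
                cx1, cy1 = min(bx1, x1), min(by1, y1)
                if cx0 >= cx1 or cy0 >= cy1:
                    continue  # box does not intersect this tile
                visible = (cx1 - cx0) * (cy1 - cy0)
                full = (bx1 - bx0) * (by1 - by0)
                if visible / full >= min_visibility:
                    # Keep the box, shifted into tile-local coordinates.
                    kept.append((cx0 - tx, cy0 - ty, cx1 - tx, cy1 - ty))
            yield (tx, ty, x1, y1), kept

# Example: a 10000x10000 image with one ground-truth box.
windows = list(tile_image_with_boxes((10000, 10000),
                                     [(1000, 1000, 1100, 1080)]))
print(len(windows))  # 169 windows of at most 1024x1024
```

As for how the tiles go through the architecture: once cut, they are ordinary training images, so they can be shuffled across source images into batches like any other sample, and batch norm needs no special treatment. At inference time one typically runs the detector on every tile, shifts detections back by the tile offset, and merges duplicates in the overlap regions with non-maximum suppression.
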
Any help covering this specific topic would be greatly appreciated!