I am working on an object detection project that deals with relatively large images, e.g. 10000x10000 pixels. They are therefore similar in size to those in the DOTA dataset.
I have not seen this use-case discussed in many places. A few topics in the fast.ai forums touch on it, but not in detail. I am therefore opening this post in the hope of finding answers, and that it might help other users too.
The problem I am facing with such large images is that you can't load them at their original size, as memory quickly becomes an issue. I therefore see two possible solutions:
- Resizing the images aggressively, e.g. to 640x640. But I wonder whether this hurts training: with such an aggressive rescale, the human eye can no longer distinguish the details, though maybe the network still can? The bounding box positions in the original image could then be recovered by applying the inverse of the resize transform to the architecture's output.
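For the box reconstruction part, I imagine something like this sketch (the function name and the (x_min, y_min, x_max, y_max) pixel-coordinate convention are just for illustration):

```python
# Sketch: map boxes predicted on a resized image back to the original
# resolution. Assumes boxes are (x_min, y_min, x_max, y_max) in pixel
# coordinates of the resized image.

def rescale_boxes(boxes, resized_size, original_size):
    """Scale pixel boxes from resized_size (w, h) back to original_size (w, h)."""
    rw, rh = resized_size
    ow, oh = original_size
    sx, sy = ow / rw, oh / rh  # per-axis inverse of the resize transform
    return [(x1 * sx, y1 * sy, x2 * sx, y2 * sy) for x1, y1, x2, y2 in boxes]

# e.g. a box predicted at (64, 64, 128, 128) on a 640x640 input maps to
# (1000, 1000, 2000, 2000) on the 10000x10000 original.
print(rescale_boxes([(64, 64, 128, 128)], (640, 640), (10000, 10000)))
```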
- Dividing the image into tiles. But then I don't know:
  - How to handle the cases where the tiling splits a bounding box?
  - How the tiles should be fed through the architecture? Shuffled randomly? Batch-normed together? …
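To make the split-box question concrete, here is a sketch of what I have in mind: overlapping tiles so that most objects appear whole in at least one tile, with boxes clipped to each tile and fragments dropped when too little of the original box survives. The tile size, overlap, and `min_visibility` threshold are placeholder values, not recommendations:

```python
# Sketch of overlap tiling for large images. All parameter values are
# placeholders chosen for illustration.

def make_tiles(img_w, img_h, tile=1024, overlap=256):
    """Return (x0, y0) top-left corners of overlapping tiles covering the image."""
    step = tile - overlap
    xs = list(range(0, max(img_w - tile, 0) + 1, step))
    ys = list(range(0, max(img_h - tile, 0) + 1, step))
    # Make sure the right/bottom edges are covered too.
    if xs[-1] + tile < img_w:
        xs.append(img_w - tile)
    if ys[-1] + tile < img_h:
        ys.append(img_h - tile)
    return [(x, y) for y in ys for x in xs]

def clip_box_to_tile(box, x0, y0, tile, min_visibility=0.5):
    """Clip an (x1, y1, x2, y2) box to a tile; return tile-local coords or None."""
    x1, y1, x2, y2 = box
    cx1, cy1 = max(x1, x0), max(y1, y0)
    cx2, cy2 = min(x2, x0 + tile), min(y2, y0 + tile)
    if cx1 >= cx2 or cy1 >= cy2:
        return None  # box does not intersect this tile
    clipped_area = (cx2 - cx1) * (cy2 - cy1)
    full_area = (x2 - x1) * (y2 - y1)
    if clipped_area / full_area < min_visibility:
        return None  # mostly cut off: treat as absent from this tile
    return (cx1 - x0, cy1 - y0, cx2 - x0, cy2 - y0)
```

With enough overlap, a box that is badly cut in one tile should appear (nearly) whole in a neighbouring one, but I am not sure this is the standard way of handling it.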
Any help on this specific topic would be greatly appreciated!