Is there an example of doing object detection on large images by cropping them into smaller sizes and combining the results after prediction? I'm looking for something similar to what this paper describes:
"Images in DOTA are so large that they cannot be directly sent to CNN-based detectors. Therefore, we crop a series of 1024×1024 patches from the original images with a stride set to 512."
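For reference, the cropping scheme the paper describes can be sketched as a sliding window: a 1024×1024 window moved across the image with a stride of 512, so adjacent patches overlap by half. This is my own minimal sketch (function name and box-free output format are assumptions, not from the paper), and it ignores edge handling — real pipelines pad or clamp the last window so the right and bottom borders are covered.

```python
import numpy as np

def crop_patches(image, patch=1024, stride=512):
    """Crop (top_left, patch) pairs from an HxWxC image with a sliding window.

    Simplified: when (H - patch) is not a multiple of stride, the last rows
    and columns of pixels are not covered.
    """
    h, w = image.shape[:2]
    patches = []
    for y in range(0, max(h - patch, 0) + 1, stride):
        for x in range(0, max(w - patch, 0) + 1, stride):
            patches.append(((x, y), image[y:y + patch, x:x + patch]))
    return patches

# e.g. a 2048x2048 image yields a 3x3 grid of overlapping 1024x1024 patches
```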
I have not read the paper, but I did a similar thing: I split the image into a 3x3 grid and ran object detection on the 9 patches plus the original image (which you can skip), and then combined the predictions into a single image.
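The combining step can be sketched roughly like this: shift each patch's boxes by the patch's top-left offset so they live in full-image coordinates, then run a simple NMS to merge duplicate detections from overlapping patches. This is my own assumption of how to do it, not code from the repo below; the `[x1, y1, x2, y2]` box format and function names are mine.

```python
import numpy as np

def shift_boxes(boxes, offset):
    """boxes: (N, 4) [x1, y1, x2, y2] in patch coords; offset: (ox, oy)."""
    ox, oy = offset
    return boxes + np.array([ox, oy, ox, oy], dtype=boxes.dtype)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS over full-image boxes; returns indices of kept boxes."""
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        # intersection of the top box with every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(xx2 - xx1, 0) * np.maximum(yy2 - yy1, 0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]  # drop duplicates of box i
    return keep
```

You would collect `shift_boxes(patch_boxes, patch_offset)` for every patch, concatenate them with the scores, and call `nms` once on the whole set.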
The main repo is here. You only need process_labels.py and make_splits.py.
Comment if you're not able to follow some of the code or the approach.
thank you very much, i will check it out !
Bit late to the party. I'm also working on the DOTA dataset and have followed a similar approach to slicing the images, albeit at different zoom levels. This has resulted in many of the cropped images having no objects (negative samples).
Just wondering if you also faced this issue? And if so, how did you construct your data loader?
I tested Kushaj's method and compared it against using the whole image. It didn't give a huge advantage in my case, so I just reverted to scaling the whole image down to a smaller size.
I did have negative samples with no objects, but that was fine; in my case I just ignored those.
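If "ignoring" the empty crops means dropping them before building the data loader, one simple sketch (entirely my own, with assumed names and a `(path, boxes)` sample format) is to filter the sample list up front, optionally keeping a small random fraction of negatives as background examples:

```python
import random

def filter_negatives(samples, keep_frac=0.0, seed=0):
    """samples: list of (path, boxes); boxes is a possibly-empty list.

    Drops crops with no boxes, except a random keep_frac of them,
    which some training setups keep as background/hard negatives.
    """
    rng = random.Random(seed)
    kept = []
    for path, boxes in samples:
        if boxes or rng.random() < keep_frac:
            kept.append((path, boxes))
    return kept
```

The filtered list can then be fed to whatever dataset class you're using, so the loader never sees label-less crops.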