I am trying to learn how to classify satellite imagery in the form of tiles/patches (e.g. up to 64x64 pixels). I have trained a preliminary model and can use it to predict on image patches that I created manually. Do you have any suggestions on the best way to split/slice/crop a large image into tiles for inference (ideally in batches)? I looked at torchvision's transforms, but couldn't find a suitable function or library for this…
So I’m running into a similar problem for an image segmentation task where the images are really large, way too large to fit into GPU memory. Could you point me to some example code where an image is split into patches and then stitched back together to reproduce the original?
I’ve tried using unfold but I’m not really familiar with PyTorch tensors so I’m struggling to get it working properly.
Open the whole image with gdal.Open() and stack the individual bands into a numpy array.
Then extract the image extents with gdal and use them to calculate the sliding-window extents (in pixel coordinates) you’ll need for your desired window size.
Then use numpy array indexing to get the individual windows and save them to a list of numpy arrays … done!
A prerequisite for this method is that your available RAM is a few times larger than your image (although this can be optimized, of course).
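To make the windowing step concrete, here is a minimal numpy-only sketch, assuming the bands have already been stacked into a (bands, H, W) array (e.g. via gdal); the tile and stride sizes and the `tile_image` helper name are just illustrative choices:

```python
import numpy as np

def tile_image(img, tile=64, stride=64):
    """Slice a (bands, H, W) array into (N, bands, tile, tile) patches.

    Edge pixels that don't fill a whole window are simply dropped here;
    padding the image or overlapping the windows are common alternatives.
    Returns the stacked patches plus the (y, x) origin of each window,
    which you need later to place predictions back on the map.
    """
    bands, h, w = img.shape
    patches, coords = [], []
    for y in range(0, h - tile + 1, stride):
        for x in range(0, w - tile + 1, stride):
            patches.append(img[:, y:y + tile, x:x + tile])
            coords.append((y, x))
    return np.stack(patches), coords

# usage: a fake 4-band, 256x256 image -> 16 tiles of 64x64
img = np.zeros((4, 256, 256), dtype=np.float32)
patches, coords = tile_image(img)
# patches can now be fed to the model in batches, e.g. 32 at a time:
for i in range(0, len(patches), 32):
    batch = patches[i:i + 32]
```

The coordinate list is what lets you write each tile's prediction back into a full-size output array afterwards.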
I managed to get it working using torch.Tensor.unfold (https://pytorch.org/docs/stable/tensors.html#torch.Tensor.unfold) in the end, which seems much simpler: patches = img.data.unfold(0, 3, 3).unfold(1, 224, 224).unfold(2, 224, 224) - This takes the image and gives a set of 224 x 224 patches. The only thing I am stuck on now is how to use the patches to reconstruct the image.
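One way to invert the unfold is a permute + reshape, sketched below under two assumptions: H and W are exact multiples of the patch size, and the channel dimension is left alone (unfolding dim 0 as in the snippet above only adds an extra size-1 leading dimension, so it can be skipped):

```python
import torch

k = 224  # patch size

# assume a (3, H, W) image whose H and W are multiples of k
img = torch.arange(3 * 448 * 672, dtype=torch.float32).reshape(3, 448, 672)

# unfold height then width -> (3, H//k, W//k, k, k)
patches = img.unfold(1, k, k).unfold(2, k, k)
c, nh, nw = patches.shape[:3]

# reverse: move each grid dim next to its patch dim, then merge them
recon = patches.permute(0, 1, 3, 2, 4).reshape(c, nh * k, nw * k)
assert torch.equal(recon, img)  # exact round trip
```

If the image size is not a multiple of the patch size, padding it first (and cropping the reconstruction) is the usual workaround; for overlapping strides you would need torch.nn.functional.fold instead, which also sums the overlaps.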