xView2 challenge

Hello and greetings,
I’m working on the xView2 challenge, which is a good learning exercise for me as someone who works with remote sensing data. The challenge is about identifying building damage in remote sensing imagery after natural disasters. The dataset comprises pre- and post-disaster images together with building polygons carrying the post-disaster damage labels. More information on the dataset and the challenge is available at https://xview2.org/
At the moment I’m approaching the challenge by training two different deep learning networks. The first network identifies all buildings in the pre-disaster imagery by segmentation. This step works quite well; it is trained on 1024x1024 px pre-disaster images and the corresponding image masks.
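One common way to build such masks, sketched here as an assumption rather than the method used above, is to rasterize the building polygons from the label JSON. The hypothetical helpers below parse the outer ring of a WKT string like POLYGON ((x y, ...)) in pixel coordinates and fill it with the even-odd (ray casting) rule in NumPy. Holes are ignored; cv2.fillPoly or rasterio.features.rasterize would do the same job faster.

```python
import re

import numpy as np


def parse_wkt_polygon(wkt):
    """Outer ring of a WKT 'POLYGON ((x y, x y, ...))' string as (x, y) floats.

    Assumes 2-D pixel coordinates; any inner rings (holes) are ignored.
    """
    ring = re.search(r"\(\(([^)]*)\)", wkt).group(1)
    return [tuple(float(v) for v in pair.split()) for pair in ring.split(",")]


def rasterize_polygon(vertices, height, width):
    """Boolean mask, True where a pixel centre lies inside the polygon (even-odd rule)."""
    mask = np.zeros((height, width), dtype=bool)
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    # Only test pixels inside the polygon's bounding box, clipped to the image.
    r0, r1 = max(int(min(ys)), 0), min(int(np.ceil(max(ys))), height)
    c0, c1 = max(int(min(xs)), 0), min(int(np.ceil(max(xs))), width)
    if r0 >= r1 or c0 >= c1:
        return mask
    py, px = np.mgrid[r0:r1, c0:c1] + 0.5  # pixel-centre coordinates
    inside = np.zeros(py.shape, dtype=bool)
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if y1 == y2:
            continue  # horizontal or zero-length edges never cross a horizontal ray
        # Toggle pixels whose rightward ray crosses this edge.
        crosses = (y1 > py) != (y2 > py)
        x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (px < x_cross)
    mask[r0:r1, c0:c1] = inside
    return mask


def mask_from_wkts(wkts, height=1024, width=1024):
    """Union of all building polygons as a uint8 mask (1 = building, 0 = background)."""
    mask = np.zeros((height, width), dtype=bool)
    for wkt in wkts:
        mask |= rasterize_polygon(parse_wkt_polygon(wkt), height, width)
    return mask.astype(np.uint8)
```

Testing only the pixels inside each polygon's bounding box keeps the cost proportional to building size rather than to the full 1024x1024 tile.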
In a second step, individual buildings are identified and cut out of the imagery. They are then classified by a second deep learning network (classification).
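Cutting individual buildings out of a predicted mask amounts to finding its connected regions and their bounding boxes. Below is a minimal sketch with hypothetical function names, assuming a binary mask and 4-connectivity. The flood fill is plain Python for clarity, whereas scipy.ndimage.label followed by scipy.ndimage.find_objects does the same much faster.

```python
from collections import deque

import numpy as np


def building_boxes(mask, min_pixels=1):
    """Half-open (row0, col0, row1, col1) boxes of 4-connected regions in a binary mask.

    Regions with fewer than min_pixels pixels (e.g. segmentation noise) are skipped.
    """
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    h, w = mask.shape
    boxes = []
    for r, c in zip(*np.nonzero(mask)):
        if seen[r, c]:
            continue
        # Breadth-first flood fill starting from this unvisited building pixel.
        seen[r, c] = True
        queue = deque([(r, c)])
        r0, c0, r1, c1, count = r, c, r + 1, c + 1, 0
        while queue:
            y, x = queue.popleft()
            count += 1
            r0, c0 = min(r0, y), min(c0, x)
            r1, c1 = max(r1, y + 1), max(c1, x + 1)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if count >= min_pixels:
            boxes.append((int(r0), int(c0), int(r1), int(c1)))
    return boxes


def crop_chips(image, boxes, margin=0):
    """Cut one chip per box, optionally widened by `margin` context pixels."""
    h, w = image.shape[:2]
    return [image[max(r0 - margin, 0):min(r1 + margin, h),
                  max(c0 - margin, 0):min(c1 + margin, w)]
            for r0, c0, r1, c1 in boxes]
```

The margin parameter is optional; a few pixels of surrounding context can help when judging damage at a building's edges.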
So basically, the buildings segmented in the pre-disaster images are cropped to form small building image chips. Chips from the pre-disaster images are labeled "no damage", while chips from the post-disaster images are labeled according to the metadata in the corresponding JSON file.
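The labeling step could look roughly like this. It assumes the label files follow the layout {"features": {"xy": [{"properties": {"subtype": ...}, "wkt": "POLYGON ((...))"}]}} (worth checking against the actual files). Unlike the pipeline described above, it takes each chip's box from the label polygon rather than from the segmentation output. Pre-disaster chips get "no-damage"; post-disaster chips get the subtype value. The function names are hypothetical.

```python
import json
import re


def polygon_bbox(wkt):
    """Pixel bounding box (x0, y0, x1, y1) of a WKT polygon string."""
    nums = [float(v) for v in re.findall(r"-?\d+(?:\.\d+)?", wkt)]
    xs, ys = nums[0::2], nums[1::2]
    return min(xs), min(ys), max(xs), max(ys)


def chip_labels(label_json, is_post_disaster):
    """List of (bbox, label) pairs for all buildings in one label file.

    Assumed layout: data["features"]["xy"] is a list of features, each with a
    "wkt" polygon in pixel coordinates and a "properties" dict. Pre-disaster
    chips are labeled "no-damage"; post-disaster chips use properties["subtype"].
    """
    data = json.loads(label_json)
    pairs = []
    for feature in data["features"]["xy"]:
        label = feature["properties"]["subtype"] if is_post_disaster else "no-damage"
        pairs.append((polygon_bbox(feature["wkt"]), label))
    return pairs
```

Each box can then be cropped from the pre- and post-disaster image at the same pixel location, assuming the image pair is co-registered.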
These steps work so far, but I have questions about best practice for preparing satellite imagery for training in the classification step. The building chips vary in width and height: heights and widths within 2 sigma of the mean lie between 0 and 200 px, and the mean height and width are 110 px.
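To make the trade-off concrete, the size statistics and the share of black pixels on a 200x200 canvas can be computed with a few lines of standard-library Python. In this illustrative sketch (function names are made up), pad_only pastes the chip without resampling, and scale_then_pad first scales the longer side to 200 px and then pads, which is the behaviour of ResizeMethod.Pad.

```python
from statistics import mean, pstdev


def size_summary(sizes):
    """Mean and the mean -/+ 2 sigma range of chip side lengths in px."""
    mu, sigma = mean(sizes), pstdev(sizes)
    return mu, mu - 2 * sigma, mu + 2 * sigma


def padding_fractions(h, w, target=200):
    """Share of padded (black) pixels on a target x target canvas.

    pad_only:       chip pasted without resampling (assumes h, w <= target)
    scale_then_pad: longer side scaled to target first, then padded
    """
    canvas = target * target
    pad_only = 1 - (h * w) / canvas
    s = target / max(h, w)
    scale_then_pad = 1 - (h * s) * (w * s) / canvas
    return pad_only, scale_then_pad


print(size_summary([80, 140]))     # mean 110 px, 2-sigma range 50 to 170 px
print(padding_fractions(50, 100))  # (0.875, 0.5)
```

For a 110x110 chip, pad-only leaves roughly 70% of the canvas black, while scale-then-pad avoids padding but upsamples by a factor of about 1.8.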
At the moment I’m using item_tfms=Resize(200, ResizeMethod.Pad, pad_mode='zeros'), so the chips are padded to a square with black pixels. However, the longer side of each chip is always scaled to fill the 200 px. With my remote sensing background, I would say that resampling the image chips to 200 px is not a good idea, but neither is surrounding small chips with large black areas. In your opinion, what is best practice for preparing image chips of heterogeneous width and height from satellite imagery?
Secondly, to increase variance I’m thinking about applying horizontal and vertical flips to the imagery in the batch_tfms of the second network (on the GPU), together with varying the lighting. Are there other useful image preparation steps you would use for satellite imagery?
I’d appreciate any input. Many thanks and a happy New Year!
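The flip and lighting augmentation could be sketched in NumPy as follows. Adding rotations by multiples of 90 degrees is an assumption: together with the flips they give the eight orientations of a chip (the dihedral group), which seems plausible for overhead imagery without a canonical "up". In fastai, aug_transforms(flip_vert=True, max_lighting=...) covers a similar idea on the GPU. The helpers below are hypothetical and expect float chips in [0, 1].

```python
import numpy as np


def dihedral(chip, k):
    """One of the 8 dihedral variants of an (H, W, C) chip, for k in 0..7.

    k % 4 is the number of 90-degree rotations; k >= 4 adds a horizontal flip,
    so vertical flips are included too (k == 6 is rot180 + h-flip = v-flip).
    """
    out = np.rot90(chip, k % 4, axes=(0, 1))
    return out[:, ::-1] if k >= 4 else out


def augment(chip, rng, max_lighting=0.2):
    """Random dihedral variant plus brightness/contrast jitter; chip is float in [0, 1]."""
    out = dihedral(chip, int(rng.integers(8)))
    brightness = rng.uniform(-max_lighting, max_lighting)
    contrast = rng.uniform(1 - max_lighting, 1 + max_lighting)
    mean = out.mean(axis=(0, 1), keepdims=True)  # per-channel mean
    out = (out - mean) * contrast + mean + brightness
    return np.clip(out, 0.0, 1.0)


rng = np.random.default_rng(0)
chip = rng.random((110, 80, 3))  # stand-in for an RGB building chip
print(augment(chip, rng).shape)  # (110, 80, 3) or (80, 110, 3), depending on the rotation
```

Lighting jitter applied after zero padding also brightens the black border, so it may be preferable to apply it before padding.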