Input size for fully convolutional networks

I’m trying to reproduce the Deep Matting paper, which uses a network similar to the Tiramisu: a fully convolutional network with encoder and decoder paths.

My input images will be mobile phone photos. Since this is a matting problem (i.e., finding the edges of the objects), I’m wondering what effect the input size has. Should I train on the largest input size that fits in memory (e.g., (1200, 1200)) with a small batch_size, or should I prefer a larger batch_size with smaller crops of the images?
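To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes that activation memory in a fully convolutional network grows roughly in proportion to batch_size × height × width, and that a single (1200, 1200) image is what just fits in memory. Both are assumptions for illustration, and `batch_size_for_crop` is a hypothetical helper, not something from the paper.

```python
def batch_size_for_crop(crop_side, ref_side=1200, ref_batch=1):
    """Estimate how many square crops fit in the same pixel budget.

    Assumption: memory scales linearly with the number of input pixels,
    and ref_batch images of ref_side x ref_side just fit in memory.
    """
    pixel_budget = ref_batch * ref_side * ref_side
    return max(1, pixel_budget // (crop_side * crop_side))


# Compare a few hypothetical crop sizes against the full-image baseline.
for side in (1200, 600, 320):
    print(side, batch_size_for_crop(side))
```

Under these assumptions, halving the crop side roughly quadruples the batch size that fits, so the question is really whether more context per sample or more samples per step helps the edges more.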

Since this is not a segmentation problem, I’m guessing the network should care less about the general shape of the object and more about the edges (and the trimap), so presenting only parts of each image should still work for training.
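By "presenting only parts of the images" I mean something like the following minimal sketch, which picks a crop centred near a pixel in the unknown region of the trimap so that every training sample contains some edge. The trimap encoding (0 = background, 128 = unknown, 255 = foreground), the crop size of 320, and the function name `random_edge_crop` are assumptions chosen for illustration.

```python
import numpy as np

UNKNOWN = 128  # assumed trimap encoding: 0 = background, 128 = unknown, 255 = foreground


def random_edge_crop(image, trimap, crop=320, rng=None):
    """Return an (image, trimap) crop of size crop x crop that contains
    a randomly chosen unknown-region pixel, if the trimap has any."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = trimap.shape
    if h < crop or w < crop:
        raise ValueError("image is smaller than the crop size")

    ys, xs = np.nonzero(trimap == UNKNOWN)
    if len(ys) > 0:
        # Centre the crop on a random pixel of the unknown band.
        i = rng.integers(len(ys))
        cy, cx = int(ys[i]), int(xs[i])
    else:
        # No unknown pixels: fall back to a uniformly random centre.
        cy, cx = int(rng.integers(h)), int(rng.integers(w))

    # Clamp the window so it stays fully inside the image.
    top = min(max(cy - crop // 2, 0), h - crop)
    left = min(max(cx - crop // 2, 0), w - crop)
    return (image[top:top + crop, left:left + crop],
            trimap[top:top + crop, left:left + crop])
```

Because the network is fully convolutional, a model trained on such crops could in principle still be run on the full-size image at inference time, which is part of what I am asking about.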

I would think the same question applies when training a segmentation model on a dataset such as satellite images.