How to train on a dataset of large images

Hi! I have a related question. In my case, rescaling the images to 224x224 loses the small details that the CNN needs to pick up. From the discussion in this thread, it sounds like I should cut each image into smaller tiles and then train on those tiles instead.
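
To make the tiling idea concrete, here is a rough sketch of what I have in mind. It only computes the crop boxes for a regular grid; `tile_boxes` is a name I made up, and the 224-pixel tile size is just an assumption carried over from the rescaling size above. Nothing in this thread confirms that this is the recommended way to tile.

```python
def tile_boxes(width, height, tile_size):
    """Return (left, top, right, bottom) boxes covering the image in a grid.

    Hypothetical helper, not from any library. Edge tiles are clipped to
    the image bounds, so they can be smaller than tile_size.
    """
    boxes = []
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            right = min(left + tile_size, width)
            bottom = min(top + tile_size, height)
            boxes.append((left, top, right, bottom))
    return boxes


# Example: a 448x448 image cut into four 224x224 tiles.
boxes = tile_boxes(448, 448, 224)
```

Each box has the same (left, upper, right, lower) layout that Pillow's `Image.crop` expects, so the tiles could be cut with `image.crop(box)`. Whether clipped edge tiles should be padded, resized, or dropped is a choice I have not settled on.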

My question is how prediction would then work. Suppose a test image has been split into, say, 4 tiles, and the tiles all get different predicted labels. How do you decide which label is the correct overall prediction for the original image?
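
To show what I mean by "deciding", here are two aggregation rules I could imagine: a majority vote over the per-tile labels, and averaging the per-tile class probabilities before taking the top class. This is only a sketch of the question, not an answer I have seen confirmed anywhere. Both function names are made up, and I am assuming the model outputs one probability vector per tile.

```python
from collections import Counter


def majority_vote(tile_labels):
    """Return the most common tile label (ties go to the label seen first)."""
    return Counter(tile_labels).most_common(1)[0][0]


def mean_probability_vote(tile_probs):
    """Average per-tile class probabilities and return the top class index."""
    n_tiles = len(tile_probs)
    n_classes = len(tile_probs[0])
    mean = [sum(p[c] for p in tile_probs) / n_tiles for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


# Three tiles, two classes (assumed softmax outputs, one list per tile).
probs = [[0.6, 0.4], [0.1, 0.9], [0.55, 0.45]]
hard_labels = [max(range(2), key=lambda c: p[c]) for p in probs]

vote_result = majority_vote(hard_labels)      # 0: two of three tiles pick class 0
mean_result = mean_probability_vote(probs)    # 1: the confident tile dominates
```

The two rules can disagree, as in this toy example, which is really the heart of my question: which one (or something else entirely) makes sense when only some tiles contain the relevant detail?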