I’m wondering if anyone has advice on how to approach problems where we have multiple images per label? Concretely, I have multiple images of a particular object at different depths and perspectives, and the object belongs to a particular class which needs to be classified.
The naive thing to do is to replicate the label for each image and treat the images independently in the model, but I think it would be better to have the model learn the correct label for all the images belonging to the same object collectively. I don’t know the best way to do that, though. Any suggestions?
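For concreteness, the naive replication approach is just flattening the (object, images, label) groups into independent (image, label) examples. A minimal sketch in plain Python/NumPy, assuming each object is a list of image arrays paired with a single class label (the toy data below is made up):

```python
import numpy as np

def replicate_labels(objects):
    """Flatten a list of (images, label) groups into independent
    (image, label) training examples, copying the label per image."""
    examples = []
    for images, label in objects:
        for img in images:
            examples.append((img, label))
    return examples

# Hypothetical toy data: two objects with 5 and 2 views respectively.
objects = [
    ([np.zeros((3, 250, 250)) for _ in range(5)], 0),
    ([np.zeros((3, 250, 250)) for _ in range(2)], 1),
]
examples = replicate_labels(objects)
print(len(examples))  # 7 independent (image, label) pairs
```

The downside, as I said, is that the model never sees that those 7 examples describe only 2 underlying objects.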
One thought I have is to concatenate all the images belonging to the same object along either the row or column dimension. So, if I have 5 images belonging to the same object and they’re all 3x250x250, the resulting concatenated tensor would be 3x1250x250 if I concatenated across the row dimension, or 3x250x1250 across the column dimension. The issues with this technique are that my tensors will be really large, so the batch size will need to be really low, and, more problematically, I have a different number of images for each object. One object might have 5 images, a second might have 18, and a third might have 2. Of course, I could pick some baseline number, n, that defines how many random images will be selected and concatenated from each object when creating a batch of tensors.
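The fixed-n sampling idea above could be sketched like this. This is just one way to do it, assuming NumPy arrays of shape (3, 250, 250); the helper name and sampling-with-replacement choice (for objects with fewer than n images) are my own assumptions, not an established recipe:

```python
import numpy as np

def sample_and_concat(images, n, rng):
    """Randomly pick n images from one object's image list and
    concatenate them along the row (height) axis, so every object
    yields a fixed (3, n*250, 250) tensor regardless of how many
    images it originally had."""
    # Sample with replacement only when the object has fewer than n images.
    idx = rng.choice(len(images), size=n, replace=len(images) < n)
    return np.concatenate([images[i] for i in idx], axis=1)

rng = np.random.default_rng(0)
obj_a = [np.zeros((3, 250, 250)) for _ in range(5)]   # 5 views
obj_b = [np.zeros((3, 250, 250)) for _ in range(2)]   # only 2 views

# Both objects now produce the same shape, so they can be batched.
batch = np.stack([sample_and_concat(o, n=4, rng=rng) for o in (obj_a, obj_b)])
print(batch.shape)  # (2, 3, 1000, 250)
```

Concatenating along axis=2 instead would give the 3x250x(n*250) column-wise variant; either way the memory-per-example problem I mentioned remains.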
Also, maybe my google skills are subpar, but I couldn’t easily find literature on this topic in the academic canon or otherwise. So, if you are aware of any papers that deal with this topic, I would be greatly obliged if you passed them along.