Multiple images per sample

I was wondering what’s the standard approach when dealing with multiple pictures per sample in order to classify it.
I'm currently doing this competition: https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries
It has both tabular and image data: each row in the table has several images, and all of them should be used to classify the row into one of 3 categories.

The approach I’ve settled on so far is to train an image network that classifies individual images, and a separate network for the tabular data. Then, for each row with X images, I duplicate the tabular data X times, pair each copy with a single image, feed the X (image, tabular data) pairs through the CNN input and the tabular input layers respectively, generate X predictions and average them.
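The averaging step at the end of that approach could look roughly like the sketch below. It assumes the per-pair predictions are already available as plain lists of class probabilities; `average_by_listing`, `listing_ids` and `image_probs` are names made up for illustration and are not part of Keras or the competition data.

```python
def average_by_listing(listing_ids, image_probs):
    """Average per-image class probabilities for each listing.

    listing_ids[i] is the (hypothetical) row id that image i belongs to,
    image_probs[i] is the probability vector predicted for that pair.
    """
    sums = {}
    counts = {}
    for lid, probs in zip(listing_ids, image_probs):
        if lid not in sums:
            sums[lid] = [0.0] * len(probs)
            counts[lid] = 0
        for k, p in enumerate(probs):
            sums[lid][k] += p
        counts[lid] += 1
    # Divide each accumulated vector by the number of images of that listing.
    return {lid: [s / counts[lid] for s in total] for lid, total in sums.items()}


# Made-up example: listing "a" has two images, listing "b" has one.
listing_ids = ["a", "a", "b"]
image_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.1, 0.8]]
averaged = average_by_listing(listing_ids, image_probs)
```

The result is one probability vector per listing, regardless of how many images each listing has.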

I’m wondering if there are better ways to do this, especially with Keras in mind. Right now Keras expects a single sample per evaluation, so it’s hard to make the metrics play nicely with this approach, and I have to aggregate the metrics manually to measure performance.
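One way to do that manual aggregation is to compute the metrics outside Keras, on the per-row averaged probabilities rather than on the per-image ones. The sketch below uses only the standard library; `log_loss`, `accuracy` and the sample data are illustrative assumptions, not Keras functions.

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class of each row."""
    total = 0.0
    for label, p in zip(y_true, probs):
        # Clip to avoid log(0) when a class gets zero probability.
        total += -math.log(min(max(p[label], eps), 1.0))
    return total / len(y_true)


def accuracy(y_true, probs):
    """Fraction of rows whose highest-probability class is the true class."""
    hits = sum(1 for label, p in zip(y_true, probs) if p.index(max(p)) == label)
    return hits / len(y_true)


# Made-up row-level predictions (already averaged over each row's images)
# and made-up integer class labels.
row_probs = [[0.4, 0.4, 0.2], [0.1, 0.1, 0.8]]
row_labels = [0, 2]
row_log_loss = log_loss(row_labels, row_probs)
row_accuracy = accuracy(row_labels, row_probs)
```

Computed this way, each row counts once in the metric, no matter how many images it has.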

Thanks!

How did this approach work?

I’d love to hear more about approaches to make one prediction based on several images.

I have many photos of one product and I’d like to classify that product correctly. My photos of the product don’t always have the same composition. Sometimes they might be from different angles.

It depends on your exact problem, but if you just have a varying number of pictures per product, it’s best to make a prediction for each picture and then either take a simple average or train another shallow classifier that takes the probability outputs and makes the final prediction. The stacked approach makes sense only if you have a reasonable amount of data. Otherwise, simple averaging and thresholds might make more sense.
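For the stacked variant, the shallow classifier needs a fixed-length input even though the number of pictures varies. A minimal sketch of one possible way to get there is below; the choice of summary features (per-class mean, per-class max, image count) is my own assumption, and `summarize_predictions` is a hypothetical helper name.

```python
def summarize_predictions(image_probs):
    """Turn a variable number of per-image probability vectors
    into one fixed-length feature vector for a second-stage classifier."""
    n_images = len(image_probs)
    n_classes = len(image_probs[0])
    # Per-class mean over the product's images (same as simple averaging).
    means = [sum(p[k] for p in image_probs) / n_images for k in range(n_classes)]
    # Per-class max, so one very confident image is not washed out.
    maxes = [max(p[k] for p in image_probs) for k in range(n_classes)]
    return means + maxes + [float(n_images)]


# Made-up example: a product with two photos and two classes.
features = summarize_predictions([[0.6, 0.4], [0.2, 0.8]])
```

The resulting vectors, one per product, could then be used to fit any small classifier, provided there are enough labelled products to train it on.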