I was wondering what the standard approach is when each sample comes with multiple pictures that should all be used to classify it.
I’m currently working on this competition: https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries
The thing is, it has both tabular and image data: each row in the table has several images that should be used to classify it into one of 3 categories.
The approach I’ve settled on so far is to train an image network that classifies images and a separate network for the tabular data. Then, for each row with X images, I duplicate the tabular data X times, pair each copy with a single image, feed the X (image, tabular data) pairs through the CNN input and tabular input layers respectively, generate X predictions, and average them.
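To make the pairing step concrete, here is a minimal NumPy sketch of how the (image, tabular data) pairs might be built. The function name `expand_pairs`, the toy shapes, and the random data are all made up for illustration and are not from the competition; the sketch also assumes the images for a row are already loaded and resized to a common shape.

```python
import numpy as np

def expand_pairs(tabular, images_per_row):
    """Repeat each tabular row once per image and stack all images.

    tabular:        array of shape (n_rows, n_features)
    images_per_row: list with one array per row, each of shape (k_i, H, W, C)
    Returns (images, tab_repeated, row_ids), all aligned on axis 0.
    Rows with zero images simply produce no pairs.
    """
    counts = np.array([len(imgs) for imgs in images_per_row])
    # row_ids[j] is the table row that pair j belongs to
    row_ids = np.repeat(np.arange(len(tabular)), counts)
    tab_repeated = tabular[row_ids]
    images = np.concatenate(images_per_row, axis=0)
    return images, tab_repeated, row_ids

# Toy data (hypothetical): 3 rows with 2, 1 and 3 images of size 8x8x3.
rng = np.random.default_rng(0)
tabular = rng.normal(size=(3, 4))
images_per_row = [rng.random((k, 8, 8, 3)) for k in (2, 1, 3)]

images, tab_repeated, row_ids = expand_pairs(tabular, images_per_row)
```

`images` and `tab_repeated` would then go to the image input and the tabular input of the model, and `row_ids` is what lets the X predictions be mapped back to their row afterwards.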
I’m wondering if there are better ways to do this, especially with Keras in mind. Right now Keras expects a single sample per evaluation, so it’s hard to make all the metrics play nicely with this approach, and I have to aggregate the metrics manually to measure performance.
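For reference, the manual aggregation I mean looks roughly like the sketch below: average the per-pair class probabilities back to one prediction per row, then compute the metrics at row level. The helper names (`average_by_row`, `log_loss`), the toy numbers, and the choice of log loss and accuracy as metrics are assumptions for illustration; the sketch also assumes every row has at least one image, otherwise the division produces NaN.

```python
import numpy as np

def average_by_row(pair_probs, row_ids, n_rows):
    """Average per-pair class probabilities into one prediction per row."""
    sums = np.zeros((n_rows, pair_probs.shape[1]))
    np.add.at(sums, row_ids, pair_probs)          # unbuffered scatter-add
    counts = np.bincount(row_ids, minlength=n_rows)
    return sums / counts[:, None]

def log_loss(y_true, probs, eps=1e-15):
    """Multi-class log loss for integer labels."""
    probs = np.clip(probs, eps, 1 - eps)
    picked = probs[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(picked)))

# Toy per-pair predictions (hypothetical): row 0 has 2 images, row 1 has 1.
pair_probs = np.array([[0.8, 0.1, 0.1],
                       [0.6, 0.3, 0.1],
                       [0.2, 0.5, 0.3]])
row_ids = np.array([0, 0, 1])
y_true = np.array([0, 1])                          # one label per row

row_probs = average_by_row(pair_probs, row_ids, n_rows=2)
row_loss = log_loss(y_true, row_probs)
row_acc = float(np.mean(row_probs.argmax(axis=1) == y_true))
```

Because this runs on the output of `model.predict` rather than inside Keras, the per-batch metrics Keras reports during training are still computed per pair, not per row, which is the mismatch I’m trying to get rid of.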
Thanks!