I am playing around with an old Yelp Kaggle competition. Task summary: given many photos for each business (restaurant), multi-label each business (`good_for_dinner`, `outdoor_seating`, etc.). I was thinking the approach should be:
- Build an array filled with the VGG predictions for each of the business's photos. This would be the `x` for the `.fit()` call.
- Train it with the given training labels (`y`).
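Step 2 might look roughly like the sketch below, assuming each business has already been reduced to one 1000-dim VGG vector. Everything here is placeholder: the data is random, not the real Kaggle set, and the 9 binary labels are my assumption based on the competition's attribute count.

```python
# Hedged sketch of the .fit() step on pooled per-business features.
# All data is random placeholder; shapes and label count are assumptions.
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
x = rng.random((32, 1000)).astype("float32")       # one 1000-dim VGG vector per business
y = rng.integers(0, 2, (32, 9)).astype("float32")  # multi-hot attribute labels

model = keras.Sequential([
    keras.Input(shape=(1000,)),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dense(9, activation="sigmoid"),  # sigmoid, not softmax: multi-label
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x, y, epochs=1, verbose=0)
```

The sigmoid output with binary cross-entropy is what makes this multi-label rather than multi-class: each of the 9 attributes is predicted independently.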
Now, a consideration for step 1: the number of photos varies widely per business. Some businesses have only a few, others have hundreds. I can either:
- Compact the VGG results of all the images for one business into a single 1000-length array that represents them (using `np.mean()` or `np.max()`)
- Concatenate all of the VGG results and use the whole thing as `x`
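The first option (pooling) can be sketched with placeholder data; `photo_preds` below stands in for one business's stack of VGG prediction vectors:

```python
import numpy as np

# Hedged sketch of the pooling option: `photo_preds` is a hypothetical
# (n_photos, 1000) array of VGG outputs for a single business.
rng = np.random.default_rng(0)
photo_preds = rng.random((7, 1000))   # e.g. a business with 7 photos

mean_feat = photo_preds.mean(axis=0)  # average pooling over photos
max_feat = photo_preds.max(axis=0)    # max pooling over photos

# Either pooled vector has length 1000 no matter how many photos the
# business has, so every business contributes one fixed-size row to x.
```

The appeal of this route is exactly that fixed output size; the cost is that per-photo detail is averaged (or maxed) away.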
I'm guessing the latter approach is better, since it doesn't lose information through compaction. But I'm not sure whether differently sized arrays can be used as the `x` for training a CNN + Dense combo. Is there a way to do that?
Hopefully I'm not missing anything relevant from the lectures.
I thought of RNNs too, but they seem meant for sequential inputs where order matters, which isn't the case for Yelp's photos, so I'm not sure they're the right tool here.