I am playing around with an old Yelp Kaggle competition. Task summary: given many photos for each business (restaurant), multi-label each business (`good_for_dinner`, `outdoor_seating`, etc.). I was thinking the approach should be:
- Build an array filled with the VGG predictions for each of the business's photos. This would be the `x` for the `.fit()` call.
- Train it with the given training labels (`y`).
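Step 2 might look roughly like the sketch below, assuming each business has already been reduced to one 1000-dim VGG vector. Everything here is placeholder: the data is random, not the real Kaggle set, and the 9 binary labels are my assumption based on the competition's attribute count.

```python
# Hedged sketch of the .fit() step on pooled per-business features.
# All data is random placeholder; shapes and label count are assumptions.
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
x = rng.random((32, 1000)).astype("float32")       # one 1000-dim VGG vector per business
y = rng.integers(0, 2, (32, 9)).astype("float32")  # multi-hot attribute labels

model = keras.Sequential([
    keras.Input(shape=(1000,)),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dense(9, activation="sigmoid"),  # sigmoid, not softmax: multi-label
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x, y, epochs=1, verbose=0)
```

The sigmoid output with binary cross-entropy is what makes this multi-label rather than multi-class: each of the 9 attributes is predicted independently.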
Now, a consideration for step 1: the number of photos varies widely per business. Some businesses have only a few, others have hundreds. I can either:
- Compact the VGG results of all the images for one business into a single 1000-length array that represents them (using `np.mean()` or `np.max()`)
- Concatenate all of the VGG results and use the whole thing as `x`
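The first option (pooling) can be sketched with placeholder data; `photo_preds` below stands in for one business's stack of VGG prediction vectors:

```python
import numpy as np

# Hedged sketch of the pooling option: `photo_preds` is a hypothetical
# (n_photos, 1000) array of VGG outputs for a single business.
rng = np.random.default_rng(0)
photo_preds = rng.random((7, 1000))   # e.g. a business with 7 photos

mean_feat = photo_preds.mean(axis=0)  # average pooling over photos
max_feat = photo_preds.max(axis=0)    # max pooling over photos

# Either pooled vector has length 1000 no matter how many photos the
# business has, so every business contributes one fixed-size row to x.
```

The appeal of this route is exactly that fixed output size; the cost is that per-photo detail is averaged (or maxed) away.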
I'm guessing the latter approach is better, since it doesn't lose information through compaction. But I'm not sure whether differently sized arrays can be used as the `x` for training a CNN + Dense combo. Is there a way to do that?
Hopefully I'm not missing anything relevant from the lectures.
I thought of RNNs too, but they seem meant for sequential inputs where order matters, which isn't the case for Yelp's photos, so I'm not sure they're the right tool here.