I wanted to explore merging image data with tabular and text data, using transfer learning for the image and text parts, as a way to apply everything I learned in the course. @sgugger was kind enough to implement a MixedItemList to make that “easily” possible.
I used the recently finished PetFinder Kaggle competition, since it has several images per pet, tabular information for each pet, and a text description. I provide the code for that here:
This is my first Kaggle competition and also my first custom model. So even though I got a quadratic kappa score of 0.42323 on my own rudimentary validation set (no fancy cross-validation), I did not do well on the private leaderboard, where I scored 0.25780.
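For anyone unfamiliar with the metric, here is a minimal sketch of quadratic weighted kappa in plain Python. It is not the competition's scoring code; the function name and signature are made up for illustration, and it assumes integer labels in the range 0 to n_classes - 1. Disagreements are penalized by the squared distance between the true and predicted class, so being off by two classes costs four times as much as being off by one.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic weights (illustrative helper, not a library API)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")

    # Observed confusion matrix: observed[t][p] counts samples with true t, predicted p.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    # Marginal histograms of true and predicted labels.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            # Count expected under chance agreement, given the two marginals.
            expected = hist_true[i] * hist_pred[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # All labels and predictions fall in one class: kappa is undefined.
        return float("nan")
    return 1.0 - weighted_observed / weighted_expected
```

A score of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean worse than chance. If scikit-learn is available, `sklearn.metrics.cohen_kappa_score(y_true, y_pred, weights="quadratic")` should be a handy cross-check.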
I am looking for comments and input on how to make the model better. I had to write custom methods to adapt to the tensor shape produced by MixedItemList: custom normalization for the images, a custom collate function, and a custom split_layers function. I am not sure those are optimal. I would be really interested in suggestions for improving them; this could become a good tutorial for people wanting to leverage MixedItemList.
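To make the discussion concrete, here is a rough sketch in plain PyTorch of what a collate function with built-in image normalization can look like for mixed inputs. This is not my actual code and not the layout MixedItemList produces: it assumes each sample is `((image, tabular, token_ids), label)` with a single 3-channel image per sample, the name `mixed_collate` and the `pad_idx=1` default are hypothetical, and the ImageNet statistics are only appropriate if the image backbone was pretrained on ImageNet.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Standard ImageNet channel statistics, shaped to broadcast over (batch, 3, H, W).
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)


def mixed_collate(samples, pad_idx=1):
    """Collate ((image, tabular, token_ids), label) samples into one batch.

    Assumed layout (hypothetical): image is a float tensor (3, H, W) in [0, 1],
    tabular is a 1-D float tensor, token_ids is a 1-D long tensor of any length.
    """
    inputs, labels = zip(*samples)
    images, tabular, tokens = zip(*inputs)

    # Images: stack, then normalize only the image part of the batch.
    image_batch = (torch.stack(images) - IMAGENET_MEAN) / IMAGENET_STD

    # Tabular features all have the same length, so a plain stack is enough.
    tabular_batch = torch.stack(tabular)

    # Text: right-pad every sequence to the longest one in the batch.
    text_batch = pad_sequence(list(tokens), batch_first=True, padding_value=pad_idx)

    return (image_batch, tabular_batch, text_batch), torch.tensor(labels)
```

A function like this can be passed as `collate_fn` to `torch.utils.data.DataLoader`. Doing the normalization here keeps it restricted to the image tensor, which is the part that a generic batch-level normalizer cannot know about when the batch is a tuple of differently shaped tensors.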