Wide and Deep models in fastai

As I was watching the videos for Part 1 2019, in particular lesson 4, someone asked about how to include metadata when using the collaborative filtering methods within fastai. I believe there was also another question about how to combine images, text, and tabular data. @jeremy explained precisely how one would do it (through a combination of RNNs, CNNs, and dense layers) and mentioned that it was something not yet included in the library.

I simply wanted to point out that this fits within what is known as Wide and Deep models, initially used for recommender systems (here is the nice paper from the Google team: https://arxiv.org/pdf/1606.07792.pdf), but in reality applicable to almost any problem where you have data of a different nature (tabular, both categorical and numerical, plus images and text).
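For anyone not familiar with the idea, here is a minimal PyTorch sketch of the two core branches (all class names, dimensions, and column choices are my own invention for illustration, not taken from the paper or my repo):

```python
import torch
import torch.nn as nn

class WideDeep(nn.Module):
    """Minimal Wide & Deep sketch: a linear 'wide' branch over sparse
    (one-hot / cross-product) features for memorization, plus a 'deep'
    MLP over embedded categoricals and continuous features for
    generalization. The two branches are summed at the output logit."""
    def __init__(self, wide_dim, n_cat, emb_dim, n_cont):
        super().__init__()
        self.wide = nn.Linear(wide_dim, 1)       # wide (memorization) branch
        self.emb = nn.Embedding(n_cat, emb_dim)  # categorical embeddings
        self.deep = nn.Sequential(               # deep (generalization) branch
            nn.Linear(emb_dim + n_cont, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x_wide, x_cat, x_cont):
        # Average the embeddings of each row's categorical fields,
        # concatenate with the continuous features, and sum both logits.
        deep_in = torch.cat([self.emb(x_cat).mean(dim=1), x_cont], dim=1)
        return self.wide(x_wide) + self.deep(deep_in)

model = WideDeep(wide_dim=10, n_cat=20, emb_dim=8, n_cont=3)
out = model(torch.randn(4, 10),                 # wide features
            torch.randint(0, 20, (4, 5)),       # 5 categorical fields
            torch.randn(4, 3))                  # 3 continuous features
print(out.shape)  # torch.Size([4, 1])
```

The paper's key point is that the wide branch memorizes feature interactions while the deep branch generalizes; image and text branches can be added alongside in the same way.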

A while ago I wrote this: https://github.com/jrzaurin/Wide-and-Deep-PyTorch

I updated it recently. It is written in PyTorch, but not in (nice) fastai style, and I am wondering whether it's worth refactoring it into fastai style and including it in the library, so there would be a fastai.wide_deep module or similar. It's not that sophisticated really, so maybe @jeremy and his team could do it in no time :slight_smile: . I am more than happy to help/contribute if necessary.

Cheers


Great work! I have a big dataset consisting of images with additional meta information, where the meta info is important for correct classification. This is exactly the kind of thing I'm currently missing in the fastai lib.

It’s certainly something we’d like to provide better support for in fastai v2. Perhaps fastai.tabular could support columns that contain image filenames, or text to be tokenized, etc…


That would be ideal (based on my experience).

If you work, for example, at a booking/letting/rental company, chances are you have tables with metadata (price, postcode), descriptions, and possibly URLs to images or image IDs. The same applies to fashion and a lot of e-commerce these days. This is why I used the Airbnb dataset in my repo: I think it is a very realistic example of the datasets most of us normally face (tabular data with text, and image IDs stored on disk).

Regarding the algorithm, I think you guys are in an excellent position to build something great. In my repo I combined images with text. For the images I used a ResNet-34, while for the text I used fastText vectors. In other words, images and text don't play "in the same league": the image branch is a full pre-trained network, while the text branch is just static word vectors. For a fair, equal contribution from text and images one would want a pre-trained language model, and you have one along with the architecture (your fantastic ULMFiT).

Finally, the most interesting aspect of the model is how you combine all the branches. In the Wide and Deep paper/model (and in my repo) one plugs all the branches (wide, deep, image, and text) straight into an output neuron. Another possibility is a series of dense layers prior to the output, which in my experience does not help much (if anything, it sometimes adds confusion). But I am sure that with all the tricks you use and have implemented (learning rate finder, cyclical learning rates, gradual unfreezing) you will find the right process and produce a highly usable algorithm :slight_smile: .
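To make the two head options concrete, here is a small sketch (the feature dimensions and names are illustrative; in practice the three tensors would be the outputs of the image, text, and tabular branches):

```python
import torch
import torch.nn as nn

# Stand-ins for branch outputs: in practice these would come from a CNN
# (images), an RNN/ULMFiT-style encoder (text), and embeddings + MLP (tabular).
batch = 4
img_feat = torch.randn(batch, 512)
txt_feat = torch.randn(batch, 256)
tab_feat = torch.randn(batch, 64)
features = torch.cat([img_feat, txt_feat, tab_feat], dim=1)  # shape (4, 832)

# Option 1 (Wide & Deep paper style): plug everything into one output neuron.
head_linear = nn.Linear(832, 1)

# Option 2: a small stack of dense layers before the output.
head_mlp = nn.Sequential(
    nn.Linear(832, 128), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(128, 1),
)

print(head_linear(features).shape, head_mlp(features).shape)  # both (4, 1)
```

Either way the branches only interact through this final head, which is what makes the architecture so modular.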

Nonetheless I will keep an eye on this topic and see if I can contribute.

Cheers

@aklbg thanks for your comment :slight_smile:

If you think the code in my repo can help, please give it a go, and if you find a bug or have a suggestion, let me know.

Once the utility is included in fastai I am sure it will be more powerful, but in the meantime, maybe the code there is useful :slight_smile: