Multimodal multitask text and image model


I've been lurking here for the past year, but this is my first post. I have a dataset of ~30k home repair records with columns for image URLs, inspector comments describing the damage, labor category, and repair cost.

I want to use the image and inspector comments to predict labor category and repair cost.

I got an image classifier working reasonably well for predicting just the labor category, so the images are already saved on my machine, organized by labor category. Now I want to see whether I can incorporate the text and also predict the cost.

Any ideas or links to similar projects/tutorials would be much appreciated.



You may find this multi-target example from Jeremy helpful for working out how to handle multiple, different targets in one model.

There is further discussion in the live coding session, Live coding 13.
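
For the labor category plus repair cost case in this thread, the core of a multi-target setup is a single loss that adds a classification term and a regression term. Here is a rough sketch of that idea in plain Python for one sample. It is not taken from Jeremy's example: the function names, the `log1p` transform on cost, and the `cost_weight` parameter are all my own assumptions for illustration, and it assumes the model outputs one raw score per labor category plus a predicted log cost.

```python
import math


def cross_entropy(logits, target_index):
    """Cross-entropy for one sample, given raw scores over the labor categories."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def squared_error(pred, target):
    """Squared error for one regression prediction."""
    return (pred - target) ** 2


def combined_loss(category_logits, pred_log_cost, true_category, true_cost,
                  cost_weight=1.0):
    """Classification loss plus weighted regression loss for one sample.

    Assumption: cost is modelled on a log scale (log1p) so that a few very
    expensive repairs do not dominate the regression term.
    """
    cls = cross_entropy(category_logits, true_category)
    reg = squared_error(pred_log_cost, math.log1p(true_cost))
    return cls + cost_weight * reg


# Hypothetical sample: 3 labor categories, true category 0, true cost 450.0
loss = combined_loss([2.0, 0.5, -1.0], 6.0, true_category=0, true_cost=450.0)
print(loss)
```

The two terms are usually on different scales, so `cost_weight` (hypothetical here) would likely need tuning so that neither target dominates training.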


Thanks, Allen!