You can now use MixedItemList (graciously coded by @sgugger) to merge different types of ItemList together.
See this thread, where I tackle this exact problem: "Custom ItemList, getting ForkingPickler broken pipe".
I successfully trained a model on image + tabular data. I am still working on integrating text.
You can also see my model at the bottom of this post and how I loaded the data.
On the text side, I have implemented the collate function that handles variable-length text input in each batch, and the image, tabular, and text data are all passed correctly to the forward function of my Module. However, I am running into problems loading a pre-trained text encoder that I can use in my custom Module… Here is the thread about that if you are interested: "Pre-trained text encoder".
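
For anyone wondering what a collate function for variable-length text has to do, here is a minimal sketch of the padding idea. It is not the code from my project: it is plain Python with no fastai or PyTorch calls, and it assumes each sample is an `(image, tabular, token_ids, label)` tuple. The names `pad_collate_mixed`, `pad_idx` and `pad_first` are hypothetical, chosen only for illustration. In a real DataLoader you would convert the padded lists to tensors before returning them.

```python
from typing import List, Sequence, Tuple

# Assumed sample layout (hypothetical): (image, tabular_features, token_ids, label)
Sample = Tuple[object, Sequence[float], Sequence[int], int]


def pad_collate_mixed(samples: List[Sample], pad_idx: int = 1, pad_first: bool = True):
    """Group a list of samples into a batch, padding token ids to a common length."""
    # The longest text in this batch decides the padded length.
    max_len = max(len(tokens) for _, _, tokens, _ in samples)

    images, tabular, texts, labels = [], [], [], []
    for img, tab, tokens, label in samples:
        padding = [pad_idx] * (max_len - len(tokens))
        # Pad in front or behind the real tokens, depending on pad_first.
        padded = padding + list(tokens) if pad_first else list(tokens) + padding
        images.append(img)
        tabular.append(list(tab))
        texts.append(padded)
        labels.append(label)

    # Inputs are returned as one tuple so a model's forward can unpack all three.
    return (images, tabular, texts), labels


# Usage: two samples whose texts have different lengths.
batch = [
    ("img0", [0.1, 0.2], [5, 6, 7], 0),
    ("img1", [0.3, 0.4], [8], 1),
]
(x_img, x_tab, x_txt), y = pad_collate_mixed(batch)
print(x_txt)
```

After padding, every text in the batch has the same length, so the three inputs can be stacked and passed together to the forward function.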