Best model when you have numerical, categorical, text and image features to predict the same target variable

I'm fairly new to the ML and DL universe. I was wondering what the best approach would be for building a model that predicts a target variable from features that include numerical, categorical, text and image elements.

One example of such a problem would be the latest Kaggle competition.

Any DL model should work fine for that. We cover multi-modal learning in the course, so maybe start there?
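To give a rough idea of what multi-modal learning can look like, here is a minimal sketch of one common pattern: turn each kind of feature into a vector, concatenate the vectors, and feed the result to a single prediction head. This is only an illustration under stated assumptions, not necessarily the approach taught in the course. The toy data, the `vocab` list, and the helpers `one_hot`, `bag_of_words`, `image_features` and `predict` are all hypothetical; the bag-of-words and per-channel statistics stand in for learned text and image encoders, and the head uses random, untrained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_hot(indices, num_classes):
    """Encode integer category ids as one-hot rows."""
    out = np.zeros((len(indices), num_classes))
    out[np.arange(len(indices)), indices] = 1.0
    return out

def bag_of_words(texts, vocab):
    """Count vocabulary words per text (stand-in for a learned text encoder)."""
    index = {word: i for i, word in enumerate(vocab)}
    out = np.zeros((len(texts), len(vocab)))
    for row, text in enumerate(texts):
        for word in text.lower().split():
            if word in index:
                out[row, index[word]] += 1.0
    return out

def image_features(images):
    """Per-channel mean and std (stand-in for a pretrained CNN's features)."""
    # images has shape (batch, height, width, channels)
    return np.concatenate(
        [images.mean(axis=(1, 2)), images.std(axis=(1, 2))], axis=1
    )

# Hypothetical toy batch of 4 rows, one array per modality.
numeric = rng.normal(size=(4, 3))
category = np.array([0, 2, 1, 2])
texts = ["cute small dog", "old cat", "small cat", "cute dog"]
images = rng.random(size=(4, 8, 8, 3))
vocab = ["cute", "small", "old", "dog", "cat"]

# Fusion by concatenation: 3 numeric + 3 one-hot + 5 word counts + 6 image stats.
fused = np.concatenate(
    [numeric, one_hot(category, 3), bag_of_words(texts, vocab), image_features(images)],
    axis=1,
)

# Small MLP head with random (untrained) weights, for illustration only.
hidden = 16
w1 = rng.normal(scale=0.1, size=(fused.shape[1], hidden))
b1 = np.zeros(hidden)
w2 = rng.normal(scale=0.1, size=(hidden, 1))
b2 = np.zeros(1)

def predict(x):
    """Forward pass: one ReLU hidden layer, then a linear output per row."""
    h = np.maximum(x @ w1 + b1, 0.0)
    return (h @ w2 + b2).ravel()

predictions = predict(fused)
print(fused.shape, predictions.shape)
```

In a real model the encoders and the head would be trained together, typically with embeddings for the categorical columns and pretrained networks for the text and images, but the concatenate-then-predict structure stays the same.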

Thanks a lot. I believe that is covered in Part 2 of the Cutting Edge DL Course? I am currently about to finish Part 1. I will check it out for sure. Thanks again.

Yes, Part 2 would be a good place to learn more about this. Feel free to ask in that forum if you finish that course and still aren’t sure.