I know tabular learning isn’t necessarily the focus of the course (which is excellent, btw), but given its practical importance I figured I’d ask anyway. Apologies if this is a little better suited to the ML course (which is also excellent, for those of you on the fence about taking it!!).
If, for example, continuous feature D is a linear combination of continuous features A, B, and C, is it OK to include all four columns? Or is D redundant, and should I omit it?
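For concreteness, here’s a toy version of the situation I mean (column names and coefficients are made up): D carries no information that A, B, and C don’t already contain, which you can see from the rank of the design matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = rng.normal(size=(3, 100))
D = 2 * A - 0.5 * B + C          # D is an exact linear combo of A, B, C

# Stack all four features into a design matrix: 4 columns, but only rank 3,
# since D adds no new direction to the column space.
X = np.column_stack([A, B, C, D])
print(np.linalg.matrix_rank(X))  # prints 3
```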
When (if ever…) is it appropriate to break the problem up into clusters/groups and train a separate model for each? Take the Rossmann problem, for example: is it advisable to run separate models for store groups binned by average store sales? By geography? Or perhaps a model for each store individually? Did we skip this in the course because of compute limitations, with the idea that a single model trained on all the data should in theory pick up on these group effects and get us most of the way there in far less time? Or is the single-model approach actually better than splitting the problem into several per-group models?
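To make sure I’m describing the split clearly, here’s a rough sketch of the “one model per group” idea, with plain linear fits standing in for the real models (the group labels and the `store_group` column are invented for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "store_group": rng.choice(["low", "mid", "high"], size=300),
    "x": rng.normal(size=300),
})
# Give each group a different x -> sales relationship.
df["sales"] = df["x"] * df["store_group"].map({"low": 1.0, "mid": 2.0, "high": 3.0})

# Option 1: fit one tiny model per store group.
per_group = {
    g: np.polyfit(sub["x"], sub["sales"], deg=1)   # [slope, intercept]
    for g, sub in df.groupby("store_group")
}

# Option 2: fit one model on all the data (ignoring the group feature here,
# which is the worst case -- a real single model would get it as an input).
single = np.polyfit(df["x"], df["sales"], deg=1)
```

With this toy data the per-group fits recover each group’s slope exactly, while the pooled fit without the group feature blurs them together; the question is whether a single deep model *with* the group feature closes that gap in practice.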
In general, I don’t feel I have much intuition yet for this particular application of deep learning. If anyone has advice, or better yet concrete notebook examples (that they’d be willing to share, of course!) for structured-data prediction problems, that would be awesome. Right now I’m able to use more or less the same fast.ai Rossmann techniques, but I’d be super interested in seeing any further modeling (or analysis) extensions people are employing on similar problems to squeeze out better results.
Thanks, any insights would be much appreciated!