I’m trying to use fastai.tabular to predict malware detections in Kaggle’s Microsoft Malware Prediction competition. My model scores poorly: 0.657 AUC, which ranks 1768/2282 and is 0.053 behind the #1 score. The Kaggle forums say that even the most basic blind models score around 0.67, but of course those aren’t from fast.ai users; I think they’re using more standard ML techniques/packages. Does anyone have an opinion on whether the fastai.tabular deep learning model simply doesn’t perform as well in this particular case? If I recall correctly, Jeremy said in his course that he has found the fastai.tabular library to be as good as more conventional structured-data ML about 90% of the time, so I wonder if this is just one of the 10% of cases where it doesn’t work.
I realize I haven’t given a detailed description of how I’m handling the different categorical/continuous variables or how I’m splitting off the validation set. I can go into this if anyone is interested.