Selecting a Tabular Multi-label Loss Function

mindtrinket · September 23, 2022, 12:02am

I need help with the multi-class logarithmic loss used in this older Kaggle competition: Microsoft Malware Classification Challenge (BIG 2015).

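$$\text{logloss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\,\log(p_{ij}),$$

where N is the number of samples, M is the number of classes, y_ij is 1 if sample i belongs to class j and 0 otherwise, and p_ij is the predicted probability that sample i belongs to class j.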
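For reference, the formula can be sketched in plain Python. This is a minimal sketch, not Kaggle's scoring code: it assumes labels are given as integer class indices (so the one-hot y_ij keeps only the true-class term) and that each probability row sums to 1. The eps clipping is an added assumption to avoid log(0), and multiclass_logloss is a hypothetical helper name.

```python
import math


def multiclass_logloss(y_true, probs, eps=1e-15):
    """Multi-class log loss: -(1/N) * sum_i sum_j y_ij * log(p_ij).

    y_true: integer class index (0..M-1) for each of the N rows. Because
            y_ij is one-hot, only the true-class term of the inner sum survives.
    probs:  N rows of M predicted probabilities (assumed to sum to 1 per row).
    eps:    assumed clipping bound so that log(0) is never evaluated.
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        p = min(max(row[label], eps), 1 - eps)  # clip into [eps, 1 - eps]
        total += math.log(p)
    return -total / len(y_true)


# Two rows, three classes: the true classes are 0 and 2.
print(multiclass_logloss([0, 2], [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]))  # ≈ 0.434
```

Confident wrong predictions are punished hard: a true-class probability of 0.01 alone contributes about 4.6 to the sum.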

I have a tabular model set up using BCEWithLogitsLossFlat, but I struggle to understand whether it is the best fit among the options listed in the fastai docs (fastai - Loss Functions).

TL;DR: I am still horrible at loss functions and at choosing the best one.

BobMcDear · September 23, 2022, 11:58am

Hello,

CrossEntropyLossFlat is what you are looking for; it is equivalent to the loss listed on Kaggle. BCE would have been fine were the task binary, i.e., with only two possible labels, but that is not the case for this dataset. The cross-entropy loss is simply a generalization of BCE to multi-class classification.
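To make the equivalence concrete, here is a minimal plain-Python sketch (no fastai needed; all helper names are hypothetical). Softmax cross-entropy on raw logits gives the same number as the log loss above applied to softmax probabilities with one-hot targets, and with only two classes it reduces to BCE on a single logit, which is the sense in which cross-entropy generalizes BCE.

```python
import math


def log_softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def cross_entropy(logits_rows, targets):
    # Mean over rows of -log(softmax(logits)[target]): the multi-class
    # log loss with one-hot y and p = softmax(logits).
    return -sum(log_softmax(row)[t] for row, t in zip(logits_rows, targets)) / len(targets)


def bce_with_logits(logits, targets):
    # Binary cross-entropy on one logit per row, with targets in {0, 1}.
    total = 0.0
    for z, y in zip(logits, targets):
        p = 1 / (1 + math.exp(-z))  # sigmoid
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(targets)


# Softmax over [0, z] gives sigmoid(z) for class 1 and 1 - sigmoid(z) for
# class 0, so two-class cross-entropy matches BCE on the single logit z.
z = [1.5, -0.3, 2.0]
y = [1, 0, 0]
two_class_logits = [[0.0, zi] for zi in z]
print(cross_entropy(two_class_logits, y))
print(bce_with_logits(z, y))
```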

Please let me know if you have more questions.

krasin · September 26, 2022, 7:51am

There is a difference between multi-class and multi-label classification. In the multi-label case, each input can have more than one true class (label). BCE is useful for multi-label classification, but this dataset is multi-class: each input has exactly one true class.
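A small sketch of the distinction, with hypothetical helper names: multi-class targets are one class index per row, while multi-label targets are multi-hot vectors, scored by an independent sigmoid and BCE per label (roughly what a mean-reduced BCE-with-logits loss does). The logits below are made-up values.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def multilabel_bce_with_logits(logits_rows, multi_hot_rows):
    # Each label is an independent yes/no decision: apply a sigmoid per
    # label and average the binary cross-entropy over every (row, label) pair.
    total, count = 0.0, 0
    for logits, targets in zip(logits_rows, multi_hot_rows):
        for z, y in zip(logits, targets):
            p = sigmoid(z)
            total += y * math.log(p) + (1 - y) * math.log(1 - p)
            count += 1
    return -total / count


# Multi-class: exactly one true class per row, stored as an index.
multiclass_targets = [2, 0]
# Multi-label: any number of true labels per row (even none), stored multi-hot.
multilabel_targets = [[1, 0, 1], [0, 0, 0]]

logits = [[2.0, -1.0, 0.5], [-2.0, -0.5, -1.5]]
print(multilabel_bce_with_logits(logits, multilabel_targets))
# Per-label sigmoid probabilities need not sum to 1, unlike softmax outputs.
print([round(sum(sigmoid(v) for v in row), 3) for row in logits])
```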

mindtrinket · October 7, 2022, 7:45pm

Thanks for the help @BobMcDear and @krasin! Here is my Kaggle notebook to play around with.

@krasin, then what would be a better multi-class loss?

krasin · October 8, 2022, 8:26pm

So FlattenedLoss of CrossEntropyLoss() (that is, CrossEntropyLossFlat), as pointed out by BobMcDear, is what you need. It is the default in fastai for categorical labels.

You don’t need to do one-hot encoding manually. Just add y_block=CategoryBlock to the TabularPandas call (along with y_names="Class"). The learner can then be just learn = tabular_learner(dls, metrics=accuracy). You can check which loss function was selected by running learn.loss_func in a code cell. It will be interesting to see whether you get similar results.

You can also try MSE loss by treating the label as a number rather than a category: if you leave out y_block=CategoryBlock, a numeric Class column is set up as a regression target and MSE becomes the default loss.

mindtrinket · October 9, 2022, 3:25pm

Beautiful! Thanks @krasin for such concise feedback. A significant jump here!

I’ll clean up my code and move on to my next notebook on CNNs shortly.