Selecting a Tabular Multi-label Loss function

I need help with the multi-class logarithmic loss for this dated Kaggle competition: Microsoft Malware Classification Challenge (BIG 2015).

π‘™π‘œπ‘”π‘™π‘œπ‘ π‘ =βˆ’1π‘βˆ‘π‘–=1π‘βˆ‘π‘—=1𝑀𝑦𝑖𝑗log(𝑝𝑖𝑗),

I have a tabular model set up using BCEWithLogitsLossFlat, but I struggle to tell whether it is the best fit among the options listed in the fastai Loss Functions documentation.

TL;DR: I am still horrible at loss functions and at choosing the best one.

Hello,

CrossEntropyLossFlat is what you are looking for. It is equivalent to the loss listed on Kaggle. BCE would have been fine were the task binary, i.e., only two possible labels, but that is not the case for this dataset. The cross-entropy loss is simply a generalization of BCE to multi-class classification.
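As a quick check of the "generalization" claim, here is a short derivation sketch. It assumes the multi-class log loss with one-hot targets y_ij and predicted probabilities p_ij, and sets M = 2 so that y_{i2} = 1 - y_{i1} and p_{i2} = 1 - p_{i1}. The shorthand y_i and p_i for the first class is my own notation.

```latex
-\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{2} y_{ij}\log(p_{ij})
  = -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_{i}\log(p_{i}) + (1 - y_{i})\log(1 - p_{i}) \,\Big]
```

The right-hand side is the usual binary cross-entropy, so with two classes the two losses coincide.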

Please let me know if you have more questions.

There is a difference between multi-class and multi-label classification. In the multi-label case there can be more than one true class or label for each input. BCE is useful for multi-label classification, but that is not the case here: each sample has exactly one class.
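The distinction can be sketched in plain Python. This is an illustration under my own assumptions (hypothetical logits and targets, hand-written helpers), not fastai's implementation: softmax with cross-entropy makes the classes compete for a single label, while a per-class sigmoid with BCE scores each label independently.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(logits, target_index):
    # Multi-class: exactly one true class, given as an integer index.
    return -math.log(softmax(logits)[target_index])

def bce_with_logits(logits, targets):
    # Multi-label: each class is an independent yes/no decision.
    losses = []
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return sum(losses) / len(losses)

logits = [2.0, 0.5, -1.0]  # hypothetical model outputs for three classes

softmax_probs = softmax(logits)               # sums to 1: one class wins
sigmoid_probs = [sigmoid(z) for z in logits]  # need not sum to 1

ce = cross_entropy(logits, target_index=0)        # single true class
bce = bce_with_logits(logits, targets=[1, 0, 1])  # several true labels at once

print(sum(softmax_probs), sum(sigmoid_probs), ce, bce)
```

Since every malware sample in this competition belongs to exactly one class, the softmax and cross-entropy pairing matches the problem.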

Thanks for the help @BobMcDear and @krasin! Here is my Kaggle notebook to play around with.

@krasin, then what would be a better multi-class loss?

So the FlattenedLoss of CrossEntropyLoss(), as pointed out by BobMcDear, is what you need. It is the default in fastai for categorical labels.

You don't need to do one-hot encoding manually. Just pass y_block=CategoryBlock to TabularPandas (along with y_names="Class"). The learner can then simply be learn = tabular_learner(dls, metrics=accuracy). You can check which loss function was selected by running learn.loss_func in a cell. It will be interesting to see whether you get similar results.
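Why skipping the one-hot step is safe can be sketched in plain Python. The probabilities and label below are hypothetical: with a single true class, summing y_ij * log(p_ij) over a one-hot row gives the same value as simply looking up the probability at the integer class index, which is how an integer-coded category target is consumed by a cross-entropy loss.

```python
import math

probs = [0.1, 0.7, 0.2]  # hypothetical predicted probabilities for 3 classes
label = 1                # integer class index of the true class

# Manual one-hot encoding followed by the full sum over classes.
one_hot = [1.0 if j == label else 0.0 for j in range(len(probs))]
loss_one_hot = -sum(y * math.log(p) for y, p in zip(one_hot, probs))

# Index lookup: no one-hot vector needed.
loss_index = -math.log(probs[label])

print(loss_one_hot, loss_index)
```

Both values agree, so the integer class column can be handed over directly.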

You can also try MSE loss, treating the label as a number rather than a category (leave out y_block=CategoryBlock and this will be the default loss).

Beautiful! Thanks @krasin for such concise feedback. A significant jump here!

I'll clean up my code and move on to my next notebook around CNN shortly.
