TabularModel for classification does not have softmax output layer

When performing binary classification on tabular data, the output layer of the TabularModel class appears to be linear, whereas I would expect a softmax. I am curious why this is. I’m not sure if I’m missing something in the code or in the architecture design. Any help understanding would be greatly appreciated.

@sgugger

Take this with a grain of salt because I’m not an expert on this by any means. I think ItemList.get_label_cls is where the decision gets made about how to handle your labels based on their type. If your labels are single floats it treats the problem as regression, if they are single integers it treats it as classification, and if they are lists it treats it as multi-category (multi-label) classification. So the first thing I’d check is whether you are passing floats instead of integers (speaking from personal experience, of course :) )
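For a rough sense of that dispatch, here’s a simplified paraphrase (an illustration only, not fastai’s exact source; the real get_label_cls also honors explicit label_cls arguments and label delimiters):

```python
# Simplified paraphrase of fastai v1's ItemList.get_label_cls dispatch
# (an approximation for illustration, not the library's exact source).
import numpy as np
from fastai.data_block import FloatList, CategoryList, MultiCategoryList

def guess_label_cls(labels):
    first = labels[0]
    if isinstance(first, (float, np.float32, np.float64)):
        return FloatList           # regression -> MSE-style loss, no softmax
    if isinstance(first, (list, tuple, np.ndarray)):
        return MultiCategoryList   # multi-label -> BCEWithLogitsLoss
    return CategoryList            # ints/strings -> CrossEntropyLoss
```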

You can also specify how to treat your labels explicitly with the data block API instead of letting it decide automatically. In your case that would mean calling .label_from_df() with label_cls=CategoryList at the labeling step. That tells fastai the labels you are passing in should be treated as binary categories; see the sketch below.
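Something along these lines (a sketch of a fastai v1 data block pipeline; df, cat_names, cont_names, procs, and the 'target' column are placeholders for your own data):

```python
from fastai.tabular import *

# Sketch of a fastai v1 data block pipeline that forces categorical labels;
# df, cat_names, cont_names, procs, and 'target' are placeholders.
data = (TabularList.from_df(df, cat_names=cat_names, cont_names=cont_names, procs=procs)
        .split_by_rand_pct(valid_pct=0.2)
        .label_from_df(cols='target', label_cls=CategoryList)  # treat labels as categories
        .databunch())
```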

Sorry, I should add some clarification to my original post. When I call learn.get_preds() the results have clearly gone through a softmax (all rows sum to 1), so the code is behaving properly. What confused me was that the TabularLearner class does not specify a softmax anywhere.

Upon further investigation I think I figured this out. The CrossEntropyLoss loss function is being used, which expects the raw (pre-softmax) linear outputs. Then, when you call learn.get_preds(), some code passes the model output through an additional activation function determined by the loss function (see _loss_func2activ).
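You can check that division of labor in plain PyTorch, independent of fastai internals: the loss consumes raw logits, and softmax-ing those same logits afterwards gives rows that sum to 1, matching what get_preds returns.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 2)               # raw outputs of the final linear layer
targets = torch.tensor([0, 1, 1, 0])

loss = F.cross_entropy(logits, targets)   # applies log_softmax + nll_loss internally
probs = F.softmax(logits, dim=1)          # the activation get_preds applies afterwards
assert torch.allclose(probs.sum(dim=1), torch.ones(4))
```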

Probably should have done a bit more homework before posting; sorry, admins! If someone notices that I’m missing something here, please let me know.


Didn’t realize that, definitely good to know.


Hi,

I have a similar issue. I am trying to predict 6 categories using the tabular API. I changed the target field type to category through pandas df['target'].astype('category'). However, when I inspect the architecture details, I see the following: Linear(in_features=500, out_features=6, bias=True). I was expecting a Softmax to be applied at the output. The model is using loss_func=FlattenedLoss of CrossEntropyLoss() as the default.

Can anyone shed some light here? Do the targets need to be one-hot encoded, or can they remain in a single column for the API to adapt to the inputs?

Thanks in advance!

The final layer in the model is linear because, from the docs on CrossEntropyLoss:

This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class.

So, to summarize, the softmax you are expecting is wrapped into the loss function rather than the model itself. CrossEntropyLoss expects the raw outputs of a linear layer:

The input is expected to contain raw, unnormalized scores for each class.

I can’t confirm because I haven’t tested it myself, but my inclination is that you only need the single target column for multi-class classification.
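One way to convince yourself (a minimal PyTorch sketch, independent of fastai): nn.CrossEntropyLoss takes a single column of class indices, not one-hot vectors.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(8, 6)             # raw scores, e.g. from Linear(500, 6)
targets = torch.randint(0, 6, (8,))    # one integer class id per row, no one-hot
loss = criterion(logits, targets)
```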


Ok, thanks! I have roughly 17,000 records; do you think it’s even worth exploring a deep learning model like this one? I just see the accuracy plateau, with ~3,000 records to test on.

Should I resort to other techniques (RandomForest, XGBoost)?

Try everything! Make sure you don’t overfit since it’s a small dataset.
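If it helps, a quick baseline can be as simple as this (a scikit-learn sketch; X and y are placeholders for your already-encoded features and target column):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# X and y are placeholders for your encoded features and target column.
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=42)
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)
print(rf.score(X_valid, y_valid))  # validation accuracy to compare against the NN
```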