Lesson 5: using a simple Sequential model in Tabular Learner | take 2

teamtom · December 3, 2022, 3:47pm

Hi all,

To practice and get a deeper understanding, I tried to use a simple Sequential model in a tabular Learner.
This time I used the Titanic dataset, with data cleaning and minimal feature engineering.

My code is available in a Colab notebook here: Google Colab

```python
from pathlib import Path
from fastai.tabular.all import *  # also brings in pd and np

df = pd.read_csv('https://gist.githubusercontent.com/teamtom/3c62c2cd71f3bd7017596aa1e16847b2/raw/844da8ca7204897a29a7f62fc3eb6c19f95214b4/titanic_train.csv')

# Data cleaning: fill missing values with each column's mode
modes = df.mode().iloc[0]
df.fillna(modes, inplace=True)

# Feature engineering: log-transform the skewed Fare column (+1 avoids log(0))
df['LogFare'] = np.log(df['Fare']+1)

# One-hot encode the categorical columns
df = pd.get_dummies(df, columns=["Sex","Pclass","Embarked"])

splits = RandomSplitter(seed=42)(df)

# All 12 features are treated as continuous; Survived is the categorical target
dls = TabularPandas(
    df, splits=splits, procs=[Normalize],
    cat_names=[],
    cont_names=['Age', 'SibSp', 'Parch', 'LogFare', 'Sex_female', 'Sex_male', 'Pclass_1', 'Pclass_2', 'Pclass_3', 'Embarked_C', 'Embarked_Q', 'Embarked_S'],
    y_names='Survived', y_block=CategoryBlock()
).dataloaders(bs=64)
```
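To see where the 12 input features come from, the same cleaning and encoding steps can be run on a few made-up rows. This is a minimal sketch assuming only pandas and NumPy; the toy `df` values are hypothetical and only stand in for the real CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for the Titanic CSV (values made up).
df = pd.DataFrame({
    "Age":      [22.0, np.nan, 38.0],
    "SibSp":    [1, 0, 1],
    "Parch":    [0, 0, 2],
    "Fare":     [7.25, 71.28, 8.05],
    "Sex":      ["male", "female", "male"],
    "Pclass":   [3, 1, 2],
    "Embarked": ["S", "C", "Q"],
})

# Fill missing values with each column's mode (first mode if there are ties).
modes = df.mode().iloc[0]
df.fillna(modes, inplace=True)

# Log-transform Fare; +1 keeps log defined for a fare of 0.
df["LogFare"] = np.log(df["Fare"] + 1)

# One-hot encode: each category value becomes its own column, e.g. Pclass_1.
df = pd.get_dummies(df, columns=["Sex", "Pclass", "Embarked"])

cont_names = ['Age', 'SibSp', 'Parch', 'LogFare',
              'Sex_female', 'Sex_male', 'Pclass_1', 'Pclass_2', 'Pclass_3',
              'Embarked_C', 'Embarked_Q', 'Embarked_S']

print(len(cont_names))                          # 12, matching nn.Linear(12, 10)
print(all(c in df.columns for c in cont_names))
```

The count of 12 only holds if every category value appears in the data; a value missing from the training CSV would drop its dummy column and the first `nn.Linear` would need a different input size.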

```python
import torch.nn as nn

class NNet(nn.Module):
    def __init__(self):
        super().__init__()
        # 12 continuous inputs -> two hidden layers -> 1 output probability
        self.nnet = nn.Sequential(
            nn.Linear(12,10),
            nn.ReLU(),
            nn.Linear(10,10),
            nn.ReLU(),
            nn.Linear(10,1),
            nn.Sigmoid()
        )

    # Tabular batches arrive as (x_cat, x_cont); there are no categorical
    # columns here, so the first argument is ignored
    def forward(self, _, x):
        return self.nnet(x.view(-1,12))

model = NNet()

learn = Learner(dls, model=model, metrics=accuracy, loss_func=BCELossFlat(), cbs=ShowGraphCallback())

learn.fit(10, lr=0.03)
```

I got this:

```
epoch  train_loss  valid_loss  accuracy  time
0      0.403331    0.412062    0.595506  00:00
1      0.397318    0.387288    0.595506  00:00
2      0.392440    0.387936    0.595506  00:00
3      0.389725    0.391229    0.595506  00:00
4      0.387327    0.384920    0.595506  00:00
5      0.386091    0.381721    0.595506  00:00
6      0.384088    0.387873    0.595506  00:00
7      0.382418    0.382825    0.595506  00:00
8      0.378637    0.387598    0.595506  00:00
9      0.377786    0.384555    0.595506  00:00
```

The code runs without errors, but there must be an issue somewhere: the loss barely improves and the accuracy doesn't change at all.

What am I doing wrong? What is the issue with my experiment?

Thank you!

teamtom · December 11, 2022, 6:04pm

@muellerzr @benkarr Any hints for me, please?

benkarr · December 12, 2022, 10:46am

Your metric does not fit the output your model produces. If you look at its source with `accuracy??`, you'll see that it takes the argmax of the predictions. Since your last layer has only one neuron, the output's shape is batch_size x 1, and argmax will always return 0. The reported accuracy is therefore just the share of label-0 ("not survived") rows in the validation set, which is why it is the same after every epoch. You can fix this by adjusting either

the metric:

```python
def my_accuracy(inp, targ, axis=-1):
    # inp holds sigmoid probabilities: predict "survived" when p > 0.5
    pred,targ = flatten_check(inp > 0.5, targ)
    return (pred == targ).float().mean()

learn = Learner(dls, model=model, metrics=my_accuracy, loss_func=BCELossFlat(), cbs=ShowGraphCallback())
```

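or the model:

```python
self.nnet = nn.Sequential(
            nn.Linear(12,10),
            nn.ReLU(),
            nn.Linear(10,10),
            nn.ReLU(),
            #nn.Linear(10,1),
            #nn.Sigmoid()
            nn.Linear(10,2), ## one node for each category
            # no nn.Softmax() layer: CrossEntropyLossFlat applies log-softmax
            # itself, so the model should return raw scores (logits)
        )

model = NNet()  # rebuild the model with the new output layer
## use appropriate `loss_func`
learn = Learner(dls, model=model, metrics=accuracy, loss_func=CrossEntropyLossFlat(), cbs=ShowGraphCallback())
```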
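The constant accuracy can be reproduced without fastai. This is a small sketch that uses NumPy arrays as a stand-in for the PyTorch tensors (`argmax` behaves the same way on both); the probabilities and targets are made up.

```python
import numpy as np

# Hypothetical sigmoid outputs for a batch of 4 rows: shape (batch_size, 1),
# like the single-neuron model from the first post.
probs = np.array([[0.9], [0.2], [0.7], [0.1]])
targets = np.array([1, 0, 1, 0])

# Argmax-based accuracy: index of the largest column, always 0 with one column.
argmax_preds = probs.argmax(axis=-1)
argmax_acc = (argmax_preds == targets).mean()

# Thresholded accuracy, as in my_accuracy: predict 1 when p > 0.5.
thresh_preds = (probs[:, 0] > 0.5).astype(int)
thresh_acc = (thresh_preds == targets).mean()

print(argmax_preds)  # [0 0 0 0]
print(argmax_acc)    # 0.5, i.e. just the share of label-0 rows
print(thresh_acc)    # 1.0
```

However good the predictions get, `argmax_acc` cannot move, because it never looks at the probability values.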

Anyway: please read up on the forum etiquette regarding @-mentioning random forum members.

teamtom · December 12, 2022, 10:00pm

Thank you for your help! I am sorry for mentioning you; now I am aware that this is against forum etiquette. I just felt lost, stuck and desperate because no one had answered my request for help after a week.
It won't happen again!

Your custom accuracy function makes the accuracy feedback work, but the situation is still not clear to me and I feel confused.
My model with a single output neuron tries to solve a simple binary classification problem (Titanic), so why do I need to create a custom accuracy function? Why doesn't fastai handle this out of the box as usual?
Why does the default accuracy use argmax, which is meant for multi-class classification?
Also, why should I add 2 neurons and softmax to my model to solve binary classification?
What am I getting wrong?

Sorry for the overwhelming questions! Your answer is highly appreciated!

benkarr · December 13, 2022, 10:11am

Yeah, I kind of get that, so no worries. Just try to use that superpower of summoning people responsibly.

I'm actually not sure, but I would guess that it is a design choice of the library. It seems that single-label classification is assumed to use cross-entropy, while binary cross-entropy is used for multi-label classification, so the metrics for these tasks make particular assumptions…
No library can be prepared to solve every problem in every possible way, so sometimes you have to either:

- reformulate your problem (use two output neurons instead of one), or
- adjust the solution (change the metric).

Well, binary means two, so you actually have two labels: ‘survived’ and ‘not survived’ (binary classification is just a special case of multi-class classification).

Let's take the one-output-neuron network:

```python
…
    nn.Linear(10,1),
    nn.Sigmoid()
```

The output of the linear layer is some number, and sigmoid squashes that number into the range between 0 and 1. Values in that range can be interpreted as probabilities: if the output of the whole network is p, we can read it as "the probability that this instance has the label ‘survived’ is p". But this implicitly gives a second value, namely the probability that the instance has the label ‘not survived’, which is 1-p.

Now let's have a look at the two-output-neuron network:

```python
…
    nn.Linear(10,2),
    nn.Softmax(dim=1)
```

The outputs of the linear layer are two values, and softmax pushes them between 0 and 1 such that they sum to one. These can again be interpreted as probabilities, where the first value p_0 is the probability that the instance has the label ‘not survived’ (0) and the second value p_1 is the probability that it has the label ‘survived’ (1). Since softmax makes sure that they sum to 1, we have:

$$1 = p_0 + p_1 \iff p_0 = 1 - p_1$$

and you can see that both networks predict exactly the same thing; one just predicts the second value explicitly rather than implicitly. It is only a reformulation of the problem and a choice of implementation.
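The equivalence can also be checked numerically: with two logits z_0 and z_1, the softmax probability p_1 equals sigmoid(z_1 - z_0), so a single sigmoid neuron can represent the same prediction. This is a minimal NumPy sketch; the `sigmoid` and `softmax` helpers and the logit values are written here for illustration and are not part of fastai.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtracting the row max keeps exp() stable and does not change the result.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical raw outputs (logits) of the two-neuron layer for three rows.
logits = np.array([[0.3, 1.5], [2.0, -1.0], [0.0, 0.0]])

p = softmax(logits)  # columns: p_0 (not survived), p_1 (survived)
p1_via_sigmoid = sigmoid(logits[:, 1] - logits[:, 0])

print(np.allclose(p.sum(axis=-1), 1.0))      # True: p_0 + p_1 = 1
print(np.allclose(p[:, 1], p1_via_sigmoid))  # True: same p_1 as a single sigmoid
```

Algebraically, e^{z_1} / (e^{z_0} + e^{z_1}) = 1 / (1 + e^{-(z_1 - z_0)}), which is the sigmoid of the difference.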

Hope that helps you make sense of all of this.

teamtom · December 14, 2022, 10:21pm

Thank you for your answer! I appreciate it very much.

This all sounds reasonable, but… the devil is in the details.

In lesson 5, Jeremy demonstrates how easy and quick it is to solve the Titanic problem compared to a from-scratch solution (Lesson 5: Practical Deep Learning for Coders 2022 - YouTube).
In the DataLoaders he uses `y_names='Survived', y_block=CategoryBlock()` just as I did, which (I guess) means a single output neuron.
In the Learner he just adds `metrics=accuracy` (no custom metric).
And ta-da… everything works perfectly and really simply for him.

So what is the difference? I know I used a custom Sequential model and no categorical embeddings, but the output seems to be the same.

Please forgive my tenacity, and thank you if you answer!

benkarr · December 14, 2022, 11:17pm

The model actually has two output neurons. As I mentioned, fastai's default seems to be that for single-label classification (a single label per instance) there is one output neuron for each label, and the Survived column provides two different labels (0/1, or "not survived"/"survived").

You can check this yourself: build the learner as in the lesson and look at the model summary:

```python
learn = tabular_learner(dls, metrics=accuracy, layers=[10,10])
learn.summary()
```

You'll see that it produces a model with 2 outputs. The learner also uses cross-entropy rather than BCE, which you can check with:

```python
learn.loss_func
# FlattenedLoss of CrossEntropyLoss()
```
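PyTorch's `CrossEntropyLoss`, which the flattened loss above wraps, expects raw logits and applies log-softmax internally, so a model paired with it should not end in a softmax layer. Below is a minimal NumPy sketch of that computation (the helper functions are hypothetical, written only for illustration); it also shows that, for two classes, cross-entropy on the logits gives the same value as binary cross-entropy on p_1 = sigmoid(z_1 - z_0).

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def cross_entropy(logits, targets):
    # Mean negative log-probability of the correct class:
    # log-softmax followed by negative log-likelihood.
    logp = log_softmax(logits)
    return -logp[np.arange(len(targets)), targets].mean()

def binary_cross_entropy(p1, targets):
    # BCE on the probability of class 1 (what BCELoss expects after a sigmoid).
    return -(targets * np.log(p1) + (1 - targets) * np.log(1 - p1)).mean()

# Hypothetical logits for two rows and their labels.
logits = np.array([[0.3, 1.5], [2.0, -1.0]])
targets = np.array([1, 0])

ce = cross_entropy(logits, targets)
p1 = 1.0 / (1.0 + np.exp(-(logits[:, 1] - logits[:, 0])))
bce = binary_cross_entropy(p1, targets)

print(np.isclose(ce, bce))  # True: both formulations give the same loss
```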

teamtom · December 21, 2022, 9:35pm

Thank you for your enlightening answer!

I tried your suggestions and found:

- Reformulating the problem: adding 2 output neurons and changing the loss function to CrossEntropyLossFlat worked totally fine, even when I printed the classification report and the confusion matrix (ClassificationInterpretation).
- Adjusting the solution: adding a custom accuracy worked well, but printing the classification report and the confusion matrix failed. The classification report raised an error (`ValueError: Classification metrics can't handle a mix of binary and continuous targets`), and the confusion matrix came out as four zeros:

```
0 | 0
0 | 0
```

I guess the issue is that the ClassificationInterpretation methods don't know about the custom accuracy function.
I wonder if there is a way to pass the custom accuracy function to the ClassificationInterpretation methods?

Thank you!
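The `ValueError` above is raised by scikit-learn when the predicted values are continuous numbers rather than class labels, and an all-zero confusion matrix fits the same picture. One possible explanation, not confirmed in this thread, is that with a single sigmoid output the interpretation receives probabilities instead of 0/1 labels. The sketch below uses plain NumPy and a hypothetical `simple_confusion_matrix` helper (not a fastai or scikit-learn function) to show how thresholding at 0.5 before counting changes the result. In practice the probabilities and targets could come, flattened, from `learn.get_preds()`.

```python
import numpy as np

# Hypothetical outputs of the single-sigmoid model and the true labels.
probs = np.array([0.91, 0.12, 0.67, 0.45, 0.08])
targets = np.array([1, 0, 0, 1, 0])

def simple_confusion_matrix(preds, targs, n_classes=2):
    # Hypothetical helper. Rows: actual class, columns: predicted class.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for actual in range(n_classes):
        for predicted in range(n_classes):
            cm[actual, predicted] = np.sum((targs == actual) & (preds == predicted))
    return cm

# Probabilities like 0.91 never equal exactly 0 or 1, so every cell stays 0.
print(simple_confusion_matrix(probs, targets))

# Thresholding first turns probabilities into class labels that can be counted.
labels = (probs > 0.5).astype(int)
print(simple_confusion_matrix(labels, targets))
```

The same thresholded `labels` could be passed to scikit-learn's `classification_report` in place of the raw probabilities.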