Lesson 5: MNIST Model, missing softmax function

Hi Guys,

In lesson 5 Jeremy talks about this Model:

from torch import nn

class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        # one linear layer: 784 flattened pixels in, 10 class scores out
        self.lin = nn.Linear(784, 10, bias=True)

    def forward(self, xb): return self.lin(xb)

And he says it is basically a logistic model. However, there should be a sigmoid/softmax transformation applied to the output, right? But nn.Linear, as the name suggests, does not do that. So I am wondering where the non-linear transformation happens, or whether it happens at all.

This confused me too!

fastai automagically appends activation and loss functions appropriate to the structure of the data given in the DataBunch. I am not sure where this default behavior is documented. But you can see it happen in a debugger by tracing the evaluation of an image batch. HTH, Malcolm
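For example (a minimal sketch, assuming fastai v1's data block API, an MNIST-style image folder at a placeholder path, and the Mnist_Logistic class from above), you can inspect what fastai picked:

from fastai.vision import *

# Sketch only: `path` is a placeholder for an MNIST-style folder with one
# sub-folder per digit class.
data = (ImageList.from_folder(path)
        .split_by_rand_pct(0.2)
        .label_from_folder()        # labels become categories -> classification
        .databunch(bs=64))

# The loss function comes from the DataBunch, not from the model.
# (Mnist_Logistic is only a stand-in here; actually training it would
# require flattening the image batches to 784-long vectors first.)
learn = Learner(data, Mnist_Logistic())
print(learn.loss_func)   # a flattened cross-entropy loss, chosen from the label type
print(data.c)            # 10 classes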

Okay,
so if my output were a float, it would know I want to do regression rather than classification?

Right, and (sorry) the details are lost in my leaky memory. But if you search this forum for “regression floatlist”, you should find examples that show how to specify that the output type is a float, implying a regression.

To confirm, you can then check that the loss_func member of the learner is MSE. I have found it helpful, both for understanding and for correctness, to check the conclusions that fastai draws about the data and the learner.

That's because the loss function is nn.CrossEntropyLoss(). In PyTorch, you don't need a final softmax activation when the loss function is cross-entropy loss; it's taken care of inside the loss function itself.
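To see that concretely, here is a small pure-PyTorch check (the tensors are made up for illustration): passing raw logits to nn.CrossEntropyLoss gives the same value as applying log-softmax yourself and then taking the negative log-likelihood.

import torch
import torch.nn.functional as F
from torch import nn

# raw, unnormalised scores ("logits") straight out of nn.Linear -- no softmax
logits = torch.randn(4, 10)           # batch of 4, 10 classes
targets = torch.tensor([3, 0, 9, 1])  # class indices

# CrossEntropyLoss applies log-softmax internally...
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# ...so it matches doing log-softmax + negative log-likelihood by hand
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss_ce, loss_manual)           # the two values are equal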

Oh of course, that makes sense. Maybe a different topic, but what kind of activation functions are used for the last layer if we do a regression?

For regression you could either use a final activation that restricts the output to a range, like y_range (with something like (y_min, y_max * 1.2)), or not use anything at all, though I think it's a good idea to go with the former, as it brings every output into the desired range.
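As a sketch of that idea (illustrative code written here, not fastai's actual y_range implementation), a range-limiting final activation can be as simple as a sigmoid rescaled into (y_min, y_max):

import torch
from torch import nn

class SigmoidRange(nn.Module):
    # squash raw outputs with a sigmoid, then rescale into (y_min, y_max);
    # this is the idea behind fastai's y_range argument
    def __init__(self, y_min, y_max):
        super().__init__()
        self.y_min, self.y_max = y_min, y_max

    def forward(self, x):
        return torch.sigmoid(x) * (self.y_max - self.y_min) + self.y_min

# example: rating-style regression, with the upper bound widened a bit (*1.2)
head = nn.Sequential(nn.Linear(16, 1), SigmoidRange(0, 5 * 1.2))
print(head(torch.randn(2, 16)))   # outputs land inside the chosen range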

Yes, it is magic. [just kidding]

I had the same question almost 5 months ago: you don't set a loss_func, you don't do any one-hot labeling, and fastai does it all for you. (That's why sometimes I don't like top-down :frowning: )

The trick is the label_cls in the data block API. When you call label_from…, if you don't specify your label_cls, fastai will pick it for you under the hood and select the loss function that best fits the problem.

Currently, I think the supported label_cls types (you can check the docs) include float labels, multi-label, …

To clarify what I meant, here is an example:

In your MNIST example, if you just call label_from_df() and do nothing else, you will get a cross-entropy loss (nn.CrossEntropyLoss) as your loss function. Also, y is put into the right form for a multi-class classification problem, and your data.c = 10.

However, if you call label_from_df(label_cls=FloatList), then you will have data.c = 1 and your model will have a single output at the end. Also, the loss function is MSELossFlat(). Now you have a regression model that just tries to minimize the difference between your predictions and y.
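In code, the difference looks roughly like this (a sketch assuming fastai v1, a DataFrame df with a filename column and a label column, and an image folder path; all the variable names are placeholders):

from fastai.vision import *

# 1) default: labels are treated as categories -> classification
data_clas = (ImageList.from_df(df, path)
             .split_by_rand_pct(0.2)
             .label_from_df()
             .databunch())
print(data_clas.c)          # 10 for MNIST-style digit labels
print(data_clas.loss_func)  # a flattened cross-entropy loss

# 2) force float labels -> regression with a single output
data_reg = (ImageList.from_df(df, path)
            .split_by_rand_pct(0.2)
            .label_from_df(label_cls=FloatList)
            .databunch())
print(data_reg.c)           # 1
print(data_reg.loss_func)   # a flattened MSE loss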

Also, you can just use the flattened losses (e.g. MSELossFlat()); they behave like the normal torch loss functions. But if you ever run into the problem where a torch loss complains about a dimension mismatch between the preds and target shapes, the flattened version will almost always solve it by flattening your y.
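Roughly speaking, a flattened loss just reshapes predictions and targets before calling the underlying torch loss. Here is a toy version written for illustration (not fastai's actual code):

import torch
from torch import nn

class FlatMSELoss(nn.Module):
    # flatten preds and targets before handing them to the torch loss, so
    # small shape mismatches like (bs, 1) vs (bs,) stop causing trouble
    def __init__(self):
        super().__init__()
        self.loss = nn.MSELoss()

    def forward(self, preds, targs):
        return self.loss(preds.view(-1), targs.view(-1).float())

preds = torch.randn(8, 1)    # model output: one value per item, shape (8, 1)
targs = torch.randn(8)       # targets: shape (8,)
print(FlatMSELoss()(preds, targs))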