Lesson 4 official topic

I like that explanation a lot. If I look at the distribution of predictions after training with the above loss function, I see:

If I now tweak the loss function to something else (complete nonsense, of course!):

import torch

def mnist_loss(predictions, targets):
    predictions = predictions.sigmoid()
    # Note the change in the second argument of torch.where: 1+predictions instead of 1-predictions
    return torch.where(targets==1, 1+predictions, predictions).mean()

I see a different distribution:

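As expected, changing the loss function changes what the predictions are “optimized” toward. With the tweaked loss, both branches (1+predictions and predictions) shrink as the sigmoid output shrinks, so training pushes every prediction toward 0 regardless of the target.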
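
To make the effect concrete, here is a rough sketch that runs plain gradient descent on a single raw activation under each loss. It uses only the standard library rather than PyTorch so it runs anywhere. The names `original_loss`, `tweaked_loss` and `descend` are made up for illustration, and the "original" loss is assumed to be the `1-predictions` version that the code comment refers to.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def original_loss(activation, target):
    # Assumed per-sample form of the earlier loss: 1-p when target is 1, p otherwise
    p = sigmoid(activation)
    return 1 - p if target == 1 else p

def tweaked_loss(activation, target):
    # Per-sample form of the "nonsense" loss: 1+p when target is 1, p otherwise
    p = sigmoid(activation)
    return 1 + p if target == 1 else p

def descend(loss_fn, target, activation=0.0, lr=1.0, steps=200, eps=1e-6):
    # Gradient descent on one activation, with a central finite-difference gradient
    for _ in range(steps):
        grad = (loss_fn(activation + eps, target)
                - loss_fn(activation - eps, target)) / (2 * eps)
        activation -= lr * grad
    return sigmoid(activation)  # final prediction after the sigmoid

results = {
    (name, target): descend(fn, target)
    for name, fn in [("original", original_loss), ("tweaked", tweaked_loss)]
    for target in (0, 1)
}
for key, p in results.items():
    print(key, round(p, 3))
```

Under the assumed original loss, the prediction for target 1 ends up close to 1 and the prediction for target 0 close to 0. Under the tweaked loss both end up close to 0, because adding 1 does not change the direction of the gradient: the loss still goes down whenever the prediction goes down.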