[Sorry if this was a little long-winded, but I spent a while thinking it through, and thought it might be helpful for someone.]

I had the same question and wasn’t entirely satisfied with the previous answers, because the sentence ‘To decide if an output represents a 3 or a 7, we can just check whether it’s greater than 0.0’ appears twice in chapter 4, and I think the answer is different each time.

The first time, this number is arbitrary:

We haven’t introduced a loss function yet; we just have a model (linear1) that takes in a batch of images as a tensor and outputs a tensor containing a ‘prediction’ for each image. At this point these predictions are simply numbers from anywhere on the real number line. But we want a way to make these numbers reflect one of two categories - a 3 or a 7. So we can pick an arbitrary point and say anything above it is a 3 and anything below it is a 7, or vice versa.
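To make that concrete, here’s a minimal sketch (with made-up shapes, not the book’s actual data) of a linear model producing one raw prediction per image - note the outputs can be any real numbers:

```python
import torch

# Hypothetical stand-in for linear1: 28*28 flattened pixels in, 1 number out.
# Its weights are randomly initialised, just like in the chapter.
linear1 = torch.nn.Linear(28*28, 1)

xb = torch.randn(4, 28*28)  # a fake batch of 4 flattened "images"
preds = linear1(xb)         # raw predictions, anywhere on the real line

print(preds.shape)  # one prediction per image: torch.Size([4, 1])
```

Nothing here constrains `preds` to any range - which is exactly why we need to pick a dividing line ourselves.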

Note that this is before any learning has been done - the parameters of the model have been initialised randomly, so we are not losing any information by picking an arbitrary dividing line. The statement

```
corrects = (preds>[arbitrary]).float() == train_y
```

is not comparing the prediction directly with the target 1 or 0. It is comparing the statement (prediction greater than [arbitraryvalue]), which evaluates to True (1) or False (0), with the target 1 or 0. As long as we kept the metric the same throughout the learning process, we could keep tweaking the model to make it more accurate, and (ideally) it would eventually be a model that predicted a number greater than [arbitraryvalue] for 3s and lower for 7s.
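A quick sketch of this comparison, using small made-up tensors rather than the real MNIST batches, and 0.0 standing in for the arbitrary threshold:

```python
import torch

# Hypothetical raw predictions from an untrained model (any real numbers)
preds = torch.tensor([2.3, -1.1, 0.4, -0.2])
# Hypothetical targets: 1 means "is a 3", 0 means "is a 7"
train_y = torch.tensor([1., 0., 0., 1.])

threshold = 0.0  # before training, any value would do

# (preds > threshold) is a tensor of booleans; .float() turns it into 1s
# and 0s, which we can then compare with the targets.
corrects = (preds > threshold).float() == train_y
accuracy = corrects.float().mean()

print(corrects)   # tensor([ True,  True, False, False])
print(accuracy)   # tensor(0.5000)
```

The key point is that `corrects` compares the *thresholded* predictions with the targets, not the raw predictions themselves.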

However, the second time the sentence appears, 0.0 is not arbitrary, because we have already defined mnist_loss, which applies a sigmoid function:

```
def mnist_loss(predictions, targets):
    predictions = predictions.sigmoid()
    return torch.where(targets==1, 1-predictions, predictions).mean()
```

Now, the loss and the metric aren’t exactly the same thing, but they should represent roughly the same aim - there’s no point in training something to do one thing and then checking it on a completely different thing. In this case, mnist_loss is low for a given image when the sigmoid of the prediction is close to 1 for a 3 and close to 0 for a 7. Since the sigmoid curve increases continuously and crosses 0.5 at x=0, a model with low loss gives very positive predictions for 3s and very negative predictions for 7s; the human way of interpreting this is that any prediction above 0 is a 3 and any prediction below 0 is a 7. The batch accuracy is the proportion of predictions in a batch that were above 0 for 3s and below 0 for 7s.
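You can check this correspondence numerically. The sketch below (again with made-up prediction values) shows that sigmoid crosses 0.5 exactly at 0, and that predictions which give a low mnist_loss also score perfectly on the threshold-at-0 accuracy metric:

```python
import torch

def mnist_loss(predictions, targets):
    predictions = predictions.sigmoid()
    return torch.where(targets==1, 1-predictions, predictions).mean()

# sigmoid maps the whole real line into (0, 1) and crosses 0.5 at x = 0
x = torch.tensor([-4.0, 0.0, 4.0])
print(torch.sigmoid(x))  # roughly [0.018, 0.500, 0.982]

# Hypothetical predictions from a well-trained model:
# very positive for 3s (target 1), very negative for 7s (target 0)
preds = torch.tensor([5.0, -6.0, 4.0])
targets = torch.tensor([1., 0., 1.])

loss = mnist_loss(preds, targets)
acc = ((preds > 0.0).float() == targets).float().mean()

print(loss)  # small: the sigmoids are near the targets
print(acc)   # tensor(1.) - thresholding at 0 classifies every one correctly
```

So once mnist_loss is in play, 0 in prediction space corresponds to 0.5 in sigmoid space, which is why the 0.0 threshold is no longer an arbitrary choice.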