In Lesson 4, Jeremy says that the first problem he encountered with the State Farm model was that there are “reasonable good solutions that are way easy to find”. He then argues that if you always predict 0.01, for example, you will be right in 9 out of 10 categories. However, that isn't actually possible, since softmax requires the probabilities to sum to 1. Another way a NN could “play it safe” is by always setting one category to 1 and the rest to 0. But the cross-entropy loss function only cares about your prediction for the correct category (in the summation, only that term is multiplied by 1, while the rest are multiplied by 0). So wouldn't that approach produce a huge loss on 90% of the samples (on average, assuming a balanced dataset)? Am I missing something?
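To make the arithmetic concrete, here is a minimal sketch comparing two "safe" strategies on a balanced 10-class dataset: a uniform softmax output (0.1 everywhere) and a one-hot guess that always picks the same class. It assumes predictions are clipped to a small `EPS` before taking the log so that log(0) doesn't blow up to infinity; the value `1e-7` and the helper names are my own assumptions, not anything from the lesson.

```python
import math

NUM_CLASSES = 10  # State Farm has 10 classes
EPS = 1e-7        # assumed clipping value to avoid log(0); real frameworks may differ

def cross_entropy(probs, true_class):
    """Per-sample categorical cross-entropy: only the true class's probability matters."""
    p = min(max(probs[true_class], EPS), 1.0 - EPS)  # clip into [EPS, 1 - EPS]
    return -math.log(p)

def mean_loss(probs):
    """Average loss over a balanced dataset (one sample per class), same prediction every time."""
    return sum(cross_entropy(probs, c) for c in range(NUM_CLASSES)) / NUM_CLASSES

uniform = [1.0 / NUM_CLASSES] * NUM_CLASSES   # spread probability evenly
one_hot = [1.0] + [0.0] * (NUM_CLASSES - 1)   # always bet everything on class 0

print(f"uniform: {mean_loss(uniform):.3f}")  # -ln(0.1), about 2.303
print(f"one-hot: {mean_loss(one_hot):.3f}")  # about 14.506, dominated by the 9 wrong classes
```

Under these assumptions the one-hot strategy averages roughly 14.5 versus about 2.30 for the uniform output, and without clipping it would be infinite, which matches the intuition that confidently wrong predictions are punished heavily.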