Hmm, do you mean multi-label or multi-CLASS? As far as I know, the multi-class default is categorical cross-entropy with softmax, and the multi-label default is binary cross-entropy with sigmoid. (These are also the defaults in the fastai library, chosen based on the label class.)
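For reference, here is a minimal PyTorch sketch of those two defaults (my own illustration, not fastai code); I'm assuming raw logits of shape `(batch, n_classes)` and using `CrossEntropyLoss` / `BCEWithLogitsLoss` as the stand-ins for the softmax and sigmoid versions:

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 10)  # 4 samples, 10 classes

# Multi-class default: CrossEntropyLoss = log-softmax + NLL,
# one integer class index per sample
targets_mc = torch.randint(0, 10, (4,))
loss_mc = nn.CrossEntropyLoss()(logits, targets_mc)

# Multi-label default: BCEWithLogitsLoss = sigmoid + binary cross-entropy,
# one independent 0/1 float per (sample, class) pair
targets_ml = torch.randint(0, 2, (4, 10)).float()
loss_ml = nn.BCEWithLogitsLoss()(logits, targets_ml)
```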
So, from my understanding of what Jeremy said in the lecture, it would often make sense for real-world multi-class problems not to use softmax but rather the binary cross-entropy (multi-label) version, and then use thresholds and/or argmax on the results to figure out the single class. That way we also get per-class probabilities undistorted by softmax, so we can differentiate the given classes from "background"/"no label" when the probabilities are small for all of the classes. Is this what he meant?
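To make that concrete, something like this is what I have in mind at inference time (just my own sketch, with an assumed threshold of 0.5 and -1 as a made-up code for "no label"):

```python
import torch

logits = torch.randn(4, 10)               # model outputs for 4 samples
probs = torch.sigmoid(logits)              # independent per-class probabilities
threshold = 0.5                            # assumed cut-off, would need tuning on validation data

best_prob, best_class = probs.max(dim=1)   # argmax over classes
pred = torch.where(best_prob >= threshold,
                   best_class,
                   torch.full_like(best_class, -1))  # -1 = "background"/"no label"
```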
This would finally answer my question asked during v3 part 1, from here.