Yes! Exactly. The reason is that a multi-label model applies a sigmoid to each output, so every label gets its own independent probability that is checked against a threshold. As a basic example, in a multi-label problem an item can have 1, 2, or 3 labels present at once, which means there are times when label 2 or label 3 is not present. It follows that there are also cases where no label gets above the threshold we set, so nothing is predicted at all. This is different from regular single-label classification, where we apply softmax and then argmax: softmax scales the raw outputs to between 0 and 1 so that all the probabilities sum to 100%, and argmax takes the highest one as our answer. Here we instead look at each label's probability on its own and see whether it is above the threshold. For example, if I set the threshold to 0.85, a label only shows up when its probability is above 0.85.
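To make the difference concrete, here is a minimal sketch in plain Python. The label names, the raw outputs, and the helper names `predict_single_label` and `predict_multi_label` are made up for illustration, and the 0.85 threshold is just an assumed value; this is not any particular library's API.

```python
import math

def softmax(logits):
    # Scale raw outputs so they are all in 0-1 and sum to 1.
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    # Squash one raw output to 0-1, independently of the others.
    return 1.0 / (1.0 + math.exp(-x))

def predict_single_label(logits, labels):
    # Regular classification: softmax, then argmax. Always returns one label.
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]

def predict_multi_label(logits, labels, thresh=0.85):
    # Multi-label: each label is kept only if its own sigmoid
    # probability is above the threshold. May return zero labels.
    return [label for x, label in zip(logits, labels) if sigmoid(x) > thresh]

labels = ["cat", "dog", "bird"]   # hypothetical labels
confident = [3.0, 2.5, -1.0]      # hypothetical raw outputs
unsure = [0.5, -0.2, -1.0]

print(predict_single_label(confident, labels))  # cat
print(predict_multi_label(confident, labels))   # ['cat', 'dog']
print(predict_single_label(unsure, labels))     # cat
print(predict_multi_label(unsure, labels))      # []
```

Notice that the single-label version still answers "cat" for the unsure input because argmax always has to pick something, while the multi-label version returns an empty list since no label clears the threshold.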