The threshold you choose for yes/no affects the sensitivity of your predictions. Sometimes you want a fairly low threshold so that more things are considered to be “yes”, for example when it’s really important that you have no false negatives. When detecting cancer, where positive means a tumor is found, you want to err on the side of predicting yes when it’s really no (i.e. the patient is actually not sick) and not the other way around.
You can also vary this threshold and see what happens. This is how an ROC curve is created, for example.
In the case of the toxic comments competition, the threshold isn’t really important as they expect you to predict probabilities for each column independently. In fact, using a threshold will harm your score as predictions that are really confident (i.e. close to 0 or 1) are penalized more heavily if they turn out to be wrong.
The loss function to use for this competition is the regular binary cross entropy / negative log loss, but done for each column independently, added up, and then divided by 6.