First of all, Jeremy actually meant binary cross-entropy when he said binomial.
The threshold value is a hyperparameter, so the best way to choose it is to try a few values and see which one is optimal, i.e. gives the best classification accuracy on held-out validation data. You should probably start by applying the same threshold value to all classes. Of course, optimal is in the eye of the experimenter: you might place different weights on correct classification for each of the classes.
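To make the "try a few values" idea concrete, here is a minimal sketch in plain Python. The function names (`accuracy_at`, `best_shared_threshold`), the candidate list, and the toy validation numbers are all made up for illustration; in practice the probabilities would be your model's sigmoid outputs on a validation set.

```python
def accuracy_at(probs, labels, threshold):
    """Fraction of individual (sample, class) predictions that match the labels."""
    correct = total = 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            # Predict "class present" when the probability reaches the threshold.
            correct += int((p >= threshold) == bool(y))
            total += 1
    return correct / total


def best_shared_threshold(probs, labels, candidates):
    """Return the candidate threshold with the highest accuracy (first one wins ties)."""
    return max(candidates, key=lambda t: accuracy_at(probs, labels, t))


# Toy validation set: 4 samples, 3 classes (hypothetical numbers).
val_probs = [[0.9, 0.2, 0.6], [0.4, 0.7, 0.1], [0.8, 0.3, 0.55], [0.1, 0.65, 0.2]]
val_labels = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [0, 1, 0]]
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]

best = best_shared_threshold(val_probs, val_labels, candidates)
print(best, accuracy_at(val_probs, val_labels, best))
```

If you care more about some classes than others, you could swap `accuracy_at` for a weighted score; the search loop stays the same.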
Also, I think that threshold values could be set independently for each class. Someone please correct me if I’m wrong.
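If per-class thresholds do make sense for your problem, one possible way to pick them is to run the same kind of search separately on each class column. This is only a sketch under that assumption: `per_class_thresholds`, the candidate values, and the toy data are hypothetical, and tuning one threshold per class on a small validation set can easily overfit.

```python
def per_class_thresholds(probs, labels, candidates):
    """For each class column, pick the candidate threshold with the best accuracy."""
    n_classes = len(probs[0])
    chosen = []
    for c in range(n_classes):
        def acc(t, c=c):
            # Accuracy for class c only, at threshold t.
            hits = sum((row[c] >= t) == bool(y[c]) for row, y in zip(probs, labels))
            return hits / len(probs)
        chosen.append(max(candidates, key=acc))
    return chosen


# Toy validation set: 4 samples, 2 classes (hypothetical numbers).
# The model is confident on class 0 but gives low scores even to true class-1 samples.
val_probs = [[0.9, 0.35], [0.8, 0.1], [0.6, 0.3], [0.2, 0.05]]
val_labels = [[1, 1], [1, 0], [0, 1], [0, 0]]
candidates = [0.25, 0.5, 0.75]

thresholds = per_class_thresholds(val_probs, val_labels, candidates)
print(thresholds)
```

In this toy case the two classes end up with different thresholds, which a single shared value could not match.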