Currently, I’m working on a problem where multiple classes can be predicted for each record. The loss function I’m using is F-beta with beta = 1 (i.e. F1), but it requires a threshold to turn the predicted probabilities into class labels. So my question is: is there a better way to make these predictions without a threshold, or is thresholding pretty much the only way to do it? The idea I had was to take the class distribution from the training set (e.g. 20% of records have class X) and then, in the test set, assign class X to the 20% of records with the highest scores for that class. I’m wondering if that makes sense, and as I type this I have “Try Blah” in my head, so I will test it and post my results when I have them.
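
The "take the top 20% for that class" idea can be sketched in a few lines of NumPy. This is only a sketch under assumptions: the function names `prevalence_thresholds` and `predict_top_fraction` are made up for illustration, `scores` is assumed to be an array of per-class scores with one row per record, and `prevalence` is assumed to hold class frequencies measured on labeled data. Note that this still uses a threshold per class; it is just derived from the class frequency instead of being fixed by hand.

```python
import numpy as np

def prevalence_thresholds(scores, prevalence):
    """Per-class cutoffs so that roughly prevalence[c] of the rows land above the cutoff.

    Hypothetical helper: `scores` has shape (n_records, n_classes) and
    `prevalence` has shape (n_classes,), with values in [0, 1].
    """
    scores = np.asarray(scores, dtype=float)
    prevalence = np.asarray(prevalence, dtype=float)
    # The (1 - p) quantile of a column leaves about a fraction p of its scores above it.
    return np.array([
        np.quantile(scores[:, c], 1.0 - prevalence[c])
        for c in range(scores.shape[1])
    ])

def predict_top_fraction(scores, prevalence):
    """Flag class c for the records whose score reaches that class's cutoff."""
    cutoffs = prevalence_thresholds(scores, prevalence)
    return (np.asarray(scores, dtype=float) >= cutoffs).astype(int)

# Toy scores for 10 records and 2 classes (made-up numbers).
scores = np.column_stack([
    np.linspace(0.05, 0.95, 10),   # class 0 scores, increasing
    np.linspace(0.95, 0.05, 10),   # class 1 scores, decreasing
])
preds = predict_top_fraction(scores, prevalence=[0.2, 0.5])
print(preds.sum(axis=0))  # number of records flagged per class
```

With tied scores the flagged fraction can differ from the target, and the approach assumes the test set has about the same class frequencies as the data the prevalence was measured on.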


Hi @KevinB, I think the cross-entropy loss is what you need for your problem:

L(p,q) = -\sum_{i=1}^M p(x_i) \log(q(x_i)) = -\left(p(x_1)\log(q(x_1)) + \ldots + p(x_M)\log(q(x_M))\right)

Here the p(x_{i}) are the label values (0 or 1) and the q(x_{i}) are the probabilities predicted by the softmax output, where i indexes the classes and M is the number of classes. The labels must be the p term: a label of 0 inside the logarithm would make the loss undefined.
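
As a rough illustration, the formula can be evaluated by hand in NumPy. The sketch below reads it with p as the 0/1 label vector and q as the softmax probabilities, so the logarithm is only ever taken of a predicted probability. The helper names `softmax` and `cross_entropy`, the `eps` clipping constant, and the example logits are assumptions made for this example, not part of any particular library.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(p, q, eps=1e-12):
    """L(p, q) = -sum_i p(x_i) * log(q(x_i)), with p the labels and q the predictions."""
    q = np.clip(q, eps, 1.0)  # guard against log(0)
    return -np.sum(p * np.log(q), axis=-1)

# One record, M = 3 classes (made-up numbers).
logits = np.array([2.0, 1.0, 0.1])   # raw model outputs
q = softmax(logits)                  # predicted probabilities
p = np.array([1.0, 0.0, 0.0])        # label: the record belongs to class 0

loss = cross_entropy(p, q)
print(q, loss)
```

Because only one entry of p is nonzero here, the loss reduces to the negative log of the probability assigned to the labeled class.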