Predict probability densities, not individual categories

Hello everybody.

Is it possible to train a vision classification model that uses probability densities for labels and outputs a probability density as the prediction?

I am doing the Kaggle HMS EEG/spectrogram competition, where the goal is not to classify images into a single category but to predict the probability of each image belonging to each category.

Thanks in advance.

Hi James

I may have misunderstood, but when you categorize you produce a series of values:

Cat 1 (horse) 0.2, Cat 2 (dog) 0.1, Cat 3 (hippo) 0.7. It is likely to be a hippo.

Regards Conwyn

When you train a vision_learner with categorically labeled data, the input labels are categories and the prediction output is a probability distribution across all your categories, with the highest probability highlighted (hippo in your example). So in this sense you’re completely right.

I assume somewhere in the API your categories are one-hot encoded so your 3 category labels become cat1 = [1, 0, 0], cat2 = [0, 1, 0], cat3 = [0, 0, 1]. However, I don’t have such data. My labels are not specific categories but probability distributions. Taking your example, my labels aren’t “horse”, “dog”, “hippo”, but [0.3, 0.5, 0.2].

My problem is I don’t know how to get these probability distributions as labels into the model.
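To make this concrete, here is a rough sketch of what my training data looks like (the image names and class columns below are just made up for illustration):

```python
import pandas as pd

# Hypothetical layout of my training data: one row per image, with the
# expert vote fractions stored as per-class probability columns.
df = pd.DataFrame({
    "image": ["img_001.png", "img_002.png"],
    "horse": [0.3, 0.0],
    "dog":   [0.5, 0.1],
    "hippo": [0.2, 0.9],
})
# Each row's class columns sum to 1.0 -- a distribution, not a single category.
print(df)
```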

Hi James

Warning: I may not be understanding.

If we had ten animals A0 to A9, then softmax as the final activation would pick the highest probability, but you could access the final layer values.
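For example, something along these lines gives you the full probability vector rather than just the winning category (a rough sketch, assuming an ImageDataLoaders called dls has already been built from ordinary categorically labelled images):

```python
from fastai.vision.all import *

# Rough sketch: train an ordinary categorical classifier, then read the
# whole softmax output for each item instead of only the top category.
# (Assumes `dls` is an ImageDataLoaders built from categorical labels.)
learn = vision_learner(dls, resnet18, metrics=accuracy)
learn.fine_tune(1)

probs, targs = learn.get_preds()  # probs: one row of class probabilities per image
print(probs[0])                   # e.g. tensor([0.02, 0.01, ..., 0.70]) across A0-A9
```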

So rather than animals, let A0 be a probability of 0% to 9%.

I am not confident about your term “probability distributions”. Do you mean that A0 corresponds to 0–9%, or that A0 is (mean, standard deviation) from which you could draw a distribution? So, given training data, the most likely distribution would come from a Gaussian Mixture Model, but you want to use deep learning to speed up the process.

Regards Conwyn

Hi Conwyn,

Thanks. I think there’s a misunderstanding about what I mean when I say “probability distribution” as the input.

In traditional classification tasks, each instance is associated with a single label from a finite set of discrete, one-hot encoded categories. The final softmax layer generates a probability distribution from which you can select the highest probability.

My scenario differs from this. I don’t have discrete labels, but a probability distribution across categories. The labels have been generated by experts who disagree, so a label might be [0.5, 0.4, 0.1, 0.0, 0.0, 0.0] (I’ve since learned this is called “soft labeling”). I could choose to relabel this to [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], but that would be a different problem.

I am not sure how to compute the loss for this kind of input/output combination, since standard cross-entropy loss seems to expect one-hot encoded (or single-category) labels.
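The closest I’ve got is writing the cross-entropy out by hand against the soft targets in plain PyTorch (a rough sketch below, with made-up numbers mirroring my six-class example); what I still don’t see is how to wire something like this, together with the distribution-valued labels, into a vision_learner.

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, target_probs):
    # Cross-entropy against a full target distribution:
    # -sum_i p_i * log(softmax(logits)_i), averaged over the batch.
    log_probs = F.log_softmax(logits, dim=-1)
    return -(target_probs * log_probs).sum(dim=-1).mean()

# Made-up batch of 2 items with 6 classes, mirroring my soft labels above.
logits = torch.randn(2, 6)
targets = torch.tensor([[0.5, 0.4, 0.1, 0.0, 0.0, 0.0],
                        [0.0, 0.2, 0.2, 0.6, 0.0, 0.0]])
print(soft_cross_entropy(logits, targets))
```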

Regards,
James