# Predict probability densities, not individual categories

Hello everybody.

Is it possible to train a vision classification model that uses probability densities for labels and outputs a probability density as the prediction?

I am doing the Kaggle HMS EEG/spectrogram competition, and the goal is not to classify images but to predict the probability of each image belonging to each category.

Hi James

I may have misunderstood, but when you categorize you produce a series of values:

Cat 1 (horse) 0.2, Cat 2 (dog) 0.1, Cat 3 (hippo) 0.7. It is likely to be a hippo.

Regards Conwyn

When you train a `vision_learner` with categorically labeled data, the input labels are categories and the prediction output is a probability distribution across all your categories, with the highest probability highlighted (hippo in your example). So in this sense you’re completely right.
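To make that concrete, here is a minimal sketch in plain Python of how a final softmax turns raw outputs into a distribution and how the top category is then picked. The logit values are made up for illustration; only the three category names come from the example above.

```python
import math

def softmax(logits):
    """Turn raw model outputs (logits) into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

categories = ["horse", "dog", "hippo"]  # names from the example in this thread
logits = [0.5, -0.2, 1.8]               # hypothetical final-layer outputs

probs = softmax(logits)                           # the full predicted distribution
predicted = categories[probs.index(max(probs))]   # the single "highlighted" category
```

The full `probs` list is the distribution; `predicted` is only a summary of it.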

I assume somewhere in the API your categories are one-hot encoded so your 3 category labels become `cat1 = [1, 0, 0], cat2 = [0, 1, 0], cat3 = [0, 0, 1]`. However, I don’t have such data. My labels are not specific categories but probability distributions. Taking your example, my labels aren’t “horse”, “dog”, “hippo”, but [0.3, 0.5, 0.2].

My problem is I don’t know how to get these probability distributions as labels into the model.

Hi James

Warning: I may not be understanding.

If we had ten animals, A0 to A9, then the softmax in the final activation would give each one a probability and you would pick the highest, but you could still access the final layer values.

So rather than animals, let A0 be a probability of 0% to 9%.

I am not confident about your term "probability distributions". Do you mean that A0 is 0-9%, or that A0 is (average, standard dev) from which you could draw a distribution? Given training data, the most likely distribution would then be a Gaussian Mixture Model, but you want to use deep learning to speed up the process.

Regards Conwyn

Hi Conwyn,

Thanks. I think there’s a misunderstanding about what I mean when I say “probability distribution” as the input.

In traditional classification tasks, each instance is associated with a single label from a finite set of discrete, one-hot encoded categories. The final softmax layer generates a probability distribution from which you can select the highest probability.

My scenario differs from this. I don’t have discrete labels, but a probability distribution across categories. The labels have been generated by experts that disagree, so a label might be [0.5, 0.4, 0.1, 0.0, 0.0, 0.0] (I’ve since learned this is called “soft labeling”). I could choose to relabel this to [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], but that would be a different problem.
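As an illustration of the difference, the sketch below builds a soft label and its "hardened" one-hot version. It assumes the soft label comes from normalizing per-category expert vote counts, which is a guess about how such labels might be produced; the vote counts and function names are hypothetical.

```python
def votes_to_soft_label(votes):
    """Normalize per-category vote counts into a distribution that sums to 1."""
    total = sum(votes)
    if total == 0:
        raise ValueError("at least one vote is required")
    return [v / total for v in votes]

def harden(soft_label):
    """Collapse a soft label to one-hot by keeping only the top category."""
    top = soft_label.index(max(soft_label))
    return [1.0 if i == top else 0.0 for i in range(len(soft_label))]

votes = [5, 4, 1, 0, 0, 0]         # hypothetical: 10 experts voting over 6 categories
soft = votes_to_soft_label(votes)  # keeps the disagreement between experts
hard = harden(soft)                # discards it: the "different problem"
```

Hardening throws away the fact that four of the ten hypothetical experts preferred the second category.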

I am not sure how to compute the loss function for this kind of input/output combination, as standard cross-entropy loss expects one-hot encoded labels.

Regards,
James
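On the loss question above: the cross-entropy formula itself, `-sum(target_i * log(pred_i))`, is defined for any target distribution, and a one-hot target is the special case where only one term survives. The following is a minimal sketch in plain Python of that soft-label cross-entropy alongside the KL divergence, which differs from it only by the entropy of the target (a constant with respect to the model). The logits and function names are made up for illustration; this is not what fastai does internally.

```python
import math

def log_softmax(logits):
    """Log of the softmax probabilities, computed stably."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - log_sum for x in logits]

def soft_cross_entropy(logits, target):
    """H(target, pred) = -sum_i target_i * log(pred_i); target may be soft."""
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)))

def kl_divergence(logits, target):
    """KL(target || pred); categories with target_i == 0 contribute nothing."""
    log_probs = log_softmax(logits)
    return sum(t * (math.log(t) - lp) for t, lp in zip(target, log_probs) if t > 0)

target = [0.5, 0.4, 0.1, 0.0, 0.0, 0.0]     # soft label from the thread
logits = [2.0, 1.5, 0.0, -1.0, -1.0, -1.0]  # hypothetical model outputs

ce = soft_cross_entropy(logits, target)
kl = kl_divergence(logits, target)
target_entropy = -sum(t * math.log(t) for t in target if t > 0)
# ce equals kl + target_entropy (up to rounding), so minimizing either
# one pushes the model toward the same predictions.
```

In PyTorch, to my knowledge, `torch.nn.KLDivLoss` takes log-probabilities as its input and probabilities as its target, and recent versions of `torch.nn.CrossEntropyLoss` also accept class probabilities as the target, so either may fit soft labels without a hand-written loss. It is worth checking the documentation for the installed version.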