ConvLearner for probabilistic labels (30% class 1, 20% class 2, ... = total 100%)


I recently got a very interesting satellite image dataset to work with. For this I thought I’d try out the fastai library.

The dataset is around 900 satellite images (a tiny dataset, and the classes are unbalanced too!) taken just east of Barbados. The images have been labelled by 6 cloud experts according to 5 cloud organization classes and one undefined class. Unfortunately, the experts did not unanimously agree on one cloud type in most cases. For this reason I have probabilistic labels (is that what they are called technically?), e.g. 20% cloud type 1, 0% cloud type 2, 60% cloud type 3, and so on, summing up to 100%.

I could not find any support for this in the fastai.dataset module at the moment. Did I miss something? I created an ad hoc fix just now. If anyone is interested and this doesn't exist yet, I could clean up my code a little and share it.

I might also write a forum post outlining how I am tackling this problem. I am sure some of you could help me a lot!


Can you share the dataset and nb (if possible)…


Can’t you define a multi label classification (like the satellite imagery in lesson 2) where the score for each represents the percentage that the experts agreed?

@ecdrid I am working on a notebook to share

@pete.condon This is what I am doing. But I couldn’t find any way to import the CSV files with the probability labels instead of the multi-labels used in lesson 2. I was just wondering whether I missed something. I created a dirty fix for now, which I will also share soon hopefully.

Ah, interesting, could you instead use single-class classification, but with “warm” labels, not one-hot labels? That is, given cloud types 1, 2, and 3, a warm label would look like [0.2, 0.6, 0.2], rather than [0, 1, 0].

If the idea is that a given image strictly speaking only belongs to one class (but it’s tricky even for experts to agree on which class), then I would think it would be fine to train a regular softmax layer to reproduce those warm labelings. (Assuming that you actually want your model to reproduce that expert uncertainty.)
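To make the "warm label" idea concrete, here is a minimal sketch (not the original poster's code) of a cross-entropy loss that accepts probabilistic targets instead of class indices; the `soft_cross_entropy` name and the example tensors are my own illustration:

```python
import torch
from torch.nn import functional as F

def soft_cross_entropy(logits, target):
    # Cross-entropy against probabilistic ("warm") targets:
    # -sum_i target_i * log softmax(logits)_i, averaged over the batch.
    return -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

logits = torch.tensor([[2.0, 0.5, 1.0]])  # raw model outputs for one image
warm = torch.tensor([[0.2, 0.6, 0.2]])    # warm label: expert vote fractions, sum to 1
onehot = torch.tensor([[0.0, 1.0, 0.0]])  # ordinary one-hot label for comparison

warm_loss = soft_cross_entropy(logits, warm)
hard_loss = soft_cross_entropy(logits, onehot)
```

With a one-hot target this reduces to the usual cross-entropy, so it can serve as a drop-in replacement when some labels are warm and some are hard.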


Nice one, I’m looking forward to seeing the solution :slight_smile:

I’m also interested in the solution if you could share this. I’m trying to use pseudo-labeling with the fastai library, and for best results I want to use the predicted probabilities, and not just find the highest class.

There doesn’t seem to be a way to do pseudo-labeling with the fastai library.

And pytorch doesn’t seem to support onehot encoded target labels in loss functions like torch.nn.NLLLoss. It seems to only take a category index as target labels.
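A quick sketch of what that means in practice (example tensors are my own, not from the thread): `nn.NLLLoss` takes a class index per example, so a distribution-shaped target simply isn't accepted.

```python
import torch
from torch import nn
from torch.nn import functional as F

logits = torch.tensor([[1.0, 2.0, 0.5]])
log_probs = F.log_softmax(logits, dim=1)

# nn.NLLLoss expects one class *index* per example...
loss = nn.NLLLoss()(log_probs, torch.tensor([1]))

# ...so a one-hot or probabilistic target like this is rejected:
# nn.NLLLoss()(log_probs, torch.tensor([[0.0, 1.0, 0.0]]))  # raises an error
```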

In any case I’d love to see your workaround!

@eugeneware interesting, I hadn't noticed that nn.NLLLoss only works on class indices. I guess it won't work for warm labels.

I think the loss we want then is nn.KLDivLoss.
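A minimal sketch of how that might look (my own example values, not code from the thread): `nn.KLDivLoss` expects log-probabilities as its input and plain probabilities as its target.

```python
import torch
from torch import nn
from torch.nn import functional as F

logits = torch.tensor([[1.0, 2.0, 0.5]])   # raw model outputs
target = torch.tensor([[0.2, 0.6, 0.2]])   # probabilistic label, sums to 1

kl = nn.KLDivLoss(reduction="batchmean")
# input must be log-probabilities, target plain probabilities
loss = kl(F.log_softmax(logits, dim=1), target)
```

Note that `reduction="batchmean"` is the mathematically correct KL divergence per example; the default element-wise mean divides by the number of classes as well.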


I was just wondering as well which loss to use. KL divergence seems like a good choice. Thanks for the hint. But I have some questions about the PyTorch implementation of the KL divergence.

According to their documentation, they define the loss as (if I understand correctly)
np.mean(y * (np.log(y) - x))

Shouldn't it be
np.mean(y * np.log(y / x))?

I also get negative loss values while the KL divergence should always be positive. Any idea?

I think you just need to make sure that you feed it your predicted log probabilities, not actual probabilities.
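That explains both questions at once: in the documented formula, `x` is already a *log*-probability, so `y * (log(y) - x)` is exactly `y * log(y / p)` with `x = log(p)`. A small sketch (my own example values) showing how feeding raw probabilities produces the negative "loss" you observed, while log-probabilities keep it non-negative:

```python
import torch
from torch import nn
from torch.nn import functional as F

kl = nn.KLDivLoss(reduction="batchmean")
logits = torch.tensor([[1.0, 2.0, 0.5]])
target = torch.tensor([[0.2, 0.6, 0.2]])

# Wrong: feeding probabilities can give a negative "loss"
wrong = kl(F.softmax(logits, dim=1), target)

# Right: feeding log-probabilities keeps the KL divergence non-negative
right = kl(F.log_softmax(logits, dim=1), target)
```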


I just wrote a new post in the Applications part of the forum with a notebook and my changes in the fastai library: Cloud classification using the fastai library: Dealing with fuzzy labels, small sample size and class imbalance

@cqfd Thanks a lot for the tips. Looking forward to more :slight_smile:


I have another question @cqfd

What activation function would you use for such a problem? I am using softmax because I want the probabilities to add up to 1, but softmax likes to highlight a single high probability, while in my case the target probabilities for different classes are often very similar.

Any idea?


@raspstephan yeah, that’s a good question. I don’t really know. Are things not working well with softmax?

Are you familiar with adding a “temperature” to softmax? The idea is that softmax only cares about the relative differences between its inputs (aka the logits); so for example, these all produce the exact same activations:

import torch
from torch.nn import functional as F

logits = torch.tensor([1.0, 2.0, 3.0])
F.softmax(logits, dim=0)        # same activations as below
F.softmax(logits + 10, dim=0)   # shifting every logit by a constant...
F.softmax(logits - 100, dim=0)  # ...changes nothing
So, if you want to make it harder for the softmax activations to pull apart from each other, you can try shrinking the relative differences of the logits. One way to do that would be to just divide all of them by a hyperparameter, e.g. if you divide all of the logits by 2, you halve all of their relative differences, thereby making their activations less peaked on a single class. People call this a “temperature”, the analogy I suppose being that if you divide by a high temperature, then the softmax probabilities are spread out, random, high entropy etc.
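The temperature trick described above can be sketched like this (my own example values; `temperature` is the hyperparameter to tune):

```python
import torch
from torch.nn import functional as F

logits = torch.tensor([1.0, 3.0, 2.0])

cold = F.softmax(logits, dim=0)        # temperature 1: ordinary softmax
warm = F.softmax(logits / 4.0, dim=0)  # temperature 4: flatter, less peaked
```

Both results are still valid probability distributions; the higher the temperature, the closer the activations get to uniform.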


I am looking to solve a similar problem, where my labels are not truly classifications but proportions of a whole. Have there been any relevant updates on how this might be done?

I found this paper that goes through methods for “Deep Gaussian mixture models”.

It is interesting and seems like a pretty good solution. I'm not sure how difficult its implementation in PyTorch might be, though.