I recently got a very interesting satellite image dataset to work with. For this I thought I’d try out the fastai library.
The dataset is around 900 satellite images (a tiny dataset, and the classes are unbalanced too!) just east of Barbados. The images have been labelled by 6 cloud experts according to 5 cloud organization classes and one undefined class. Unfortunately, the experts did not unanimously agree on one cloud type in most cases. For this reason I have probabilistic labels (is that what they are called technically?), e.g. 20% cloud type 1, 0% cloud type 2, 60% cloud type 3, and so on, summing to 100%.
I could not find any support for this in the fastai.dataset module. Did I miss something? I created an ad hoc fix just now. If anyone is interested and this doesn't exist yet, I could clean up my code a little and share it.
I might also write a forum post outlining how I am tackling this problem. I am sure some of you could help me a lot!
Can’t you define a multi-label classification (like the satellite imagery in lesson 2) where the score for each class represents the percentage of experts that agreed on it?
@pete.condon This is what I am doing. But I couldn’t find any way to import the CSV files with the probability labels instead of the multi-labels used in lesson 2. I was just wondering whether I missed something. I created a dirty fix for now, which I will also share soon hopefully.
Ah, interesting, could you instead use single-class classification, but with “warm” labels, not one-hot labels? That is, given cloud types 1, 2, and 3, a warm label would look like [0.2, 0.6, 0.2], rather than [0, 1, 0].
If the idea is that a given image strictly speaking only belongs to one class (but it’s tricky even for experts to agree on which class), then I would think it would be fine to train a regular softmax layer to reproduce those warm labelings. (Assuming that you actually want your model to reproduce that expert uncertainty.)
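To make the warm-label idea concrete, here is a minimal sketch of one training step. The model, sizes, and labels are all made up for illustration; the point is just that the loss is the cross-entropy between the expert distribution and the softmax output, which is minimized when the model reproduces the experts' uncertainty:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for a real model: 10 input features, 5 cloud classes.
model = nn.Linear(10, 5)

x = torch.randn(4, 10)  # a batch of 4 "images"

# "Warm" expert labels: each row is a probability distribution over
# the 5 classes and sums to 1 (unlike one-hot labels).
y = torch.tensor([[0.2, 0.0, 0.6, 0.2, 0.0],
                  [0.0, 0.5, 0.5, 0.0, 0.0],
                  [1.0, 0.0, 0.0, 0.0, 0.0],
                  [0.1, 0.1, 0.1, 0.1, 0.6]])

# Soft cross-entropy: -sum(p_expert * log softmax(logits)), batch mean.
log_probs = F.log_softmax(model(x), dim=1)
loss = -(y * log_probs).sum(dim=1).mean()
loss.backward()  # gradients flow through softmax as usual
```

With hard one-hot rows this reduces to the usual cross-entropy loss, so it's a strict generalization of standard single-label training.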
I’m also interested in the solution if you could share this. I’m trying to use pseudo-labeling with the fastai library, and for best results I want to use the predicted probabilities, and not just find the highest class.
There doesn’t seem to be a way to do pseudo-labeling with the fastai library.
And PyTorch doesn’t seem to support one-hot encoded target labels in loss functions like torch.nn.NLLLoss; it seems to take only a category index as the target.
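Since NLLLoss only accepts class indices, one workaround is a small custom loss that takes probability (or one-hot) targets directly. This is just a sketch, and `soft_nll_loss` is a made-up name, but the sanity check below shows it agrees with torch.nn.functional.nll_loss when the targets happen to be one-hot:

```python
import torch
import torch.nn.functional as F

def soft_nll_loss(log_probs, target_probs):
    """Like NLLLoss, but takes probability (or one-hot) targets of
    shape (batch, classes) instead of class indices."""
    return -(target_probs * log_probs).sum(dim=1).mean()

# Sanity check: with hard one-hot targets it matches F.nll_loss.
log_probs = F.log_softmax(torch.randn(3, 5), dim=1)
idx = torch.tensor([2, 0, 4])
one_hot = F.one_hot(idx, num_classes=5).float()

a = soft_nll_loss(log_probs, one_hot)
b = F.nll_loss(log_probs, idx)
print(torch.allclose(a, b))  # True
```

The same function also accepts soft probability targets, which is what pseudo-labeling with predicted probabilities needs.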
I was just wondering as well which loss to use. KL divergence seems like a good choice. Thanks for the hint. But I have some questions about the PyTorch implementation of the KL divergence.
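One gotcha with PyTorch's KL divergence: torch.nn.functional.kl_div (and torch.nn.KLDivLoss) expects *log*-probabilities as the input and plain probabilities as the target. A small sketch, with random stand-ins for the logits and expert labels, also showing that KL divergence differs from soft cross-entropy only by the target entropy (a constant with respect to the model, so the gradients are the same):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 5)
target_probs = F.softmax(torch.randn(4, 5), dim=1)  # stand-in expert labels

# kl_div expects log-probabilities as input, probabilities as target.
log_probs = F.log_softmax(logits, dim=1)
kl = F.kl_div(log_probs, target_probs, reduction='batchmean')

# KL = cross-entropy - entropy(target); the entropy term doesn't
# depend on the model, so both losses give identical gradients.
ce = -(target_probs * log_probs).sum(dim=1).mean()
entropy = -(target_probs * target_probs.log()).sum(dim=1).mean()
print(torch.allclose(kl, ce - entropy))  # True
```

(If the target distributions contain exact zeros, the entropy term above would produce NaNs from `0 * log(0)`; `kl_div` itself handles zero targets fine.)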
What activation function would you use for such a problem? I am using softmax because I want the probabilities to add up to 1, but softmax likes to highlight a single high probability, while in my case the target probabilities for different classes are often very similar.
@raspstephan yeah, that’s a good question. I don’t really know. Are things not working well with softmax?
Are you familiar with adding a “temperature” to softmax? The idea is that softmax only cares about the relative differences between its inputs (aka the logits); so for example, these all produce the exact same activations:
from torch import Tensor as T
from torch.nn import functional as F
F.softmax(T([1, 2]), dim=0)
F.softmax(T([2, 3]), dim=0)
F.softmax(T([100, 101]), dim=0)
So, if you want to make it harder for the softmax activations to pull apart from each other, you can try shrinking the relative differences of the logits. One way to do that would be to just divide all of them by a hyperparameter, e.g. if you divide all of the logits by 2, you halve all of their relative differences, thereby making their activations less peaked on a single class. People call this a “temperature”, the analogy I suppose being that if you divide by a high temperature, then the softmax probabilities are spread out, random, high entropy etc.
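A quick sketch of the temperature trick (the values here are arbitrary): dividing the logits by a larger temperature shrinks their relative differences, so the resulting distribution is flatter and less peaked on a single class.

```python
import torch
from torch.nn import functional as F

logits = torch.tensor([2.0, 1.0, 0.5])

cool = F.softmax(logits / 0.5, dim=0)    # low temperature: peakier
normal = F.softmax(logits / 1.0, dim=0)  # plain softmax
warm = F.softmax(logits / 10.0, dim=0)   # high temperature: flatter

print(cool, normal, warm)
```

Each of the three outputs still sums to 1, but the maximum probability drops as the temperature rises, which is exactly the behaviour you'd want when the expert target probabilities are close together.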
I am looking to solve a similar problem, where my classifications are not truly classifications but proportions of a whole. Have there been any relevant updates on how this might be done in the fast.ai space?
I found this paper that goes through methods for “Deep Gaussian mixture models”.
It is interesting and seems to be a pretty good solution. I'm not sure how difficult its implementation through fast.ai and PyTorch might be, though.