Understanding Softmax/Probabilities Output on a multi-class classification problem

I’m working on it, @jeremy
I see some differences between the outputs of the two calls log_preds = learn.predict() and log_preds, y = learn.TTA(): the probabilities computed from learn.predict() sum to 1 on each row, while the probabilities computed from learn.TTA() sum to less than 1.

log_preds,y = learn.TTA()
probs = np.exp(log_preds)
probs[:3,:]
array([[ 0.24539,  0.09634,  0.00952,  0.07535,  0.08283,  0.11779,  0.01073,  0.01777],
       [ 0.39541,  0.03649,  0.02922,  0.21219,  0.01865,  0.11463,  0.02149,  0.07635],
       [ 0.49963,  0.01744,  0.14847,  0.00922,  0.10761,  0.03257,  0.03157,  0.04786]], dtype=float32)

where the sum of probs on each row is less than 1

and

log_preds = learn.predict()
probs = np.exp(log_preds)
probs[:3,:]
array([[ 0.19239,  0.10078,  0.01741,  0.02733,  0.34674,  0.26757,  0.01891,  0.02886],
       [ 0.31218,  0.04001,  0.03897,  0.32251,  0.02798,  0.0951 ,  0.04427,  0.11897],
       [ 0.61758,  0.01188,  0.18024,  0.00552,  0.15407,  0.00619,  0.01183,  0.01269]], dtype=float32)

where the sum of probs on each row is 1
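
One quick way to see the difference is to check the row sums directly, and if you want the TTA output to behave like proper probabilities again you can renormalize each row (a small sketch, not part of the original code):

probs.sum(axis=1)                                   # ~1.0 for learn.predict(), < 1.0 for learn.TTA()
probs = probs / probs.sum(axis=1, keepdims=True)    # renormalize so each row sums to 1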

In any case, to display the most correct/incorrect images we will use the learn.predict() function.


So here it is

log_preds = learn.predict()
y = data.val_y

or

log_preds, y = learn.TTA()

and then

num_classes = len(data.classes)

preds = np.argmax(log_preds, axis=1)
probs = np.exp(log_preds)

# the following functions have an extra parameter y, which is the selected class in [0, num_classes-1]
# y is a number when displaying the most correct/incorrect images
# y is a vector when displaying the most uncertain images

def plot_val_with_title(idxs, title, y):
    imgs = np.stack([data.val_ds[x][0] for x in idxs])    
    if type(y) == int: title_probs = [probs[x,y] for x in idxs]
    else:    
        key = 0;
        for x in idxs:
            title_probs = [probs[x,y[key]] for x in idxs]
            key += 1
    
    print(title)
    return plots(data.val_ds.denorm(imgs), rows=1, titles=title_probs)

def plots(ims, figsize=(12,6), rows=1, titles=None):
    f = plt.figure(figsize=figsize)
    for i in range(len(ims)):
        sp = f.add_subplot(rows, len(ims)//rows, i+1)
        sp.axis('Off')
        if titles is not None: sp.set_title(titles[i], fontsize=16)
        plt.imshow(ims[i])

def load_img_id(ds, idx): return np.array(PIL.Image.open(PATH+ds.fnames[idx]))

def most_by_mask(mask, y, mult):
    idxs = np.where(mask)[0]
    return idxs[np.argsort(mult * probs[idxs,y])[:4]]

# Here mult=-1 when the is_correct flag is true: to display the most correct images we want a descending sort, so we negate the probabilities before argsort so that the biggest probabilities are displayed first.
# When is_correct is false, we want to display the most incorrect images, so we keep an ascending sort since our interest is in the smallest probabilities.

def most_by_correct(y, is_correct): 
    mult = -1 if is_correct==True else 1
    return most_by_mask((preds == data.val_y)==is_correct & (data.val_y == y), y, mult)
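
As a tiny illustration of the mult trick (made-up values, just to show the sort direction):

p = np.array([0.1, 0.9, 0.5])
np.argsort(-1 * p)   # descending: array([1, 2, 0]) -> largest probabilities first
np.argsort( 1 * p)   # ascending:  array([0, 2, 1]) -> smallest probabilities first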

In order to call these functions

most_uncertain = np.argsort(np.average(np.abs(probs-(1/num_classes)), axis = 1))[:4]
idxs_col = np.argsort(np.abs(probs[most_uncertain,:]-(1/num_classes)))[:4,-1]
plot_val_with_title(most_uncertain, "Most uncertain predictions", idxs_col)

# for most correct classes with label 0
label = 0
plot_val_with_title(most_by_correct(label, True), "Most correct class 0", label) 

# for most incorrect classes with label 2
label = 2
plot_val_with_title(most_by_correct(label, False), "Most incorrect class 2", label)

Congrats! I’m looking forward to checking it out :slight_smile:


Can you explain why we call learn.predict() in the first place? What does it serve to do? Does it take images from the validation set and come up with the “prediction”?


Both learn.predict() and learn.TTA() use the validation set, because on the validation set we have the ground truth (the true labels), so we can compute how accurate the model is. (For Kaggle competitions we have no labels for the test set.)

The prediction is done by looking at the sample image and returning x scores (where x = the number of classes). These numbers are log-probabilities, so they lie in (-∞, 0] and don’t mean much on their own, which is why we turn them into probabilities with np.exp(log_preds).
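
A tiny illustration of the relationship between raw scores, softmax probabilities and the log-probabilities the learner returns (made-up scores, just to show the mechanics):

scores = np.array([2.0, 1.0, 0.1])                 # raw model scores for 3 classes
probs = np.exp(scores) / np.exp(scores).sum()      # softmax: probabilities that sum to 1
log_preds = np.log(probs)                          # log-probabilities, all <= 0
np.exp(log_preds)                                  # back to the same probabilities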

learn.TTA() makes predictions on the validation set plus modified (augmented) versions of the sample images from the validation set.

There is something else we can do with data augmentation: use it at inference time (also known as test time). Not surprisingly, this is known as test time augmentation, or just TTA.

TTA simply makes predictions not just on the images in your validation set, but also makes predictions on a number of randomly augmented versions of them too (by default, it uses the original image along with 4 randomly augmented versions). It then takes the average prediction from these images, and uses that. To use TTA on the validation set, we can use the learner’s TTA() method.
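
Conceptually, the averaging step is just a mean over the augmented predictions. A minimal numpy sketch, assuming aug_probs is a hypothetical array of shape (n_augs, n_images, n_classes) holding the probabilities for the original and augmented versions of each image:

avg_probs = aug_probs.mean(axis=0)     # average over the original + augmented versions
preds = np.argmax(avg_probs, axis=1)   # final class prediction per image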

We need the output of these functions (these predictions) in order to check how good our model is, for example to compute the accuracy:

log_preds,y = learn.TTA()
accuracy(log_preds,y)
0.99650000000000005
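
Under the hood this is roughly equivalent to comparing the argmax of the predictions against the labels (a sketch, not necessarily the library’s exact implementation):

np.mean(np.argmax(log_preds, axis=1) == y)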

Another example is that we want to display the most incorrect/uncertain classifications, in order to get an intuition for why our model doesn’t do what we expect it to do.


@alessa I packaged up your changes and also refactored it a bit into a class. I also found a bug with missing parentheses (which I suspect came from my original code - sorry!) which I fixed. It’s now in fastai, and here’s an example of it being used with the new kaggle seedlings competition:


Thank you Jeremy! Next time I will try to provide the class directly.
I spent lots of time on this line title_probs = [self.probs[x,y[i]] for i,x in idxs], which was giving me errors because it didn’t like the type of the idxs vector, hence the extra unneeded for loop.
Only now, thanks to your code, I see the way to do it: title_probs = [self.probs[x,y[i]] for i,x in enumerate(idxs)].
Thanks!


Actually I have noticed that plot_most_uncertain is missing. I will take it as an assignment: I will implement it directly in the class that you created and push it to the repo.

@Jeremy, I am not allowed to push to the repo, but here are the modifications

  1. I added the number of classes to __init__ (I chose this way in order to keep the class call as simple as you proposed: ImageModelResults(data.val_ds, log_preds))

def __init__(self, ds, log_preds):
    self.ds = ds
    self.preds = np.argmax(log_preds, axis=1)
    self.probs = np.exp(log_preds)
    self.num_classes = log_preds.shape[1]

  2. I added the following methods
def most_uncertain(self):
    return np.argsort(np.average(np.abs(self.probs-(1/self.num_classes)), axis = 1))[:4]

def most_uncertain_class(self, most_uncertain_idx):
    return np.argsort(np.abs(self.probs[most_uncertain_idx,:]-(1/self.num_classes)))[:4,-1]

def plot_by_uncertain(self):
    """
    most_uncertain() - returns the most uncertain indexes, which can belong to different classes
    most_uncertain_class() - returns the specific classes of these uncertain indexes
    we need to know the classes in order to display them on the plot along with the probability values
    """
    most_uncertain_idxs = self.most_uncertain()
    return self.plot_val_with_title(most_uncertain_idxs, self.most_uncertain_class(most_uncertain_idxs))

def plot_most_uncertain(self): return self.plot_by_uncertain()

The way to call the function

imr = ImageModelResults(data.val_ds, log_preds)
imr.plot_most_uncertain()


We can fork and create a PR…


Thanks @alessa . I’m happy to make those changes directly, but you might enjoy learning about how to send a Pull Request with the changes yourself - it’s a great skill to have in your toolbox! If you’d like to give it a go, install this and follow the relevant steps in the readme: https://github.com/github/hub

If you’d like to learn more, have a look at https://www.atlassian.com/git/tutorials/making-a-pull-request . Before you send your pull request (PR), ensure it only has the specific changes you want to make (e.g. don’t include updated notebooks, temp files, etc).

If you’d rather not, no problem - I’ll make the changes directly.


Thanks Jeremy for the links, I am happy to learn to do that! :slight_smile:


Pull request done :slight_smile:


You might want to add a screenshot from your Dog Breed or some other dataset to show the output of this change. This might help us understand how to use the API. I was expecting to provide the class that I am interested in seeing the most uncertain examples for, but it doesn’t take any parameters. You can see an example of adding screenshots to pull requests here - https://github.com/fastai/fastai/pull/43. You can drag the screenshot into the GitHub comments section and it will insert it there, very similar to how you insert images in the forums here.


Thanks Ramesh for your reply. It was very useful to see how I should do a proper pull request. :slight_smile:

The most uncertain examples are found by following the initial method, which did not consider each class separately but all classes together. So it was looking at all the probabilities close to 0.5 (since it was a 2-class problem).

most_uncertain = np.argsort(np.abs(probs -0.5))[:4]
plot_val_with_title(most_uncertain, "Most uncertain predictions")

You are right, it is more useful to have a specific method that plots the most uncertain examples by class - I will make the updates for it.
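
A per-class version might look roughly like this (just a sketch of the idea, assuming the self.probs and self.num_classes attributes from the class above; most_uncertain_by_class is a hypothetical name, not the final API):

def most_uncertain_by_class(self, y):
    # indexes whose probability for class y is closest to 1/num_classes, i.e. the least decisive predictions
    return np.argsort(np.abs(self.probs[:, y] - 1/self.num_classes))[:4]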


OK so would you prefer I wait for those updates before I merge your PR?


yes please

So you can merge the PR.

  • the plot_most_uncertain function now requires an argument for the selected class
  • the class is better documented; I have learned what docstrings are and when to use them, and I applied them properly :slight_smile:
  • I have also cleaned up the plot_val_with_title function (since the selected class (y) will always be an integer from now on, not a vector) and I added handling for the case when idxs is empty (e.g. there are no incorrect images), in order to prevent errors.


Wow you sure learn fast! :slight_smile: Thanks for the updated PR.

BTW, just a minor matter regarding this:

    # computes the probabilities
    self.probs = np.exp(log_preds)
    # extracts the number of classes
    self.num_classes = log_preds.shape[1]

My personal belief is that comments like this are redundant (since we know from the attribute names what is being set in each case), and therefore make the code a little harder to read. My opinion on this is somewhat widely shared, but certainly some disagree, so perhaps take this merely as a tip on how to fit in with the commenting style of this particular library.


Thanks for the feedback. Yes, you’re right, I’ll keep it simple from now on, especially when the variable names are descriptive enough (and usually they should be).