Give less loss to misclassified samples?

Pomo · September 9, 2020, 5:42pm

Hi all. This question is breaking my brain - can anyone help?

I’m working on a classification problem where I know for sure that 10% of the data is mislabeled. For example, after a bit of training, a sample is labelled green but is actually yellow, and the model predicts yellow. To the model, this looks like a wrong prediction, right? But I know the sample is likely to be correctly classified.

I would ideally want the gradient to pull the true positives more positive and pull the “false negatives” less strongly toward their incorrect label.

Does it make any sense at all to have negatives count for less in the loss function?

Is there a name for this idea, and how would I implement it?

Thanks for any clarifications!

muellerzr · September 9, 2020, 6:02pm

You could check out this thread and experiment with it: Custom Metrics (FP/TN and FN/TP) in fastaiv2

I’d also suggest multiplying each of the two outputs by some sort of “factor” (such as 0.8 for FP, 0.2 for FN, etc.).

jwuphysics · September 9, 2020, 6:28pm

The direct way that I have handled this sort of problem is by using a weighted CrossEntropyLoss. For example, you can do something like

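from fastai.vision.all import *  # provides tensor, nn, Learner, accuracy, RocAuc

# weighted cross entropy (be careful with ordering: one weight per class,
# in the same order as the classes in dls.vocab)
weight = tensor([0.8, 1.0])
loss_func = nn.CrossEntropyLoss(weight=weight)

# custom metrics, e.g., what Zach mentioned
metrics = [accuracy, RocAuc(), ...]

# dls and model are assumed to be defined already
learn = Learner(dls, model, loss_func=loss_func, metrics=metrics)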
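To make the effect of the weight argument concrete, here is a minimal sketch in plain PyTorch. The logits, labels, and the 0.8/1.0 weights are made-up values for illustration. It shows that each sample's loss is scaled by the weight of the class it is labelled with, whether or not the prediction agrees with the label.

```python
import torch
import torch.nn.functional as F

# Two samples with identical logits but different labels (illustrative values).
logits = torch.tensor([[2.0, -1.0], [2.0, -1.0]])
labels = torch.tensor([0, 1])
class_weight = torch.tensor([0.8, 1.0])  # hypothetical per-class weights

# Per-sample losses without and with class weights.
per_sample = F.cross_entropy(logits, labels, reduction="none")
weighted = F.cross_entropy(logits, labels, weight=class_weight, reduction="none")

# Each weighted loss equals the unweighted loss times the weight of its label.
expected = class_weight[labels] * per_sample
print(weighted, expected)
```

Note that with the default reduction="mean", PyTorch divides the weighted sum by the sum of the applied weights rather than by the batch size.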

Pomo · September 9, 2020, 9:13pm

Thanks for responding. I kinda sorta get it and will need to study your replies.

But here’s an immediate confusion about the weights approach. The weight tensor assigns one weight to each class. When the label is green for a sample, I want to weight the green class, but when the label is yellow, I want to weight the yellow class.

Can it be done? Does it make sense even to try to do it?

jwuphysics · September 10, 2020, 9:45pm

I’m not 100% sure I understand your question, but in the example I provided, indeed false negatives and true negatives will be penalized more heavily. I think that this works in practice, sometimes.

But, if you want to avoid this, then perhaps you could also go the route of modifying the LabelSmoothingCrossEntropy class in fast.ai. Basically, you’ll need to apply the eps smoothing parameter asymmetrically: the targets should be smoothed for the yellow class but not for the green or other classes. I think you could implement this with class indexing.
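As a rough sketch of the asymmetric smoothing idea above, one option is to keep a separate eps for each class and look it up by each sample's label. This is not fastai's implementation. The class name AsymmetricLabelSmoothingCE, the eps values, and the green = 0 / yellow = 1 ordering are all assumptions made for illustration, and which label should receive the smoothing depends on where you believe the noise is.

```python
import torch
import torch.nn.functional as F
from torch import nn


class AsymmetricLabelSmoothingCE(nn.Module):
    """Cross entropy with a label-dependent smoothing amount (hypothetical helper)."""

    def __init__(self, eps_per_class):
        super().__init__()
        # One smoothing value per class, indexed by the sample's label.
        self.register_buffer("eps", torch.as_tensor(eps_per_class, dtype=torch.float))

    def forward(self, logits, target):
        log_probs = F.log_softmax(logits, dim=-1)
        # Standard negative log-likelihood of the labelled class, per sample.
        nll = -log_probs.gather(1, target.unsqueeze(1)).squeeze(1)
        # Loss against a uniform target distribution, per sample.
        uniform = -log_probs.mean(dim=-1)
        # Pick each sample's eps from its label (the "class indexing" step).
        eps = self.eps[target]
        return ((1 - eps) * nll + eps * uniform).mean()


# Assumed ordering: index 0 = green, index 1 = yellow. Only yellow labels are smoothed.
loss_func = AsymmetricLabelSmoothingCE([0.0, 0.2])
logits = torch.tensor([[2.0, -1.0], [0.3, 0.1]])
target = torch.tensor([0, 1])
loss = loss_func(logits, target)
```

With every eps set to zero this reduces to ordinary cross entropy, which is a handy sanity check before passing it to a Learner as loss_func.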

dougforrest · September 11, 2020, 6:24am

It doesn’t make sense to weight false negatives in the loss function IMO, as there is no concept of a false negative in any loss function that I am aware of. There are, however, a number of loss functions that have been developed to deal with noisy datasets, such as the one in this paper: Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels.
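For reference, the loss proposed in that paper is L_q = (1 - p_y^q) / q, where p_y is the predicted probability of the labelled class and q lies in (0, 1]. A minimal PyTorch sketch follows. The class name and the default q = 0.7 are placeholders chosen here, not values taken from this thread, so treat q as a hyperparameter to tune.

```python
import torch
import torch.nn.functional as F
from torch import nn


class GeneralizedCrossEntropy(nn.Module):
    """Sketch of the L_q loss: (1 - p_y ** q) / q, averaged over the batch."""

    def __init__(self, q=0.7):
        super().__init__()
        if not 0 < q <= 1:
            raise ValueError("q must be in (0, 1]")
        self.q = q

    def forward(self, logits, target):
        probs = F.softmax(logits, dim=-1)
        # Probability the model assigns to each sample's given label.
        p_label = probs.gather(1, target.unsqueeze(1)).squeeze(1)
        return ((1 - p_label.pow(self.q)) / self.q).mean()


logits = torch.tensor([[2.0, -1.0], [0.3, 0.1]], dtype=torch.float64)
target = torch.tensor([0, 1])
loss = GeneralizedCrossEntropy(q=0.7)(logits, target)
```

As q approaches 0 the loss approaches ordinary cross entropy, and at q = 1 it becomes 1 - p_y, which grows only linearly as the model disagrees with a label. That bounded penalty is what limits the pull of mislabelled samples.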

I would start by removing the training samples whose target differs greatly from the prediction of a trained model, and see if this improves your validation performance. This method was used in the first-place solution to the recent prostate cancer grade assessment competition.

In short, the method is:

1. Train k folds.
2. Predict each hold-out set with the model trained on the other folds.
3. Remove the training data that has a high disparity between ground truth and prediction.

You could also use the method to manually correct the labels rather than remove them, if feasible.
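The three steps above can be sketched on toy data with scikit-learn. Everything specific here is an assumption for illustration: the synthetic blobs, the logistic regression stand-in for the real model, and the 0.5 cutoff for "high disparity". This is not the competition solution itself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

rng = np.random.default_rng(0)

# Toy data: two well-separated blobs, 100 samples each.
X = np.vstack([rng.normal(-3.0, 1.0, size=(100, 2)),
               rng.normal(3.0, 1.0, size=(100, 2))])
y_true = np.repeat([0, 1], 100)

# Flip 10% of the labels to mimic the known label noise.
flipped = rng.choice(len(y_true), size=20, replace=False)
y_noisy = y_true.copy()
y_noisy[flipped] = 1 - y_noisy[flipped]

# Steps 1 and 2: train on k folds and predict each held-out fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
oof_probs = cross_val_predict(LogisticRegression(), X, y_noisy,
                              cv=cv, method="predict_proba")

# Step 3: flag samples where the model gives the stored label a low probability.
p_label = oof_probs[np.arange(len(y_noisy)), y_noisy]
suspect = np.flatnonzero(p_label < 0.5)  # assumed cutoff; tune on validation data

# Either drop the suspects or queue them for manual relabelling.
keep = np.setdiff1d(np.arange(len(y_noisy)), suspect)
X_clean, y_clean = X[keep], y_noisy[keep]
print(f"flagged {len(suspect)} of {len(y_noisy)} samples")
```

On real data the cutoff matters a lot: set it too aggressively and you also throw away hard but correctly labelled samples, so it is worth checking the effect on a validation set you trust.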

This repo is a pretty good source of other methods for dealing with noisy labels.

Pomo · September 11, 2020, 9:37pm

Hi Doug,

You are right that “false negative” does not make sense. To the model, the prediction is just a negative. In my mind only, the prediction is probably correct, so I called it a “false negative”. I will try to go back and edit, so that others won’t be confused.

Thanks for citing the papers and repo. I did in fact try deleting the negatives from the training set and retraining. Validation came out just a little better. But I had never heard of this method being used, so I moved on to improving the model. It’s certainly worth trying again.

I’m going to mark your reply as the solution, but appreciate everyone’s responses. Anyone please feel free to add to the conversation.