Dealing with mislabeled data

gesman · March 9, 2017, 2:26am

This is a fraud detection scenario.
Say a model was trained on transactions that were labeled as “clean” and “fraudulent”.
A later investigation determined that a number of the “clean” transactions were actually fraudulent, and their labels were corrected.

Is there a proper way to re-train models?
What’s a general approach in Deep Learning to “correct” models that were already pre-trained on mislabeled data without starting from scratch?

Are there optimized ways for models to “forget” certain data inputs?

Gleb

jeremy · March 9, 2017, 10:48pm

Hmm. Interesting question. The only approach I can think of to ‘undo’ the effect of those mislabeled examples is to fine-tune the model by presenting the corrected labels to it a bunch of times, probably by greatly increasing their weights for an epoch or two.
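
To make the idea concrete, here is a minimal sketch of that kind of weighted fine-tuning. It is an illustration under assumptions, not a recipe from this thread: the model is a tiny pure-Python logistic regression rather than a deep network, the transactions are synthetic 2-D points, and the helper names (`train`, `predict`, `cluster`, `mean_fraud_prob`), the 10x weight, the learning rate, and the epoch counts are all made up for the example.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, b, x):
    # Predicted probability that transaction x is fraudulent.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train(w, b, X, y, sample_weight, epochs, lr):
    # Full-batch gradient descent on the sample-weighted logistic loss.
    # Starts from the given (w, b), so it can continue from a trained model.
    w = list(w)
    total = sum(sample_weight)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi, si in zip(X, y, sample_weight):
            err = si * (predict(w, b, xi) - yi)
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        w = [wj - lr * gj / total for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b

def cluster(cx, cy, n):
    # Hypothetical synthetic transactions: n 2-D points around (cx, cy).
    return [[random.gauss(cx, 0.5), random.gauss(cy, 0.5)] for _ in range(n)]

random.seed(0)
X = cluster(-2, -2, 100) + cluster(2, 2, 100) + cluster(2, -2, 20)
y_old = [0] * 100 + [1] * 100 + [0] * 20   # last 20 are fraud mislabeled as clean
y_new = [0] * 100 + [1] * 100 + [1] * 20   # labels after the investigation
corrected = [i for i in range(len(X)) if y_old[i] != y_new[i]]

def mean_fraud_prob(w, b, rows):
    # Average predicted fraud probability over the given row indices.
    return sum(predict(w, b, X[i]) for i in rows) / len(rows)

# 1) The original model, trained on the mislabeled data.
w, b = train([0.0, 0.0], 0.0, X, y_old, [1.0] * len(X), epochs=200, lr=0.2)
before = mean_fraud_prob(w, b, corrected)

# 2) Fine-tune from those weights on the corrected labels, with the
#    corrected rows weighted 10x (an arbitrary choice) for a few epochs.
corrected_set = set(corrected)
weights = [10.0 if i in corrected_set else 1.0 for i in range(len(X))]
w, b = train(w, b, X, y_new, weights, epochs=5, lr=0.2)
after = mean_fraud_prob(w, b, corrected)

print(f"mean fraud probability on corrected rows: {before:.3f} -> {after:.3f}")
```

The rest of the data stays in the fine-tuning set at weight 1 so the model is not pulled too far from what it already learned. With a real deep learning framework the same idea would probably go through whatever per-sample weighting the training loop supports, and the weight and number of epochs would need tuning against a held-out set.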