Weighting each training image by label confidence

I’m training a multi-label image classifier, where each image can have between 0 and 10 possible class labels. Five different people have separately labeled each image, but there’s still a large percentage of images that have low average consensus scores. Continuing to manually review the data to increase consensus is cost-prohibitive.

I’ve been considering two options:

1.) Decide on a threshold for the average consensus score, and only train on images whose consensus is above this threshold. Potentially try something like curriculum learning, gradually walking this threshold back and iteratively fine-tuning on harder and harder examples (on the assumption that a lower consensus score correlates strongly with difficult / less clear-cut training examples rather than random mistakes). There’s a rough sketch of the filtering step just after this list.

2.) Incorporate this consensus score directly into training.
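For (1), the filtering step itself would be simple; here’s a rough sketch, where labels_df and its consensus column are hypothetical stand-ins for my labels dataframe and the average agreement score per image:

import pandas as pd

# labels_df is a hypothetical stand-in: one row per image, with a `consensus`
# column holding the mean annotator agreement for that image
labels_df = pd.DataFrame({'image': ['a.jpg', 'b.jpg'], 'consensus': [0.9, 0.4]})

threshold = 0.8                                            # hypothetical cutoff
train_df = labels_df[labels_df['consensus'] >= threshold]  # keep high-consensus images

# curriculum idea: lower `threshold` between training rounds and fine-tune
# on the progressively noisier subset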

I’d really like to try (2) if I can figure out how to make it work. At a high level, my idea for (2) is to weight each image’s contribution to the loss by its average consensus score.

I noticed that the loss function BCEWithLogitsLoss has a weight parameter: “a manual rescaling weight given to the loss of each batch element. If given, has to be a Tensor of size nbatch.” What I’m wondering is whether there is some way I can use this, so that I provide an additional dataframe column of weights, and for each new batch the weight associated with each image is looked up and applied in the loss calculation.
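To make the weighting in (2) concrete, here is roughly the computation I’m picturing in plain PyTorch (just a sketch with made-up names; the part I don’t know how to do is the per-batch weight lookup inside fastai’s training loop):

import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss(reduction='none')  # keep one loss value per image and class

def weighted_bce(logits, targets, sample_weights):
    # logits, targets: (batch, n_classes); sample_weights: (batch,) consensus scores
    per_element = bce(logits, targets)                     # (batch, n_classes)
    weighted = per_element * sample_weights.unsqueeze(1)   # scale each image's row
    return weighted.mean()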

If anyone has any hints or suggestions on how I might be able to accomplish this I’d really appreciate it. Alternatively, I’d love to hear any other ideas about how I may be able to incorporate this consensus information. One thing I started to look at was label smoothing, but it doesn’t appear that it’s compatible with multi-label models, and I don’t think I can supply it with a per-image confidence score.

I think I ran into much the same problem: I needed access to meta-info about the tensors loaded during training (here, the per-image weights). To do that I applied a patch to DeviceDataLoader:

from itertools import tee

def new_iter(self):
    dl = iter(self.dl)
    # clone the index iterator: pytorch consumes one copy, we keep the other
    dl.sampler_iter, self.sampler_iter = tee(dl.sampler_iter)
    for b in dl:
        yield self.proc_batch(b)

from fastai.basic_data import DeviceDataLoader
DeviceDataLoader.__iter__ = new_iter

You can then create a callback that modifies the loss on backward begin:

class ChangeLoss(LearnerCallback):
    _order = -999
    def __init__(self, weights, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.weights = weights

    def on_backward_begin(self, last_loss, **kwargs):
        dl = self.learn.data.train_dl
        idxs = next(dl.sampler_iter)              # indexes of the items in the current batch
        loss = last_loss * self.weights[idxs]     # rescale each item's loss by its weight
        return {'last_loss': loss.mean()}

Here, weights has shape (N_train, N_classes), while your loss should yield a (N_batch, N_classes) shape if I understood correctly. For that to work, you need to pass reduction='none' to your loss.

Thanks! I’m not too experienced with the callback internals yet, so this will take me a bit to digest, but I’ll report back with how it goes!

No problem. Don’t hesitate to browse the docs; the callbacks are very well explained. And don’t hesitate to ask if you have questions; I learned about them the hard way recently, so many things are fresh in my mind.

@florobax Hey, I’m getting AttributeError: '_DataLoaderIter' object has no attribute 'sampler_iter', and I don’t see that attribute in DataLoader. Should I define it?

Oh yeah, that’s because I’m using the nightly version of PyTorch, where it got renamed. It’s called sample_iter in the current release. It is an attribute of the _DataLoaderIter class from PyTorch.
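If you want the patch to work on both versions, something along these lines should do (just a compatibility guess, not tested against every release):

from itertools import tee
from fastai.basic_data import DeviceDataLoader

def new_iter(self):
    dl = iter(self.dl)
    # the pytorch attribute is `sample_iter` on the release, `sampler_iter` on nightly
    name = 'sampler_iter' if hasattr(dl, 'sampler_iter') else 'sample_iter'
    theirs, ours = tee(getattr(dl, name))
    setattr(dl, name, theirs)   # pytorch keeps consuming its own copy
    self.sampler_iter = ours    # the clone we read from the callback
    for b in dl:
        yield self.proc_batch(b)

DeviceDataLoader.__iter__ = new_iter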

@florobax Thanks! Got rid of that error. Am I using this right? Working on a multilabel dataset with 4 classes and tried using all weights of 1 to test:

from itertools import tee
def new_iter(self):
    dl = iter(self.dl)
    dl.sampler_iter, self.sampler_iter = tee(dl.sampler_iter)
    for b in dl:
        yield self.proc_batch(b)

from fastai.basic_data import DeviceDataLoader
DeviceDataLoader.__iter__ = new_iter

class ChangeLoss(LearnerCallback):
    _order = -999
    def __init__(self, weights, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.weights = weights
        
    def on_backward_begin(self, last_loss, **kwargs):
        dl = self.learn.data.train_dl
        idxs = next(dl.sampler_iter)
        loss = last_loss*self.weights[idxs]
        return {'last_loss': loss.mean()}

weights = torch.Tensor(np.ones((df.shape[0]))).cuda()

change_loss = ChangeLoss(learn=learn, weights=weights)

learn.loss_func = BCEWithLogitsFlat(reduction='none')

learn.fit_one_cycle(1, 1e-4, callbacks=[change_loss])

But getting the error:

RuntimeError: The size of tensor a (128) must match the size of tensor b (32) at non-singleton dimension 0

My batch size is set to 32. I tried modifying to self.weights[idxs].repeat(4) and got the error:

TypeError: unsupported format string passed to Tensor.format

It is probably due to BCEWithLogitsFlat, which modifies the loss shape in a way I still don’t fully understand (I’ll probably run some tests to see how it works). You could try passing axis=0; that might work.
If it doesn’t, can you try printing last_loss.shape inside the callback? If you have time, you’ll probably find your problem by playing with:

    def __call__(self, input:Tensor, target:Tensor, **kwargs)->Rank0Tensor:
        input = input.transpose(self.axis,-1).contiguous()
        target = target.transpose(self.axis,-1).contiguous()
        if self.floatify: target = target.float()
        input = input.view(-1,input.shape[-1]) if self.is_2d else input.view(-1)
        return self.func.__call__(input, target.view(-1), **kwargs)

This is what gets called with BCEWithLogitsFlat (here self.func is nn.BCEWithLogitsLoss), so you can see what the shapes are at the different steps.
Besides, I am not sure what df is, but if it is the full dataframe containing all ids, it will probably not work, as it will also contain validation set ids (depending on how you split your dataset). A version that will always work is weights = torch.Tensor(np.ones((len(learn.data.train_ds)))).cuda().
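If the loss does come out flattened to (N_batch * N_classes,) because of the view(-1) above, one thing that might work (untested sketch) is to fold it back into (N_batch, N_classes) inside the callback before applying the per-image weights:

def on_backward_begin(self, last_loss, **kwargs):
    dl = self.learn.data.train_dl
    idxs = next(dl.sampler_iter)
    w = self.weights[idxs]                                # (N_batch,)
    # reshape the flat loss back to (N_batch, N_classes) so each image's
    # row is scaled by a single weight, then reduce to a scalar
    per_image = last_loss.view(w.shape[0], -1) * w.unsqueeze(1)
    return {'last_loss': per_image.mean()}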

Thanks again. last_loss.shape is 128, so it looks like there is one loss element for each batch index * 4 classes.

I switched to nn.BCEWithLogitsLoss(reduction='none') instead, and now I’m getting a StopIteration error, so it looks like dl.sampler_iter is getting used up. I can fake it into working without an error by giving it idxs explicitly, like this:

def on_backward_begin(self, last_loss, **kwargs):
    dl = self.learn.data.train_dl
    try:
        idxs = next(dl.sampler_iter)
    except StopIteration:
        # hard-coded indexes just to get past the error while debugging
        idxs = [86, 534, 274, 653, 463, 600, 645, 609, 36, 39, 589, 603, 499, 368, 299, 44, 627, 553, 166, 445, 42, 507, 55, 290, 347, 157, 596, 466, 293, 57, 117, 246]
    # (rest of the method unchanged)

But I’m not sure how to change new_iter. Actually, I’m having a bit of trouble understanding what the new_iter function is doing; is there any chance you could explain it?

If it does raise StopIteration, it means your dataloader is empty for some reason. What do len(learn.data.train_dl) and x, y = next(iter(learn.data.train_dl)) yield? If it yields something, try calling nn.BCEWithLogitsLoss(reduction='none')(learn.model(x), y) and then x, y = next(iter(learn.data.train_dl)) again. Does it work?
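Spelled out, those checks would look something like this (same calls as above, just laid out in one cell):

import torch.nn as nn

print(len(learn.data.train_dl))              # number of batches; 0 would explain the StopIteration
x, y = next(iter(learn.data.train_dl))       # grab one batch
loss = nn.BCEWithLogitsLoss(reduction='none')(learn.model(x), y)
print(loss.shape)                            # per-element loss, should be (batch, n_classes) here
x, y = next(iter(learn.data.train_dl))       # a fresh iterator should still yield a batch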
To explain what new_iter does: it is basically a copy of fastai’s DeviceDataLoader.__iter__, where I just initialize the iter object before the for loop so I can create a clone (that’s what tee does; there’s a tiny toy example of it after the list below). This way, I have access to the indexes twice: once inside the for loop of this function (which yields batches when it is called inside the fit function), and once in the callback. To explain why this is necessary, let’s be precise about what iter does:

  • First, what is an iterator? It is basically an object that has a __next__ method. When it is created, an iterator is initialized with some parameters that depend on what it is meant to do. Then, each time something wants to get the next object from it (for instance a for loop), it calls its __next__ method, which returns the requested object.
  • When you are training, fastai’s fit function calls this line: for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):, which automatically calls iter(learn.data.train_dl).
  • This is equivalent to calling learn.data.train_dl.__iter__(), which is this new_iter function.
  • This function begins by creating an iter object with iter(self.dl). self.dl is a DataLoader object from pytorch, so this calls the __iter__ method defined there, which returns a _DataLoaderIter object, also from pytorch. This object, when created, instantiates a sample_iter attribute which is itself an iterator. It depends on the sampler you are using, but by default it is iter(torch.randperm(n).tolist()), which basically randomizes the order of your dataset. So to wrap it up, dl = iter(self.dl) is a _DataLoaderIter object that contains a sample_iter attribute used to determine which items go into the next batch.
  • What is important here is that a new sample_iter is created each time iter is called on your dataloader (basically each time you start looping over it), and a new random order is created at that time.
  • So, what happens when you write for b in dl? It calls next(dl), which itself calls next on the _DataLoaderIter, which calls next on its sample_iter attribute, which yields the indexes of the inputs contained in the next batch to load. These indexes are then used to yield the correct x, y pair for the fit function.
  • But we can’t access these indexes: they are used internally by pytorch and are never returned by any function. So the idea is to create a double. We first instantiate the iterator, so the sample_iter object is created, then we copy it thanks to tee. One version stays with the _DataLoaderIter object and will be used internally by pytorch. The other is stored as an attribute of fastai’s DeviceDataLoader object, which we can then access in our callback.
  • So in our on_backward_begin method, the original indexes have already been used to compute the loss, and we can’t access them anyway. But we can call next on the clone we created, which will yield the indexes of the items in the current batch. We can then use them to look up the corresponding weights and update the loss accordingly.
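To illustrate the tee trick outside of the dataloader machinery, here is a tiny toy example (nothing fastai-specific):

from itertools import tee

orig = iter([3, 1, 4, 1, 5])        # stand-in for the sample_iter of indexes
theirs, ours = tee(orig)            # two independent copies; stop using `orig` afterwards

print(next(theirs), next(theirs))   # 3 1  -> what pytorch would consume internally
print(next(ours), next(ours))       # 3 1  -> the same indexes, read again from our clone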

Hope this is not too much of a mess, but I wanted to make sure everything was there. Don’t hesitate to ask if you have questions, and also don’t hesitate to dive into the source code; it’s what will make you progress in understanding what happens behind the scenes. Here, all the magic happens in pytorch’s _DataLoaderIter on the pytorch side and in fastai’s DeviceDataLoader.__iter__ (the original function, which I did not deviate much from).

Thanks very much for your help! Going to take some time to make sure I understand all of this.

Don’t hesitate to ask if something is unclear (which is probably the case).