[Help Needed] Using sample-wise weights and custom loss function

How can we use sample-wise loss weights in fastai? Using class-wise weights for the loss is simple and straightforward, but using sample-wise weights seems to be a bit more involved in fastai.

I have a weight associated with each sample in my training data set that I would like to use to weight the loss for that particular sample. This is accomplished easily enough outside of the fastai framework in 3 steps.

1.) defining a custom data set:

import torch
import torch.utils.data as torchdata

class SampleWeightsData(torchdata.Dataset):
    """A dataset that will return x, y, and the weight for x."""
    def __init__(self, full_df, in_cols, y_col, w_col):
        self.full_df = full_df
        self.x = torch.FloatTensor(full_df[in_cols].values)
        self.y = torch.LongTensor(full_df[y_col].values)
        # Divide the weights by the column maximum and take the absolute value
        self.w = torch.FloatTensor(abs(full_df[w_col] / full_df[w_col].max()).values)
    def __len__(self):
        return len(self.full_df)
    def __getitem__(self, i):
        return self.x[i], self.y[i], self.w[i]

2.) Using a custom loss function:

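    import torch.nn as nn
    import fastai.layers as fast_layers

    class CEFSampleWeightLoss(nn.Module):
        def __init__(self):
            super().__init__()  # initialize nn.Module internals
            # reduction='none' keeps one loss value per sample
            self.base_loss = fast_layers.CrossEntropyFlat(reduction='none')
        def forward(self, x, y, w):
            loss = self.base_loss(x, y)
            loss *= (w**2)  # scale each sample's loss by its squared weight
            return loss.sum()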

3.) a simple modification to the standard fitting process so that these weights are used by the custom loss function:

    for batch_num, row in enumerate(dl):
        x = row[0].to(device)
        y = row[1].to(device)
        w = row[2].to(device)
        # Clear the gradients calculated in the last mini-batch
        if training: opt.zero_grad()
        # Do forward pass, calculate loss
        nn_outputs = model(x)
        loss = loss_fx(nn_outputs, y, w)
        # Backward pass and parameter update (training only)
        if training:
            loss.backward()
            opt.step()
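
For anyone who wants to try the three steps above end to end without fastai, here is a minimal sketch in plain PyTorch. The synthetic data, the tiny model, and the names `weighted_ce` and `epoch_losses` are assumptions made for illustration; it also scales the loss by w rather than w**2, so it is not the exact code from this post.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data: 64 samples, 5 continuous features, 2 classes.
x = torch.randn(64, 5)
y = (x[:, 0] > 0).long()
w = torch.rand(64)  # hypothetical per-sample weights in [0, 1)

# TensorDataset yields (x, y, w) triples, like the custom dataset in step 1.
dl = DataLoader(TensorDataset(x, y, w), batch_size=16, shuffle=True)

model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def weighted_ce(logits, target, weight):
    # Per-sample cross-entropy, scaled by each sample's weight, then summed.
    per_sample = F.cross_entropy(logits, target, reduction='none')
    return (per_sample * weight).sum()

epoch_losses = []
for epoch in range(5):
    total = 0.0
    for xb, yb, wb in dl:
        opt.zero_grad()                        # clear old gradients
        loss = weighted_ce(model(xb), yb, wb)  # forward pass + weighted loss
        loss.backward()                        # backpropagate
        opt.step()                             # update parameters
        total += loss.item()
    epoch_losses.append(total)
```

A sample whose weight is zero contributes nothing to the loss or to the gradients, which is a quick way to check that the weights are really being used.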

I am unsure how to accomplish this same objective in the framework of a fastai learner. My custom loss function is expecting 3 parameters to its forward function instead of the traditional 2, so I don’t see how I can use the learner’s fit function. Further, I need to get my databunch to return the weights per sample along with the features and label. FYI, my data is tabular (though I am only using continuous features currently, no categorical), so I will want to be using tabular learners and tabular lists. I apologize if this is a trivial matter; I am new to using fastai (and relatively new to deep learning generally) and haven’t ever made use of fastai’s callbacks (which I have a hunch will be helpful here). I’ve also scoured the forums and been unable to find much on this topic.

I would just do this manually and use bits and pieces of fastai’s genius where applicable, but I can’t seem to replicate my baseline results without using fastai’s learner. I have a separate topic about this.


Just a quick idea. You could define a custom model Module that unpacks x, y and w, and sends x through your original model. For the output, combine w with model(x). The custom loss function then unpacks w from x and returns the loss.

Just an idea - the proof is in the doing, which I have not tried.
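
A rough sketch of how that idea might look, under one assumption that is not stated in the thread: the weight travels as the last column of the input tensor. `WeightPassthrough` and `unpacking_loss` are hypothetical names, and this version only unpacks x and w, since the label still arrives through the normal target.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightPassthrough(nn.Module):
    """Wraps a model whose input carries the sample weight as its last column."""
    def __init__(self, inner):
        super().__init__()
        self.inner = inner

    def forward(self, xw):
        # Split features from the weight column (assumed to be the last one).
        x, w = xw[:, :-1], xw[:, -1:]
        logits = self.inner(x)
        # Append the weight to the output so the loss function can see it.
        return torch.cat([logits, w], dim=1)

def unpacking_loss(out, y):
    # Undo the concatenation: last column is the weight, the rest are logits.
    logits, w = out[:, :-1], out[:, -1]
    per_sample = F.cross_entropy(logits, y, reduction='none')
    return (per_sample * w).mean()

inner = nn.Linear(3, 2)
model = WeightPassthrough(inner)

weights = torch.tensor([[1.0], [0.5], [0.0], [2.0]])
xw = torch.cat([torch.randn(4, 3), weights], dim=1)
y = torch.tensor([0, 1, 1, 0])

out = model(xw)
loss = unpacking_loss(out, y)
loss.backward()
```

The loss keeps the usual two-argument signature, but any preprocessing applied to the inputs (normalization, for example) would also touch the weight column unless it is excluded, and predictions need the extra column stripped before use.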

Thanks for the (speedy) reply; that sounds like it would work! At least it solves the problem that my loss function expects 3 params instead of 2. But what about needing my databunch and dataloaders to return the weights as well, without applying any transformations (like Normalize) to them?

I once heard that the devil is in the details.

You could attach w to the target (making target a tuple) so that transformations would not mess with it. Conceptually, w can be seen as an aspect of the target.

I am not at all familiar with the transformation pipeline. But in your shoes, I would also experiment with making x a list and trace the fastai DataLoader. Maybe the transformations apply only to the first item. That would be convenient. Or maybe there’s a spot to insert your own code to extract the image to give to the transformation pipeline.

Please let us know what you figure out.

I think the easiest way would be to pass your weights as another label column(s) and split the tensor into labels and weights in the loss function and any metrics. If using a fastai head, you would need to subtract the length of the weights from your databunch data.c = data.c - weight_len so the model head would have the correct number of outputs.

This would probably mess up displaying labels, in data.show_batch() for example, but that should be it. You can verify the weights are passed as expected using x, y = data.one_batch(); y.
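
To illustrate the "and any metrics" part, here is a small sketch of metrics that split the combined target. It assumes the target arrives as a (batch, 2) float tensor with the label in column 0 and the weight in column 1; the function names are hypothetical and not part of fastai.

```python
import torch

def split_target(yw):
    """Split a (batch, 2) target of [label, weight] into labels and weights."""
    return yw[:, 0].long(), yw[:, 1]

def plain_accuracy(y_hat, yw):
    # Ignore the weights and score predictions against the labels only.
    y, _ = split_target(yw)
    return (y_hat.argmax(dim=1) == y).float().mean()

def weighted_accuracy(y_hat, yw):
    # Each sample counts in proportion to its weight.
    y, w = split_target(yw)
    correct = (y_hat.argmax(dim=1) == y).float()
    return (correct * w).sum() / w.sum()

# Four samples, two classes; the third prediction is wrong.
y_hat = torch.tensor([[2.0, 0.1], [0.2, 1.5], [0.9, 0.3], [0.1, 0.8]])
yw = torch.tensor([[0.0, 1.0], [1.0, 0.5], [1.0, 2.0], [1.0, 0.5]])

print(plain_accuracy(y_hat, yw).item())     # 0.75
print(weighted_accuracy(y_hat, yw).item())  # 0.5
```

The misclassified sample carries half of the total weight, so the weighted accuracy drops to 0.5 while the unweighted one stays at 0.75.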


Wow I love this solution, it was so simple to implement! I began to implement @Pomo’s solution but it required much more work. Thanks to both of you for the help; I’ve got it working now. Below are some code snippets to help anyone else who is trying to accomplish something similar:

First, I had to modify fastai’s TabularDataBunch.from_df to allow multiple dependent variables. I changed:
cont_names = ifnone(cont_names, list(set(df)-set(cat_names)-{dep_var}))

to:

dep_var_set = {dep_var} if type(dep_var) == str else set(dep_var)
cont_names = fcore.ifnone(cont_names, list(set(df)-set(cat_names)-dep_var_set)) 

and then for the dep_var variable I passed in an array [target_column, weight_column].
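
The effect of that change can be checked in isolation with plain Python. `infer_cont_names` below is a hypothetical stand-in for the patched line, not a fastai function, and unlike the set arithmetic in the original it keeps the column order.

```python
def infer_cont_names(columns, cat_names, dep_var, cont_names=None):
    """Hypothetical helper mirroring the patched cont_names logic."""
    if cont_names is not None:
        return list(cont_names)
    # Accept either one column name or a collection of column names.
    dep_var_set = {dep_var} if isinstance(dep_var, str) else set(dep_var)
    excluded = set(cat_names) | dep_var_set
    return [c for c in columns if c not in excluded]

columns = ['f1', 'f2', 'f3', 'target', 'weight']

# With both columns as dependent variables, the weight stays out of the inputs.
print(infer_cont_names(columns, [], ['target', 'weight']))  # ['f1', 'f2', 'f3']

# With a single string, the weight column would be treated as a feature.
print(infer_cont_names(columns, [], 'target'))  # ['f1', 'f2', 'f3', 'weight']
```

The second call shows why the patch matters: without it, the weight column ends up among the continuous features, where it would be fed to the model and normalized along with them.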

Second, my custom loss function:

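import torch.nn as nn
import fastai.layers as fast_layers

class CEFSampleWeightLoss(nn.Module):
    def __init__(self, mode='linear'):
        super().__init__()  # initialize nn.Module internals
        # reduction='none' keeps one loss value per sample
        self.base_loss = fast_layers.CrossEntropyFlat(reduction='none')
        self.mode = mode
    def forward(self, y_hat, yw):
        # yw holds [label, weight] for each sample
        y = yw[:,0].long()
        w = yw[:,1]
        loss = self.base_loss(y_hat, y)
        if self.mode == 'linear': loss *= w
        elif self.mode == 'exp': loss *= (w**2)  # note: 'exp' squares the weights
        return loss.mean()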

Finally, when you call learner.get_preds you need to extract the predictions and apply the softmax yourself. This is because fastai tries to map the loss function to an activation function for you, but with a custom loss function it doesn’t know which activation to apply.

val_preds, val_yw = learner.get_preds(ds_type=btrain.DatasetType.Valid)
val_probs_class1 = nn.Softmax(dim=1)(val_preds).numpy()[:,1] # extract preds and softmax
val_y = val_yw[:,0].long() # extract labels from [labels, weights]

Happy coding!