Allow for more than one output for loss and metric

Proposal: currently the loss is a single scalar tensor. However, in networks like SSD there are multiple loss components, such as a regression loss and a classification loss, and currently it is not possible to print both (to the best of my knowledge, please correct me if I am wrong). So the proposal is to support multiple outputs from the loss function, with the first output being the loss used for backpropagation. This would allow the other outputs to be printed via callbacks. Same with metrics.

Code to be changed:
In basic_train.py https://github.com/fastai/fastai_v1/blob/master/fastai/basic_train.py#L13, we need to add a condition: if the loss function returns multiple outputs, take the first one as the loss. Similarly on the next line for metrics.
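The check could look something like this (a minimal sketch; `extract_scalar_loss` is a hypothetical helper, not existing fastai code):

```python
def extract_scalar_loss(loss_output):
    # Hypothetical helper: if the loss function returns several values,
    # treat the first as the loss to backpropagate and keep the rest
    # available for logging via callbacks.
    if isinstance(loss_output, (tuple, list)):
        return loss_output[0], tuple(loss_output[1:])
    return loss_output, ()
```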

I’ve also been thinking about this. There are cases where one model can have many outputs (such as in MaskRCNN), and each output can be associated with many losses (for example when doing image segmentation, combining CE loss and Soft Dice Loss). Moreover, we might also want to have a different LR scheme for different losses.

It seems like creating a Callback class to calculate the losses would make sense: instead of providing the loss_fn parameter in the loss_batch function, pass in a function/dict that maps the outputs of the model to the correct loss Callback class.

The idea of different lr schemes for different loss functions didn’t occur to me, and I haven’t really seen it being used anywhere in particular. Would be a nice experiment to see if that actually gives some better results.

Wouldn't having different LR settings have a similar effect to SoftDice + alpha*BCE, where alpha is a hyperparameter you can tune?

That would be true only if the schedules are the same. Say you have cosine annealing in one and linear decay in another: there is no way to reproduce that via the alpha parameter alone.

That might make sense in a multitask setting, I guess. For a single task like segmentation, simply adding the losses should be fine, but it's an interesting area to dig deeper into for sure.

It might be interesting to take an approach similar to Keras, which allows both inputs and outputs to be tensors, lists of tensors, or dictionaries of tensors.

It complicates making data generators a bit, but it makes it easy to code things like Siamese networks, etc. I'm still familiarizing myself with the new fastai library, so I'm not sure how hard it would be to shoehorn that in…

In Keras, each output also can be given its own loss function and a weight, with the overall loss being the weighted sum of the loss of each output.
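The Keras-style combination described above could be sketched in plain Python like this (`weighted_multi_loss` is a hypothetical helper for illustration, not part of either library):

```python
def weighted_multi_loss(outputs, targets, loss_fns, weights):
    # Keras-style combination: one loss function and one weight per
    # model output; overall loss = sum(w_i * loss_i(output_i, target_i)).
    return sum(w * fn(o, t)
               for o, t, fn, w in zip(outputs, targets, loss_fns, weights))
```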

Inputs and targets can be lists of tensors now.
As for giving each output its own loss and weight, it's super easy with a callback, in the on_loss_begin function (see the RNNTrainer callback for an example where we have three outputs).
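A minimal sketch of that pattern, in the spirit of RNNTrainer (the class and attribute names here are made up for illustration):

```python
class MultiOutputTrainer:
    # Sketch: the model returns (main_output, *aux_outputs);
    # on_loss_begin hands only the main output to the loss function
    # and stashes the auxiliary ones for later use (e.g. extra losses).
    def on_loss_begin(self, last_output, **kwargs):
        main, *aux = last_output
        self.aux_outputs = aux
        return main
```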

I think there is a detach call which prevents that, in on_backward_begin I believe. Basically, I can't have two outputs from the loss function.

Two outputs from the loss function is a different question. I don't see how it would work for the computation of gradients; it's not loss.detach() that will cause your first issue but loss.backward().

Two outputs from the loss are only for tracking/logging. For example, in standard SSD/YOLO you have both a classification loss and a localization loss. In the current framework, I am unable to get both out of the loss function.

My intention was to use on_backward_begin to combine the two losses, say by simple addition, and at the same time log both values. So the output is a tensor and loss.backward() will work. However, https://github.com/fastai/fastai/blob/master/fastai/callback.py#L212 requires the loss to be a tensor when calling detach, and this happens before I can add the two losses.

Not sure if I am missing anything.

Ah, yes, this detach. When we implement callback order you’ll be able to do your combination before the recorder gets passed the losses.
Although my guess would be that you need a custom Recorder that records both losses and replaces the fastai one; that would be cleaner.

Yes. That would be my guess too. I was doing the logging as well as the addition of the losses in the same callback, but the detach prevents me from doing that.

Yes. Of course. This feature would make the whole library super-flexible.

Hi all,
I was wondering if there would be support for auxiliary inputs which aren’t used in the forward pass of the neural network but only in the backward pass.

For example, when doing segmentation, we might want to have a weight map that weights pixels close to two masks more heavily than a pixel that is far away in the corner (e.g. in the UNet paper), and also use two different kinds of losses e.g. SoftDice and CrossEntropy. Moreover, in a more general sense, the relationship between model outputs and losses is many to many. So it would be difficult for the current fastai framework to handle it.

I've been working on a local copy of a fastai-like callback framework from 3 weeks ago, so it's a bit outdated. In it, I implemented the above functionality by injecting the input data from the DataLoader (the input image, weight maps, masks) and the outputs of the model (the predicted probability map for each class per pixel) into the state_dict. I also created a new callback method called on_loss_calculate, which takes the place of loss = loss_fn(out, *yb) in line 25 of basic_train.py.

So after the forward pass the state_dict would have something like:

{
   ...other key-value pairs, e.g. epoch, num_iter...
   'input': <input image for model>,
   'weight_map': <input weight map for model>,
   'output': <output of model>,
   'target': <ground truth>,
}

Then instead of using loss_fn, I’ve made each loss a Callback for example:

class CrossEntropyLossCallback(Callback):
    def __init__(self, weight=1):
        # weight of loss when calculating weighted sum of loss
        self.weight = weight

    def on_loss_calculate(self, **kwargs):
        output = kwargs['output']
        weight_map = kwargs['weight_map']
        target = kwargs['target']
        loss = calculate_ce_loss(output, target, weight_map, weight=self.weight)
        self.loss = loss
        return loss

    def on_backward_begin(self, **kwargs):
        return torch.mean(self.loss)

Then the output of on_loss_calculate would be appended to an array in the CallbackHandler, and the loss tensors would be summed up in on_backward_begin by the CallbackHandler to be returned to the optimizer.
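The aggregation described above could be sketched like this (`LossAggregatingHandler` is a hypothetical illustration of the poster's design, not fastai code):

```python
class LossAggregatingHandler:
    # Sketch: collect the result of each loss callback's
    # on_loss_calculate, then sum them so a single scalar
    # reaches loss.backward().
    def __init__(self, loss_callbacks):
        self.loss_callbacks = loss_callbacks

    def on_loss_calculate(self, **state):
        self.losses = [cb.on_loss_calculate(**state)
                       for cb in self.loss_callbacks]
        return sum(self.losses)
```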

Though, I'm torn: the above makes the code kind of brittle, since a change in one of the keys in the output dict of the Dataset would break it. So I've also experimented with a more Redux-like implementation, where the attributes are directly injected into the class by the CallbackHandler:

class CrossEntropyLossCallback(Callback):
    def __init__(self, state_to_attr_dict, weight=1):
        super().__init__(state_to_attr_dict)
        self.weight = weight

    def on_loss_calculate(self):
        loss = calculate_ce_loss(self.output, self.target, self.weight_map, weight=self.weight)
        self.loss = loss
        return loss

    def on_backward_begin(self):
        return torch.mean(self.loss)

Here state_to_attr_dict is a dictionary that maps the required values in the state to our instance variables, so we don't need to use **kwargs.

I haven't had a chance to read the new fastai code, nor have I familiarized myself with the lessons, so I'm not sure of the best way to incorporate these changes into the framework. Though it seems like passing loss_fn=[CrossEntropyLossCallback] in the initialization of the Learner class in line 95 of basic_train.py may be sufficient?

Would love to know everyone’s thoughts. Also, the same idea above could be used to calculate the metrics as the relationship between outputs and metrics are also many to many.

Actually thinking more about this, you can have your two losses by putting the first loss function as the loss and doing:

def on_loss_begin(self, last_output, last_target, **kwargs):
    self.loss2 = loss_func2(last_output, last_target)

def on_backward_begin(self, loss, **kwargs):
    return loss + self.loss2

That way, gradients will be computed with the sum (or whatever combination you want) of the two losses. The thing printed in the progress bar will only be one of the losses, but we don't really care.
Then you can have your loss2 as one of the metrics, so that both losses are printed on validation (not on training, but you can record it in your callback to be able to print it after training).

First of all, thanks for sharing your code and your thoughts. I believe you don't need to change the current fastai callback system to do the same thing: you can inject your weight_map into your loss by associating it with the target (output can't be a list, but target can). If your dataloader returns (input, (weight_map, target)), then what gets passed to the loss is (output, weight_map, target).

Your loss function can still have inner weights stored and it works without a callback.
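A toy loss with that signature might look like the following (a sketch; `weighted_nll` is hypothetical, and `output` is taken as per-pixel class probabilities rather than logits to keep the example short):

```python
import math

def weighted_nll(output, weight_map, target):
    # output: one list of class probabilities per pixel
    # weight_map: one weight per pixel
    # target: true class index per pixel
    losses = [w * -math.log(output[i][target[i]])
              for i, w in enumerate(weight_map)]
    return sum(losses) / sum(weight_map)
```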

That makes sense… I didn't think of that. Though how would we deal with the case where we want two losses, which both require extra inputs?

Extra inputs can be stored during on_batch_begin. Then I've shown how to deal with two losses in the post just before.
The last piece missing to make this all flexible is an order attribute on each callback, to allow some of them to run before the Recorder (for the loss in particular). That way, you can return your custom loss in on_backward_begin before the Recorder saves it.

Alright, so we define a new trainer callback class, similar to RNNTrainer. In this callback, we define on_batch_begin to extract the extra inputs (or extra outputs, if we have any), on_loss_begin to calculate the extra losses, and finally aggregate all the losses in on_backward_begin.
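A skeleton of that flow might look like this (a hedged sketch using fastai-v1-style callback method names; `aux_loss_func` and the class name are stand-ins):

```python
class MultiLossTrainer:
    # Sketch of the three-step flow: stash extra inputs, compute the
    # auxiliary loss, then aggregate before gradients are computed.
    def __init__(self, aux_loss_func):
        self.aux_loss_func = aux_loss_func

    def on_batch_begin(self, last_input, last_target, **kwargs):
        # unpack extra inputs packed into the target, e.g. a weight map
        self.weight_map, self.target = last_target

    def on_loss_begin(self, last_output, **kwargs):
        # compute the auxiliary loss from the raw model output
        self.aux_loss = self.aux_loss_func(last_output, self.target,
                                           self.weight_map)

    def on_backward_begin(self, loss, **kwargs):
        # return the combined loss for loss.backward()
        return loss + self.aux_loss
```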

Is the order attribute for each callback in the works? I'd be keen to help.

It's not high priority right now, as we must finish the stuff for the upcoming courses. If you want to give it a try, you're more than welcome to suggest a PR! The idea is to give each callback an order, with the defaults respecting what currently happens (so Recorder is 0, then the other callback functions at, say, 5 to leave room, then the full callbacks at 10). The CallbackHandler then executes the callbacks in order, so a callback with order -1 would be executed before the Recorder.
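The ordering scheme described above could be sketched as follows (illustrative only; the class names and exact numbers are the defaults suggested in the post, not shipped fastai code):

```python
class Callback:
    order = 10          # default for full user callbacks

class Recorder(Callback):
    order = 0           # runs first under the proposed defaults

class CombineLosses(Callback):
    order = -1          # would run before the Recorder sees the loss

def run_order(callbacks):
    # The CallbackHandler would execute callbacks sorted by `order`.
    return sorted(callbacks, key=lambda cb: cb.order)
```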