Using multiple optimizers?

You can set a custom loss function on the learner:

def custom_loss(inp, target):
    # implement the loss computation here and return a scalar tensor
    ...

learn = cnn_learner(data, arch, loss_func=custom_loss)  # arch = your base architecture, e.g. models.resnet34
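For example, a custom loss is just any callable that takes the model's output and the target and returns a scalar tensor. A minimal sketch (plain cross-entropy standing in for whatever you actually want to compute):

import torch.nn.functional as F

def custom_loss(inp, target):
    # replace this with your real computation; it just needs to return a scalar tensor
    return F.cross_entropy(inp, target)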

My particular model uses three different loss functions for different steps and then sums them all up as the overall loss. Do I just use one loss function that calls the others based on the input?

I think I would need a custom callback that can get these different values. Here is what the actual code looks like:

raw_optimizer.zero_grad()
part_optimizer.zero_grad()
concat_optimizer.zero_grad()
partcls_optimizer.zero_grad()

# forward pass returns the logits of the different heads plus the top-N proposal scores
raw_logits, concat_logits, part_logits, _, top_n_prob = net(img)
part_loss = list_loss(part_logits.view(batch_size * 6, -1),
                      label.unsqueeze(1).repeat(1, 6).view(-1)).view(batch_size, 6)
raw_loss = creterion(raw_logits, label)
concat_loss = creterion(concat_logits, label)
rank_loss = ranking_loss(top_n_prob, part_loss)
partcls_loss = creterion(part_logits.view(batch_size * PROPOSAL_NUM, -1),
                         label.unsqueeze(1).repeat(1, PROPOSAL_NUM).view(-1))

# the four losses are summed, a single backward pass runs, then all four optimizers step
total_loss = rank_loss + raw_loss + concat_loss + partcls_loss
total_loss.backward()
raw_optimizer.step()
part_optimizer.step()
concat_optimizer.step()
partcls_optimizer.step()

Where creterion is just nn.CrossEntropyLoss.

Edit: There are four different parts to this model, each with its own parameters, and so there are four separate optimizers. What I'd really like to know is how to set up those four parameter groups and prepare a model for this.

You have to do these steps to handle the optimizers:

  1. split the model into groups (one per sub-optimizer)
  2. initialize the sub-optimizers with each group's parameters
  3. call step() for each sub-optimizer

You can create a custom callback in which you manually split the model and initialize all the sub-optimizers in __init__. Then you call step() for each sub-optimizer in the on_backward_end() method, and return True from that method so the default optimizer step is skipped.
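A minimal sketch of such a callback (assuming fastai v1's LearnerCallback API; body and head are hypothetical sub-modules, so substitute the actual parts of your model):

from fastai.basic_train import Learner, LearnerCallback
import torch.optim as optim

class MultiOptimCallback(LearnerCallback):
    def __init__(self, learn:Learner):
        super().__init__(learn)
        # one sub-optimizer per part of the model (hypothetical sub-module names)
        parts = [learn.model.body, learn.model.head]
        self.sub_opts = [optim.SGD(m.parameters(), lr=1e-3, momentum=0.9)
                         for m in parts]

    def on_backward_end(self, **kwargs):
        # step and re-zero every sub-optimizer ourselves ...
        for opt in self.sub_opts:
            opt.step()
            opt.zero_grad()
        # ... and return True so fastai skips its own optimizer step
        return True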


You can split with the split method, learn.split(split_func). Then you have to extend the PyTorch Optimizer class and pass it to the learner:

learn = Learner(data, model, opt_func=YourOptimizer)

If you want to split the model manually but still use a custom Optimizer, you can assign it to learn.opt:

learn.opt = YourOptimizer(manual_params)

Remember that in YourOptimizer.__init__ you will have to initialize the sub-optimizers, and in YourOptimizer.step() you have to call step() on each of them.


For the loss function, you have to use a custom_loss as @baz said:

def your_custom_loss(out, label):
  raw_logits, concat_logits, part_logits, _, top_n_prob = out

  part_loss = list_loss(part_logits.view(batch_size * 6, -1),
                        label.unsqueeze(1).repeat(1, 6).view(-1)).view(batch_size, 6)
  raw_loss = creterion(raw_logits, label)
  concat_loss = creterion(concat_logits, label)
  rank_loss = ranking_loss(top_n_prob, part_loss)
  partcls_loss = creterion(part_logits.view(batch_size * PROPOSAL_NUM, -1),
                           label.unsqueeze(1).repeat(1, PROPOSAL_NUM).view(-1))

  return rank_loss + raw_loss + concat_loss + partcls_loss

EDIT: for the custom Optimizer, also write a zero_grad method and in it call zero_grad() on all the sub-optimizers.
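A rough sketch of what such a wrapper could look like (an assumption about structure, not tested code; MultiSGD and its hyperparameter values are made up):

import torch
from torch import optim

class MultiSGD(optim.Optimizer):
    "Fans every call out to one SGD sub-optimizer per parameter group."
    def __init__(self, param_groups, lr=1e-3, momentum=0.9, weight_decay=1e-4):
        # param_groups: a list of parameter lists, one per part of the model
        param_groups = [list(g) for g in param_groups]
        super().__init__([p for g in param_groups for p in g], dict(lr=lr))
        self.sub_opts = [optim.SGD(g, lr=lr, momentum=momentum,
                                   weight_decay=weight_decay)
                         for g in param_groups]

    def step(self, closure=None):
        for o in self.sub_opts: o.step()

    def zero_grad(self):
        for o in self.sub_opts: o.zero_grad()

With this signature you would build it yourself from the model's parts and assign it to learn.opt, as above.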


Thanks!!! I greatly appreciate the extremely thorough answer!!! When I try to use the model, I run into an error and I'm unsure if I should just make a new topic. Have you run into an issue where a model runs fine in pure PyTorch, but once you split it and wire it up this way it shows an error occurring in the model's definition/functions?

The split method does not modify the model at all. Maybe check whether learn.layer_groups shows the groups you intend to have after the split, because the split method isn't perfectly intuitive. For example,

split_func = lambda m: (m[0][1], m[1])
learn.split(split_func)

for a module like this (the arrows mark the layers returned by split_func)

[[conv,conv],[conv,conv],[conv]]
       ^    ^   

will split into groups:

[[conv]]
[[conv]]
[[conv, conv], [conv]]
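So right after splitting it is worth printing the groups to confirm the cut points, e.g.:

learn.split(split_func)
print(learn.layer_groups)   # one group per entry; check the cuts landed where you intended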

Gotcha! Thanks @Kornel! It seems to be working. Looking into the callbacks now. The model uses SGD and passes in predetermined learning rates, momentum, and weight decay. Is there a way to get access to those in the callbacks? Here is a start; I hope this is close to what you are recommending?

class customCallback(LearnerCallback):
  def __init__(self, learn:Learner):
    super().__init__(learn)
    self.raw = list(learn.model.pretrained_model.parameters())
    self.part = list(learn.model.proposal_net.parameters())
    self.concat = list(learn.model.concat_net.parameters())
    self.partcls = list(learn.model.partcls_net.parameters())

    self.raw_optim = optim.SGD(self.raw, lr=LR, momentum=0.9, weight_decay=WD)
    self.part_optim = optim.SGD(self.part, lr=LR, momentum=0.9, weight_decay=WD)
    self.concat_optim = optim.SGD(self.concat, lr=LR, momentum=0.9, weight_decay=WD)
    self.partcls_optim = optim.SGD(self.partcls, lr=LR, momentum=0.9, weight_decay=WD)

  def on_backward_end(self, **kwargs):
    self.raw_optim.step()
    self.part_optim.step()
    self.concat_optim.step()
    self.partcls_optim.step()

Are you also saying to split the model in this custom callback too, in __init__?


You forgot to return “True” in on_backward_end

If you don't return anything, fastai will also step the default optimizer (Adam), which can defeat your intentions. Check this line

Oh, no need. I can see that your model is already split by design (into pretrained_model, proposal_net, etc.).


Got it! Thanks. Do you know how to solve this issue by chance? Model Troubles

Yes, via learn.opt.lr and learn.opt.wd.

Your custom_loss should return a float tensor with no dimensions (a scalar):

total_loss.type() == 'torch.FloatTensor'
total_loss.size() == torch.Size([])

You can use total_loss.squeeze(0) if you have total_loss.size() == torch.Size([1])
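So a quick check on whatever your custom loss returns is simply (a sketch):

total_loss = your_custom_loss(out, label)
print(total_loss.type(), total_loss.size())   # expect: torch.FloatTensor torch.Size([])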


Thank you so very much Kornel, I greatly appreciate the advice and assistance with this. Is this what you mean?

def your_custom_loss(out, label):
  raw_logits, concat_logits, part_logits, _, top_n_prob = out

  creterion = torch.nn.CrossEntropyLoss()

  # batch_size = 4 and PROPOSAL_NUM = 6 are hard-coded here
  part_loss = list_loss(part_logits.view(4 * 6, -1), label.unsqueeze(1).repeat(1, 6).view(-1)).view(4, 6)
  raw_loss = creterion(raw_logits, label)
  concat_loss = creterion(concat_logits, label)
  rank_loss = ranking_loss(top_n_prob, part_loss)
  partcls_loss = creterion(part_logits.view(4 * 6, -1),
                           label.unsqueeze(1).repeat(1, 6).view(-1))

  # summing the four scalar losses already gives a float tensor of size torch.Size([])
  total_loss = rank_loss + raw_loss + concat_loss + partcls_loss
  return total_loss

Edit:
It seems there are two things going on in the model's training:

raw_optimizer.zero_grad()
part_optimizer.zero_grad()
concat_optimizer.zero_grad()
partcls_optimizer.zero_grad()

raw_logits, concat_logits, part_logits, _, top_n_prob = net(img)
part_loss = model.list_loss(part_logits.view(batch_size * PROPOSAL_NUM, -1),
                            label.unsqueeze(1).repeat(1, PROPOSAL_NUM).view(-1)).view(batch_size, PROPOSAL_NUM)
raw_loss = creterion(raw_logits, label)
concat_loss = creterion(concat_logits, label)
rank_loss = model.ranking_loss(top_n_prob, part_loss)
partcls_loss = creterion(part_logits.view(batch_size * PROPOSAL_NUM, -1),
                         label.unsqueeze(1).repeat(1, PROPOSAL_NUM).view(-1))

total_loss = raw_loss + rank_loss + concat_loss + partcls_loss
total_loss.backward()
raw_optimizer.step()
part_optimizer.step()
concat_optimizer.step()
partcls_optimizer.step()

First there is this. Then:

for i, data in enumerate(trainloader):
    with torch.no_grad():
        img, label = data[0].cuda(), data[1].cuda()
        batch_size = img.size(0)
        _, concat_logits, _, _, _ = net(img)
        # calculate loss
        concat_loss = creterion(concat_logits, label)
        # calculate accuracy
        _, concat_predict = torch.max(concat_logits, 1)
        total += batch_size
        train_correct += torch.sum(concat_predict.data == label.data)
        train_loss += concat_loss.item() * batch_size
        progress_bar(i, len(trainloader), 'eval train set')

train_acc = float(train_correct) / total
train_loss = train_loss / total

The first is the partly unsupervised training this model does; the second is the actual accuracy evaluation.

Edit 2: the first is what the loss_fn should be, the second is the metric.

OK, question: if the model is already split into groups and the optimizer is the same for every group (it's SGD), do I need a custom callback?

Actually, you don't. I thought that your optimizers would be different for each group.
In the learner, you can just switch the default Adam to SGD:

learn = Learner(data, model, loss_func=your_loss_func, opt_func=optim.SGD)

But why would you want SGD instead of Adam? If there is no particular reason, use the default Adam.

And if you wish to have, for example, different learning rates for different parts, you can use split anyway:

def split_met(m):
  return (m.pretrained_model, m.proposal_net, m.concat_net, m.partcls_net)

learn.split(split_met)

and then pass the learning rates (the first will probably be smaller since it is the "pretrained" model):

learn.fit(epochs, (lr1, lr2, lr3, lr4))

You’re welcome :wink:


Got it! Thanks! So really the only thing I need to do is modify the loss function (for the simple sake of testing), then split the model accordingly, because we all love Adam. In the paper they kept everything at the same learning rate and didn't try differentiating them.

Edit: solved the issue; I can pass pretrained model weights in the call to resnet.

We're training!!! Thank you so much @baz, @Kornel, and @MicPie for all your help! In the next few weeks I plan on writing a Medium article on this and I will post it on the forum as well. Thank you all so much!


As I say that, one last issue that should be simple to solve. Their metrics calculation looks like this:

for i, data in enumerate(trainloader):
    with torch.no_grad():
        img, label = data[0].cuda(), data[1].cuda()
        batch_size = img.size(0)
        _, concat_logits, _, _, _ = net(img)
        # calculate loss
        concat_loss = creterion(concat_logits, label)
        # calculate accuracy
        _, concat_predict = torch.max(concat_logits, 1)  # <- only one I am after
        total += batch_size
        train_correct += torch.sum(concat_predict.data == label.data)
        train_loss += concat_loss.item() * batch_size
        progress_bar(i, len(trainloader), 'eval on train set')

train_acc = float(train_correct) / total
train_loss = train_loss / total

Currently my metric is this:

def custom_metric(out, label):
  _, concat_logits, _, _, _ = out
  _, acc = torch.max(concat_logits, 1)
  
  return acc
  

Which is just returning the predicted class indices, not an accuracy (I believe?). In any case, it needs to be an accuracy comparing concat_logits against label. I should be able to adapt fastai's accuracy function now and do this, right?

n = concat_logits.shape[0]
preds = concat_logits.argmax(dim=-1).view(n, -1)
targs = targs.view(n, -1)
return (preds == targs).float().mean()

Or does it need to be slightly different?

Thanks again for all your help! This model trains extremely slowly due to the batch size, but I did notice it trains faster in pure PyTorch. Anyone have a suspicion as to why? The batch size is the same and everything. On average an epoch is a few minutes versus 25, both in the same environment.

Edit: the accuracy function above is right.
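Putting those pieces together, the metric ends up looking roughly like this (a sketch; the name concat_accuracy is just illustrative):

def concat_accuracy(out, targs):
    # out is the tuple the model returns; accuracy is scored on concat_logits only
    _, concat_logits, _, _, _ = out
    preds = concat_logits.argmax(dim=-1)
    return (preds == targs).float().mean()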

Why would you like to use multiple optimizers in your model?

I had misread it. Turns out it was the same optimizer, just used in four separate instances in the source code :slight_smile:

So having different optimizers grouped on different modules didn't add anything to your model?

I did not try that! :slight_smile: I simply used the same optimizer at those four instances.
