# Using multiple optimizers?

You can set a custom loss function for the learners:

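``````
def custom_loss(input, target):
    # Implement the loss function here and return a scalar tensor
    ...
``````
``````
# models.resnet34 is only an example; pass whichever base architecture you use
learn = cnn_learner(data, models.resnet34, loss_func=custom_loss)
``````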

My particular model uses three different loss functions for different steps and then sums them all up as the overall loss. Do I just use one loss function that calls the others based on the input?

I think I would need a custom callback that can get these different values. Here is what the actual code looks like:

``````
raw_optimizer.zero_grad()

raw_logits, concat_logits, part_logits, _, top_n_prob = net(img)
part_loss = list_loss(part_logits.view(batch_size * 6, -1),
                      label.unsqueeze(1).repeat(1, 6).view(-1)).view(batch_size, 6)
raw_loss = creterion(raw_logits, label)
concat_loss = creterion(concat_logits, label)
rank_loss = ranking_loss(top_n_prob, part_loss)
partcls_loss = creterion(part_logits.view(batch_size * PROPOSAL_NUM, -1),
                         label.unsqueeze(1).repeat(1, PROPOSAL_NUM).view(-1))

total_loss = rank_loss + raw_loss + concat_loss + partcls_loss
total_loss.backward()
raw_optimizer.step()
part_optimizer.step()
concat_optimizer.step()
partcls_optimizer.step()
``````

Where `creterion` is just `torch.nn.CrossEntropyLoss()`.

Edit: There are four different parts to this model, each with its own parameters, so there are four different optimizers. What I’d really like to know is how to implement those four parameter groups and prepare a model for this.

To handle multiple optimizers, you have to:

1. Split the model into groups (one for each sub-optimizer).
2. Initialize the sub-optimizers with the parameters of their groups.
3. Call `step()` on each optimizer.

You can create a custom callback in which you manually split the model and initialize all sub-optimizers in `__init__`. Then call `step()` for each sub-optimizer in the `on_backward_end()` method, and return `True` from that method so the default optimizer is skipped.
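
Outside fastai, the three steps can be sketched in plain PyTorch as below. This is only a minimal illustration: `TinyNet`, its `backbone` and `head` attributes, and the hyperparameter values are made-up placeholders, not the model from this thread.

```python
import torch
from torch import nn, optim

# Hypothetical two-part model; the attribute names are placeholders.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(4, 8)
        self.head = nn.Linear(8, 3)

    def forward(self, x):
        return self.head(torch.tanh(self.backbone(x)))

torch.manual_seed(0)
net = TinyNet()

# 1. Split the model into parameter groups, one per sub-optimizer.
groups = {
    "backbone": list(net.backbone.parameters()),
    "head": list(net.head.parameters()),
}

# 2. Initialize one sub-optimizer per group.
optimizers = {name: optim.SGD(params, lr=0.1, momentum=0.9)
              for name, params in groups.items()}

# Keep a copy of the weights so the effect of the step can be inspected.
before = {name: [p.detach().clone() for p in params]
          for name, params in groups.items()}

# 3. One training step: clear gradients, backpropagate once, step every optimizer.
x, y = torch.randn(5, 4), torch.randint(0, 3, (5,))
for opt in optimizers.values():
    opt.zero_grad()
loss = nn.functional.cross_entropy(net(x), y)
loss.backward()
for opt in optimizers.values():
    opt.step()

# True for a group if at least one of its tensors was updated.
changed = {name: any(not torch.equal(b, p) for b, p in zip(before[name], params))
           for name, params in groups.items()}
```

A single `backward()` call fills the gradients for every group, so each sub-optimizer only has to update its own parameters.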

You can split with the split method, `learn.split(split_func)`. Then you have to extend the PyTorch `Optimizer` and pass it to the learner:

``````
learn = Learner(data, model, opt_func=YourOptimizer)
``````

If you want to split the model manually but still use a custom optimizer, you can assign it to `learn.opt`:

``````
learn.opt = YourOptimizer(manual_params)
``````

Remember that in `YourOptimizer.__init__` you will have to initialize the sub-optimizers, and in `YourOptimizer.step()` you have to call `step()` for each sub-optimizer.

For the loss function, you have to use a custom loss as @baz said:

``````
def your_custom_loss(out, label):
    raw_logits, concat_logits, part_logits, _, top_n_prob = out
    batch_size = label.size(0)  # not available as a global inside the loss function

    part_loss = list_loss(part_logits.view(batch_size * 6, -1),
                          label.unsqueeze(1).repeat(1, 6).view(-1)).view(batch_size, 6)
    raw_loss = creterion(raw_logits, label)
    concat_loss = creterion(concat_logits, label)
    rank_loss = ranking_loss(top_n_prob, part_loss)
    partcls_loss = creterion(part_logits.view(batch_size * PROPOSAL_NUM, -1),
                             label.unsqueeze(1).repeat(1, PROPOSAL_NUM).view(-1))

    return rank_loss + raw_loss + concat_loss + partcls_loss
``````
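
The `view`/`repeat` calls in that loss are easier to follow on a toy example. The sketch below uses made-up sizes (`batch_size = 2`, `proposal_num = 3`, `num_classes = 5`) and random logits purely to show how the labels are expanded to line up with the flattened proposal logits.

```python
import torch

batch_size, proposal_num, num_classes = 2, 3, 5  # illustrative sizes only

# One set of class logits per proposal: (batch, proposals, classes)
part_logits = torch.randn(batch_size, proposal_num, num_classes)
label = torch.tensor([4, 1])  # one class index per image

# Flatten the proposals into the batch dimension: (batch * proposals, classes)
flat_logits = part_logits.view(batch_size * proposal_num, -1)

# Repeat each label once per proposal:
# (batch,) -> (batch, 1) -> (batch, proposals) -> (batch * proposals,)
flat_label = label.unsqueeze(1).repeat(1, proposal_num).view(-1)

# Every proposal of an image is scored against that image's label.
loss = torch.nn.CrossEntropyLoss()(flat_logits, flat_label)
```

Here `flat_label` holds each image's label three times in a row, matching the row order of `flat_logits`.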

EDIT: your custom optimizer also needs a `zero_grad` method that calls `zero_grad()` on all sub-optimizers.
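
As a rough sketch of that idea, the wrapper below forwards `step()` and `zero_grad()` to every sub-optimizer. `MultiOptimizer` is a hypothetical, duck-typed helper and not a subclass of `torch.optim.Optimizer`; whether fastai accepts such an object as `learn.opt` is an assumption that is not checked here.

```python
import torch
from torch import nn, optim

class MultiOptimizer:
    """Hypothetical wrapper that forwards calls to several sub-optimizers."""

    def __init__(self, *optimizers):
        self.optimizers = list(optimizers)

    def zero_grad(self):
        # Clear the gradients held by every sub-optimizer.
        for opt in self.optimizers:
            opt.zero_grad()

    def step(self):
        # Update the parameters owned by every sub-optimizer.
        for opt in self.optimizers:
            opt.step()

torch.manual_seed(0)
part_a, part_b = nn.Linear(3, 3), nn.Linear(3, 2)  # stand-ins for two sub-networks
opt = MultiOptimizer(
    optim.SGD(part_a.parameters(), lr=0.1, momentum=0.9),
    optim.SGD(part_b.parameters(), lr=0.1, momentum=0.9),
)

weights_before = [part_a.weight.detach().clone(), part_b.weight.detach().clone()]

opt.zero_grad()
loss = part_b(part_a(torch.randn(4, 3))).pow(2).mean()
loss.backward()
opt.step()       # both sub-networks are updated
opt.zero_grad()  # gradients are cleared for the next batch
```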


Thanks!!! I greatly appreciate the extremely thorough answer!!! When I try to use the model, I run into an error, and I’m unsure if I should just make a new topic. Have you run into an issue where a model runs fine in pure PyTorch, but when you split and implement the model it shows an error occurring in the model’s definition/functions?

The split method does not modify the model at all. Check whether `learn.layer_groups` shows the groups you intended after the split, because the split method isn’t perfectly intuitive. For example,

``````
split_func = lambda m: (m[0][1], m[1])
learn.split(split_func)
``````

for the module

``````
[[conv,conv],[conv,conv],[conv]]
       ^     ^
``````

will split into the groups:

``````
[[conv]]
[[conv]]
[[conv, conv], [conv]]
``````

Gotcha! Thanks @Kornel! It seems to be working. I’m looking into the callbacks now. The model uses SGD and passes in predetermined learning rates, momentum, and weight decay. Is there a way to get access to them in the callbacks? Here is the start; I hope this is close to what you are recommending.

``````
from torch import optim
from fastai.basic_train import Learner, LearnerCallback  # fastai v1

class CustomCallback(LearnerCallback):
    def __init__(self, learn:Learner):
        super().__init__(learn)
        # One parameter list per sub-network
        self.raw = list(learn.model.pretrained_model.parameters())
        self.part = list(learn.model.proposal_net.parameters())
        self.concat = list(learn.model.concat_net.parameters())
        self.partcls = list(learn.model.partcls_net.parameters())

        # One SGD optimizer per sub-network (LR and WD are defined elsewhere)
        self.raw_optim = optim.SGD(self.raw, lr=LR, momentum=0.9, weight_decay=WD)
        self.part_optim = optim.SGD(self.part, lr=LR, momentum=0.9, weight_decay=WD)
        self.concat_optim = optim.SGD(self.concat, lr=LR, momentum=0.9, weight_decay=WD)
        self.partcls_optim = optim.SGD(self.partcls, lr=LR, momentum=0.9, weight_decay=WD)

    def on_backward_end(self, **kwargs):
        self.raw_optim.step()
        self.part_optim.step()
        self.concat_optim.step()
        self.partcls_optim.step()
``````

Are you also saying I should split the model inside this custom callback, in `__init__`?


You forgot to return `True` in `on_backward_end`.

If you don’t return anything, fastai will also step the default optimizer (Adam), which would defeat your intent. Check this line

Oh, and no, you don’t need to split it: I can see that your model is already split by design (into `pretrained_model`, `proposal_net`, etc.).


Got it! Thanks. Do you know how to solve this issue, by chance? (See the “Model Troubles” topic.)

Yes, you can read the learning rate and weight decay through `learn.opt.lr` and `learn.opt.wd`.

Your `custom_loss` should return a zero-dimensional float tensor (a scalar):

``````
total_loss.dtype == torch.float32
total_loss.size() == torch.Size([])
``````

You can use `total_loss.squeeze(0)` if you have `total_loss.size() == torch.Size([1])`.
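
A quick way to convince yourself of this, using random stand-in logits rather than the real model outputs: with its default mean reduction, `torch.nn.CrossEntropyLoss` already returns a zero-dimensional tensor, and summing several such losses keeps that shape.

```python
import torch

creterion = torch.nn.CrossEntropyLoss()  # default reduction="mean" gives a scalar

# Random stand-ins for two of the model's logit outputs (4 images, 10 classes)
logits_a = torch.randn(4, 10)
logits_b = torch.randn(4, 10)
label = torch.randint(0, 10, (4,))

total_loss = creterion(logits_a, label) + creterion(logits_b, label)
is_scalar = total_loss.size() == torch.Size([])
is_float = total_loss.dtype == torch.float32

# A loss that still carries a leading dimension of size 1 can be squeezed.
one_element = total_loss.unsqueeze(0)  # size: torch.Size([1])
squeezed = one_element.squeeze(0)      # size: torch.Size([])
```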


Thank you so very much, Kornel. I greatly appreciate the advice and assistance with this. Is this what you mean?

``````
def your_custom_loss(out, label):
    raw_logits, concat_logits, part_logits, _, top_n_prob = out

    creterion = torch.nn.CrossEntropyLoss()

    part_loss = list_loss(part_logits.view(4 * 6, -1),
                          label.unsqueeze(1).repeat(1, 6).view(-1)).view(4, 6)
    raw_loss = creterion(raw_logits, label)
    concat_loss = creterion(concat_logits, label)
    rank_loss = ranking_loss(top_n_prob, part_loss)
    partcls_loss = creterion(part_logits.view(4 * 6, -1),
                             label.unsqueeze(1).repeat(1, 6).view(-1))

    total_loss = rank_loss + raw_loss + concat_loss + partcls_loss
    # The summed loss should already be a zero-dimensional float tensor
    assert total_loss.dtype == torch.float32
    assert total_loss.size() == torch.Size([])
    return total_loss
``````

Edit:
It seems there are two things going on in the model’s training:

``````
raw_optimizer.zero_grad()

raw_logits, concat_logits, part_logits, _, top_n_prob = net(img)
part_loss = model.list_loss(part_logits.view(batch_size * PROPOSAL_NUM, -1),
                            label.unsqueeze(1).repeat(1, PROPOSAL_NUM).view(-1)).view(batch_size, PROPOSAL_NUM)
raw_loss = creterion(raw_logits, label)
concat_loss = creterion(concat_logits, label)
rank_loss = model.ranking_loss(top_n_prob, part_loss)
partcls_loss = creterion(part_logits.view(batch_size * PROPOSAL_NUM, -1),
                         label.unsqueeze(1).repeat(1, PROPOSAL_NUM).view(-1))

total_loss = raw_loss + rank_loss + concat_loss + partcls_loss
total_loss.backward()
raw_optimizer.step()
part_optimizer.step()
concat_optimizer.step()
partcls_optimizer.step()
``````

First there is this. Then:

``````
train_loss, train_correct, total = 0, 0, 0  # running totals over the training set
for i, data in enumerate(trainloader):
    img, label = data[0].cuda(), data[1].cuda()
    batch_size = img.size(0)
    _, concat_logits, _, _, _ = net(img)
    # calculate loss
    concat_loss = creterion(concat_logits, label)
    # calculate accuracy
    _, concat_predict = torch.max(concat_logits, 1)
    total += batch_size
    train_correct += torch.sum(concat_predict.data == label.data)
    train_loss += concat_loss.item() * batch_size

train_acc = float(train_correct) / total
train_loss = train_loss / total
``````

The first is the partially unsupervised learning this model performs. The second is the actual accuracy.

Edit 2: the first is what the `loss_func` should be, and the second is the metric.

Ok question, if the model is already split into groups, and the optimizer is the same (it’s SGD) do I need a custom callback?

Actually, you don’t. I thought your optimizers would be different for each group.
In the learner, you can just switch the default Adam to SGD.

``````
from torch.optim import SGD

learn = Learner(data, model, loss_func=your_loss_func, opt_func=SGD)
``````

But why would you want SGD instead of Adam? If there is no particular reason, use the default Adam.

And if you wish to have, for example, different learning rates for different parts, you can use split anyway:

``````
def split_met(m):
    return (m.pretrained_model, m.proposal_net, m.concat_net, m.partcls_net)

learn.split(split_met)
``````

and then pass one learning rate per group (the first will probably be smaller because it is the “pretrained” model):

``````
learn.fit(epochs, (lr1, lr2, lr3, lr4))
``````
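
For comparison, plain PyTorch expresses the same idea with per-parameter-group options on a single optimizer. The sketch below is only an analogy and not what fastai does internally; the `nn.Linear` layers are stand-ins for the four sub-networks, and the learning rates and other hyperparameters are made-up values.

```python
import torch
from torch import nn, optim

# Hypothetical stand-ins for the four sub-networks named in the thread.
model = nn.ModuleDict({
    "pretrained_model": nn.Linear(4, 4),
    "proposal_net": nn.Linear(4, 4),
    "concat_net": nn.Linear(4, 4),
    "partcls_net": nn.Linear(4, 2),
})

lrs = (1e-4, 1e-3, 1e-3, 1e-3)  # illustrative values only

# One optimizer, four parameter groups, each with its own learning rate.
# Options not set per group (momentum, weight decay) fall back to the defaults below.
optimizer = optim.SGD(
    [{"params": module.parameters(), "lr": lr}
     for module, lr in zip(model.values(), lrs)],
    lr=1e-3, momentum=0.9, weight_decay=1e-4,
)

group_lrs = [group["lr"] for group in optimizer.param_groups]
```

Because all four groups live in one optimizer, a single `optimizer.step()` updates every sub-network.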

You’re welcome!

Got it! Thanks! So really the only thing I need to do is modify the loss function (for the sake of simple testing; I can split the model accordingly afterwards), because we all love Adam. In the paper they kept everything at the same learning rate and didn’t try differentiating them.

Edit: solved the issue, I can pass pretrained model weights to the call to resnet.

We’re training!!! Thank you so much @baz, @Kornel, and @MicPie for all your help! Next few weeks I plan on doing a Medium article on this and I will post on the forum as well. Thank you all so much!


As I say that, there is one last issue that should be simple to solve. Their metrics calculation looks like this:

``````
train_loss, train_correct, total = 0, 0, 0  # running totals over the training set
for i, data in enumerate(trainloader):
    img, label = data[0].cuda(), data[1].cuda()
    batch_size = img.size(0)
    _, concat_logits, _, _, _ = net(img)
    # calculate loss
    concat_loss = creterion(concat_logits, label)
    # calculate accuracy
    _, concat_predict = torch.max(concat_logits, 1)  # <- the only one I am after
    total += batch_size
    train_correct += torch.sum(concat_predict.data == label.data)
    train_loss += concat_loss.item() * batch_size
    progress_bar(i, len(trainloader), 'eval on train set')

train_acc = float(train_correct) / total
train_loss = train_loss / total
``````

Currently my metric is this:

``````
def custom_metric(out, label):
    _, concat_logits, _, _, _ = out
    _, acc = torch.max(concat_logits, 1)

    return acc
``````

Which is just returning the predictions rather than the accuracy itself (I believe?). In any case, it is an accuracy function where the prediction comes from `concat_logits`. I should be able to work with fastai’s `accuracy` function now and do this, right?

``````
def custom_metric(out, targs):
    _, concat_logits, _, _, _ = out
    n = concat_logits.shape[0]
    preds = concat_logits.argmax(dim=-1).view(n, -1)
    targs = targs.view(n, -1)
    return (preds == targs).float().mean()
``````

Or does it need to be slightly different?

Thanks again for all your help! This model trains extremely slowly due to the batch size, but I did notice it trains faster in pure PyTorch. Does anyone have a suspicion as to why? The batch size is the same and everything. On average an epoch takes a few minutes versus 25, both in the same environment.

Edit: The answer to the accuracy function is right.
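
For anyone who wants to sanity-check such a metric outside training, here is a small self-contained sketch. The function name `concat_accuracy`, the hand-made logits, and the `None` placeholders in the output tuple are all invented for the example; only the second tuple element is used, as in the thread.

```python
import torch

def concat_accuracy(out, targs):
    """Accuracy computed from the second element of the model's output tuple."""
    _, concat_logits, _, _, _ = out
    n = targs.shape[0]
    preds = concat_logits.argmax(dim=-1).view(n, -1)  # predicted class per image
    targs = targs.view(n, -1)
    return (preds == targs).float().mean()

# Hand-made logits whose argmax per row is 2, 0, 1, 1
concat_logits = torch.tensor([[0.1, 0.2, 0.9],
                              [0.8, 0.1, 0.1],
                              [0.2, 0.7, 0.1],
                              [0.3, 0.6, 0.1]])
label = torch.tensor([2, 0, 1, 0])  # the last prediction is wrong
out = (None, concat_logits, None, None, None)

acc = concat_accuracy(out, label)  # three of four predictions match
```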

Why would you like to use multiple optimizers in your model?

I had misread it. It turns out it was the same optimizer, just used in four separate instances in the source code.

So having different optimizers grouped on different modules didn’t add anything to your model?

I did not try that! I just simply used the same at those four instances.
