ArcFaceLoss double optimizer

Hi everyone, I’m trying to figure out how to add a second optimizer for the ArcFace loss function, as suggested on the pytorch-metric-learning website.

from pytorch_metric_learning import losses
import torch

loss_fn = losses.ArcFaceLoss(num_classes=num_classes,
          embedding_size=embedding_size, margin=28.6)

model_optimizer = torch.optim.Adam(embedding_net.parameters(), lr=lr, weight_decay=wd)
loss_optimizer = torch.optim.Adam(loss_fn.parameters(), lr=lr)

learn = Learner(
    data,
    embedding_net,
    loss_func=loss_fn,  # what to insert here?
    metrics=None,  # working on cosine_similarity
    opt_func=model_optimizer 
)

My idea was to set up something like this:

for epoch in range(epochs):
    learn.fit(1, lr=lr)
    model_optimizer.step(learn.recorder.loss)
    loss_optimizer.step(learn.recorder.loss)

Does this make sense? I don’t think I would fully benefit from the training algorithm if I basically restarted it one epoch at a time…
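
For reference, my understanding of the plain-PyTorch loop that the pytorch-metric-learning docs describe is roughly the sketch below (the model, data, and hyperparameters are just placeholders so it runs end to end):

import torch
import torch.nn as nn
from pytorch_metric_learning import losses

embedding_size, num_classes = 8, 5
embedding_net = nn.Linear(10, embedding_size)   # stand-in embedding model
loss_fn = losses.ArcFaceLoss(num_classes=num_classes,
                             embedding_size=embedding_size, margin=28.6)

model_optimizer = torch.optim.Adam(embedding_net.parameters(), lr=1e-3)
loss_optimizer = torch.optim.Adam(loss_fn.parameters(), lr=1e-3)

x, y = torch.randn(32, 10), torch.randint(0, num_classes, (32,))
for step in range(10):                           # stand-in for iterating over batches
    model_optimizer.zero_grad()
    loss_optimizer.zero_grad()
    loss = loss_fn(embedding_net(x), y)          # embeddings + integer labels
    loss.backward()
    model_optimizer.step()                       # updates the embedding model
    loss_optimizer.step()                        # updates ArcFace's own class weights

What I can’t see is how to get fastai’s Learner to do the loss_optimizer.step() part for me.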

I don’t have experience with this type of situation, but I can suggest a couple of things:

I found this Forums post, which seems similar to what you are trying to implement. They don’t share their solution, but it seems like they use callbacks.
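
I haven’t tested it, but I imagine the callback version would look something like the sketch below (this assumes fastai v2-style callback events, with loss_optimizer being the second optimizer from your snippet; in v1 the hook would be on_batch_end instead):

from fastai.callback.core import Callback

class LossOptimizerCallback(Callback):
    "Step a second optimizer (for the loss function's parameters) after each training batch."
    def __init__(self, loss_optimizer):
        self.loss_optimizer = loss_optimizer

    def after_batch(self):
        if self.training:                    # skip validation batches (no gradients there)
            self.loss_optimizer.step()       # update the loss function's weights
            self.loss_optimizer.zero_grad()

learn = Learner(data, embedding_net, loss_func=loss_fn,
                cbs=[LossOptimizerCallback(loss_optimizer)])

This should work because the Learner’s own optimizer only knows about the model’s parameters, so it never touches or zeroes the gradients that backward() leaves on loss_fn.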

I also prompted ChatGPT and implemented a minimal solution it provided in this Colab notebook, in which I create a custom loss function (to mimic ArcFaceLoss having its own parameters) and a custom learner (which passes both the model’s parameters and the loss function’s parameters to the optimizer). It does train successfully, and the loss function’s parameters change after training, which I think indicates they have been learned. Not sure if that’s what you are looking for; hope it helps!
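
Very roughly, the idea in that notebook boils down to something like this (not the exact code; the toy loss and the sizes are made up, just to show that the loss function’s own parameters get gradients and are updated by the same optimizer):

import itertools
import torch
import torch.nn as nn

class ToyParametricLoss(nn.Module):
    "Stands in for ArcFaceLoss: a loss function with its own trainable weights."
    def __init__(self, embedding_size, num_classes):
        super().__init__()
        self.W = nn.Parameter(torch.randn(embedding_size, num_classes))

    def forward(self, embeddings, labels):
        logits = embeddings @ self.W
        return nn.functional.cross_entropy(logits, labels)

embedding_net = nn.Linear(10, 8)      # toy "model" producing 8-d embeddings
loss_fn = ToyParametricLoss(8, 5)     # 5 classes

# one optimizer over both parameter sets, which is what the custom learner does
optimizer = torch.optim.Adam(
    itertools.chain(embedding_net.parameters(), loss_fn.parameters()), lr=1e-3)

x, y = torch.randn(32, 10), torch.randint(0, 5, (32,))
for _ in range(10):
    optimizer.zero_grad()
    loss = loss_fn(embedding_net(x), y)
    loss.backward()
    optimizer.step()
# after training, loss_fn.W has changed, i.e. the loss's parameters were learned too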
