Hi,
I have a perhaps dumb question: when training a model, we need to choose an optimizer such as optim.Adam. If, during training, we also use cosine annealing to schedule the learning rate (by supplying a cycle_len parameter), aren't we effectively overwriting the learning rate computed by the optimizer? If that is the case, does it still matter which optimizer we choose?
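For context, this is roughly the setup I have in mind, written with plain PyTorch's CosineAnnealingLR instead of the cycle_len argument (the model, data, and epoch count are just placeholders):

```python
import torch
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

# Placeholder model and data, just to make the loop runnable.
model = nn.Linear(10, 1)
x, y = torch.randn(64, 10), torch.randn(64, 1)

optimizer = Adam(model.parameters(), lr=1e-3)
# Cosine annealing over 10 epochs (roughly what cycle_len controls).
scheduler = CosineAnnealingLR(optimizer, T_max=10)

for epoch in range(10):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()   # Adam's per-parameter adaptive update
    scheduler.step()   # anneals the base lr stored in optimizer.param_groups
    print(epoch, optimizer.param_groups[0]["lr"])
```

So the scheduler seems to be rewriting the lr in optimizer.param_groups each epoch, which is what prompted the question.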
Thanks.