Modifying parameters while using distributed


So I made a callback to modify some neural network parameters of a model (sample code at the end) while training.

My question is: what happens if I train on multiple GPUs? My neural network got ruined for some reason after a few epochs: training error was okay, but validation error was massive, and I suspect it has something to do with using two GPUs. The problem is that each attempt takes about 8 hours, and the issue only becomes noticeable near the end :frowning:

class MyCallback(LearnerCallback):
    def __init__(self, learn:Learner):
        super().__init__(learn)
        self.important_parameter = learn.model[17].weight # or whatever

    def on_batch_end(self, **kwargs):
        with torch.no_grad():
            # modify the weights in place here, e.g. (placeholder):
            self.important_parameter.clamp_(-1, 1)

So my specific question is: will this callback work?


Do you actually want to update the weights with a callback? First of all, why would you need to do that? Secondly, that’s not what your code above would do. It would only update self.important_parameter, which was first initialized with the weight value.
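The distinction here is that Python assignment rebinds a name, while an in-place operation mutates the object that both names point to. A minimal pure-Python analogue (a list stands in for the tensor; the same aliasing behaviour holds for a torch.Tensor with in-place ops like mul_):

```python
# 'weight' stands in for learn.model[17].weight.
weight = [1.0, 2.0, 3.0]
important_parameter = weight          # alias: two names, one object

# In-place mutation (analogous to tensor.mul_(0.5)) is visible
# through every alias, so the model's weight really changes.
for i in range(len(important_parameter)):
    important_parameter[i] *= 0.5
print(weight)                         # [0.5, 1.0, 1.5]

# Rebinding (analogous to important_parameter = tensor * 0.5)
# only changes what the local name points to; 'weight' is untouched.
important_parameter = [0.0, 0.0, 0.0]
print(weight)                         # still [0.5, 1.0, 1.5]
```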

Yep, I want to update some parameter weights with a callback.

For example, weight decay does that inside fastai, though it actually happens in the optimizer step, not in a callback. I looked at the weight decay code, and to be honest, I don’t understand why it works in a multi-GPU environment:

def step(self)->None:
    "Set weight decay and step optimizer."
    # weight decay outside of optimizer step (AdamW)
    if self.true_wd:
        for lr,wd,pg1,pg2 in zip(self._lr,self._wd,self.opt.param_groups[::2],self.opt.param_groups[1::2]):
            for p in pg1['params']:, 1 - wd*lr)
            if self.bn_wd:
                for p in pg2['params']:, 1 - wd*lr)
        self.set_val('weight_decay', listify(0, self._wd))
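Stripped of the fastai plumbing, that loop just shrinks each parameter in place by a factor of 1 - wd*lr before the gradient step is taken (decoupled, AdamW-style weight decay). A minimal sketch with plain floats, where simple SGD stands in for the optimizer and the names are placeholders:

```python
def decoupled_step(p, g, lr=0.1, wd=0.01):
    """One decoupled-weight-decay update on a scalar parameter:
    decay is applied directly to the weight, outside the gradient
    step (mirroring the line above)."""
    p = p * (1 - wd * lr)   # weight decay, applied to the weight itself
    p = p - lr * g          # ordinary optimizer step on the gradient
    return p

# decay first: 2.0 * (1 - 0.001) = 1.998, then step: 1.998 - 0.05 ≈ 1.948
new_p = decoupled_step(2.0, 0.5)
```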

Now, I’m not doing weight decay, but something kind of like that: making sure the weights satisfy certain constraints I need (don’t worry about why).

I’ve tested the above code on a single GPU and it works and does what I think it does. But with multiple GPUs, since I can’t use Jupyter, it’s harder to see whether the models are getting out of sync. I think the modification might be getting applied to only one of the two models. But then again, the code above (from fastai) would suffer from the same problem… wouldn’t it?

I also tried like this:

class MyCallback(LearnerCallback):
    def __init__(self, learn:Learner):
        super().__init__(learn)

    def on_batch_end(self, **kwargs):
        important_parameter = self.learn.model.module[17].weight # or whatever
        with torch.no_grad():
            # modify the weights in place here, e.g. (placeholder):
            important_parameter.clamp_(-1, 1)

But I’m seeing the same problem :frowning:

Thank you for your reply.

Notice that in this new version I had to do learn.model.module[17], because nn.DataParallel apparently wraps the model and exposes it under a “module” attribute… but I’m somewhat confused.
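That extra level of indirection can be pictured with a tiny wrapper class (the class names here are just placeholders for illustration; the real nn.DataParallel also replicates the model and scatters batches, which this sketch ignores):

```python
class Inner:
    """Stands in for the original model (e.g. an nn.Sequential)."""
    def __init__(self):
        self.weight = "the parameter"

class ParallelWrapper:
    """Stands in for nn.DataParallel: it stores the wrapped model
    under the attribute `module`, so every lookup gains one level."""
    def __init__(self, module):
        self.module = module

model = ParallelWrapper(Inner())
# Before wrapping you would write model.weight; after wrapping,
# the original object lives one level down:
print(model.module.weight)   # prints "the parameter"
```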

Sorry, I found my error: it had nothing to do with distributed training, and everything to do with the fact that I was using fp16.
