Obtaining parameter values during training with a Hook callback

I am interested in seeing how the parameter values evolve during training.
One possible solution would probably be SaveModelCallback with the option every='epoch' (as described on the Tracking callbacks help page), for example:
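For reference, here is roughly what that alternative would look like (an untested sketch, assuming the SaveModelCallback signature shown on the Tracking page):

from functools import partial
from fastai.vision import *
from fastai.callbacks import SaveModelCallback

path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)
# save a full model checkpoint after every epoch: model_0.pth, model_1.pth, ...
learn = Learner(data, simple_cnn((3,16,16,2)),
                callback_fns=partial(SaveModelCallback, every='epoch', name='model'))
learn.fit(1)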
However, I would rather use the Hook callback.
As Jeremy and Sylvain suggested in different posts, the starting point is ActivationStats from the HookCallback help page. So I went to the source code of that class and modified its definition to return the modules themselves instead of the mean and std of the activation tensors.

Below is the code for my SaveModules, modified in only two places with respect to ActivationStats; both changes are clearly marked in the code.

class SaveModules(HookCallback):
    "Callback that records the hooked modules themselves rather than activation stats."

    def on_train_begin(self, **kwargs):
        "Initialize stats."
        super().on_train_begin(**kwargs)
        self.stats = []

    def hook(self, m:nn.Module, i:Tensors, o:Tensors)->nn.Module:
        "Return the module itself instead of the mean and std of `o`."
        return m  #            <==== change 1: I now return only m

    def on_batch_end(self, train, **kwargs):
        "Take the stored results and put them in `self.stats`."
        if train: self.stats.append(self.hooks.stored)

    def on_train_end(self, **kwargs):
        "Polish the final result."
        super().on_train_end(**kwargs)
        #self.stats = tensor(self.stats).permute(2,1,0)  #  <==== change 2: commented out to avoid an error

The rest follows the example on the HookCallback help page:

path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)
learn = Learner(data, simple_cnn((3,16,16,2)), callback_fns=SaveModules)
learn.fit(1)

I then look at the data from the hook, using this indexing scheme:

learn.save_modules.stats[nrOfBatch][ModuleNr].weight[y,x,kernelx,kernely]

For example, I take the data of the first batch, module 1, at kernel position (0, 0):
learn.save_modules.stats[0][1].weight[:,:,0,0]

Disappointingly, the values in the hundredth batch are identical:

learn.save_modules.stats[100][1].weight[:,:,0,0]

as if there were no improvement in the parameters during training. Why is that?
We can also look at the problem from another side: what would be the point of saving the very same parameter set hundreds of times?
I'm afraid I have misunderstood the Hook story somewhere…

Any help would be very welcome!
Thanks in advance

Hooks are used to access activations, not parameters (activations being the outputs of layers, i.e. the result of applying the parameters to the inputs). If you want to track the parameters of the model, then you don't need a hook: you can access them directly on the model. For instance, with simple_cnn you can get the weights of the first convolutional layer with something like learn.model[0].weight.data. You could access these in a callback to record them throughout training (but note that they will be on the GPU if you are using one, so be careful not to run out of memory keeping copies; hooks move activations to the CPU for you).
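
For instance, a rough (untested) sketch of what I mean:

from fastai.vision import *

path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)

class RecordWeights(Callback):
    "Record a snapshot of one layer's weights at the end of every training batch."
    def __init__(self, learn:Learner):
        super().__init__()
        self.learn = learn
        self.weights = []

    def on_batch_end(self, train, **kwargs):
        if train:
            # detach, move to CPU and clone, so we store an independent copy
            # rather than a reference to the live (possibly GPU) tensor
            w = self.learn.model[0][0].weight.detach().cpu().clone()
            self.weights.append(w)

learn = Learner(data, simple_cnn((3,16,16,2)))
rec = RecordWeights(learn)
learn.fit(1, callbacks=[rec])
rec.weights[0]   # snapshot taken after the first training batch

(The exact indexing depends on the model; with simple_cnn each block is a Sequential, so the first Conv2d sits at model[0][0].)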

Dear TomB, thanks for your very helpful reply. I waited some time before answering because I wanted to have a working version of your suggestions.

I also drew inspiration from https://forums.fast.ai/t/callbacks-in-fast-ai/31655, https://forums.fast.ai/t/help-understanding-and-writing-custom-callbacks/28762 and https://github.com/sgugger/Deep-Learning/blob/master/Using%20the%20callback%20system%20in%20fastai.ipynb (the last of which unfortunately does not run on my computer).

With the following code I now get what I need:

class MyCallback(Callback):
    "Save the weights of one layer to disk at the start of every batch."
    def __init__(self, learn:Learner):
        super().__init__()
        self.imparo = learn

    def on_batch_begin(self, num_batch, **kwargs):
        # write the current weights of the second conv layer to file_<n>.pt
        fileName = f'file_{num_batch}.pt'
        torch.save(self.imparo.model[1][0].weight.data, fileName)

path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)
learn = Learner(data, simple_cnn((3,16,16,2)), callback_fns=MyCallback)
learn.fit(1)

I get file_0.pt, file_1.pt, etc., containing the weights of the second CNN layer just before the first batch, the second batch, and so on. In fact learn.model[1][0] corresponds to Conv2d(16, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)).
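
To double-check that the saved weights really evolve, I can load two of the files written by the callback above and compare them (assuming the epoch contains more than 100 batches):

import torch

w0   = torch.load('file_0.pt')    # weights just before the first batch
w100 = torch.load('file_100.pt')  # weights 100 batches later
print((w0 - w100).abs().max())    # should be non-zero if the parameters are updated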

This is very good because it offers a solution to my problem. Nevertheless, I am looking for a solution that stores the results in some variable I can use directly in the notebook. Something like:

class MyCallback(Callback):
    "Collect the weights of one layer in a list at the start of every batch."
    def __init__(self, learn:Learner):
        super().__init__()
        self.imparo = learn
        self.stuff = []

    def on_batch_begin(self, num_batch, **kwargs):
        self.stuff.append(self.imparo.model[1][0].weight.data)

In this case, I append the weights from the different batches to the attribute stuff. Unfortunately, I cannot find that attribute afterwards: I have already searched inside learn, but without success.

Am I overlooking something?
Thanks a lot!