I would like to see how the parameter values evolve during training.
Using SaveModelCallback with the option every='epoch' (as described in the callbacks Tracking help page) would probably be one possible solution.
However, I would rather use the Hook callback.
As Jeremy and Sylvain suggested in different posts, the starting point is ActivationStats in the HookCallback help page. So I went to the source code of the ActivationStats class and modified its definition to return the modules instead of the mean and std of the activations.
Below is the code for my "SaveModules". It differs from "ActivationStats" in only two places, both clearly marked in the code.
class SaveModules(HookCallback):
    "Callback that record the mean and std of activations."
    def on_train_begin(self, **kwargs):
        "Initialize stats."
        super().on_train_begin(**kwargs)
        self.stats = []
    def hook(self, m:nn.Module, i:Tensors, o:Tensors)->Tuple[Rank0Tensor,Rank0Tensor]:
        "Take the mean and std of `o`."
        return m # <==== I get now only m
    def on_batch_end(self, train, **kwargs):
        "Take the stored results and puts it in `self.stats`"
        if train: self.stats.append(self.hooks.stored)
    def on_train_end(self, **kwargs):
        "Polish the final result."
        super().on_train_end(**kwargs)
        #self.stats = tensor(self.stats).permute(2,1,0) # <==== commented out to avoid error
The rest follows the example in the HookCallback help page.
path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)
learn = Learner(data, simple_cnn((3,16,16,2)), callback_fns=SaveModules)
learn.fit(1)
I look at the data from the hook:
learn.save_modules.stats[nrOfBatch][ModuleNr].weight[y,x,kernelx,kernely]
For example, I take the data of the first batch, module 1, and zeroth kernelx and kernely:
learn.save_modules.stats[0][1].weight[:,:,0,0]
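To make the indexing concrete, here is a minimal sketch in plain PyTorch. The layer below is a hypothetical stand-in, not one of the layers of simple_cnn. As far as I understand, a Conv2d weight has shape (out_channels, in_channels, kernel_height, kernel_width), so fixing the last two indices leaves one value per (output channel, input channel) pair.

```python
import torch.nn as nn

# Hypothetical stand-in layer, chosen only to illustrate the weight layout.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

# Full weight tensor: (out_channels, in_channels, kernel_height, kernel_width).
weight_shape = tuple(conv.weight.shape)

# Fixing the kernel position (0, 0) leaves an (out_channels, in_channels) matrix.
slice_shape = tuple(conv.weight[:, :, 0, 0].shape)

print(weight_shape, slice_shape)
```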
Disappointingly, the values at batch index 100 are identical:
learn.save_modules.stats[100][1].weight[:,:,0,0]
as if the parameters had not been updated at all during training. Why is that?
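One check that might narrow this down, written as a minimal sketch in plain PyTorch rather than fastai: it assumes that my hook function ends up running as an ordinary forward hook, and it uses a tiny hypothetical layer and loss instead of the model above. The hook stores both the module it receives and a detached copy of the weights, so the two can be compared after a few optimizer steps.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in layer and optimizer; hypothetical, not the simple_cnn above.
layer = nn.Conv2d(1, 2, kernel_size=3)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)

stored_modules = []   # what a hook that returns `m` would accumulate
stored_weights = []   # detached copies of the weights at each call

def hook(m, i, o):
    # `m` is the module object the hook was registered on.
    stored_modules.append(m)
    stored_weights.append(m.weight.detach().clone())

handle = layer.register_forward_hook(hook)

for _ in range(3):
    x = torch.randn(4, 1, 8, 8)
    loss = layer(x).pow(2).mean()   # the forward pass triggers the hook
    opt.zero_grad()
    loss.backward()
    opt.step()                      # updates layer.weight in place

handle.remove()

# Every stored module is the very same object as `layer`.
same_object = all(m is layer for m in stored_modules)
# So the "first" and "last" stored modules show identical weights.
refs_identical = torch.equal(stored_modules[0].weight, stored_modules[-1].weight)
# The detached copies, in contrast, differ between calls.
copies_differ = not torch.equal(stored_weights[0], stored_weights[-1])

print(same_object, refs_identical, copies_differ)
```

If the same holds inside the fastai callback, every entry of stats would point to the same live module, and storing something like m.weight.detach().clone() in the hook might be what I actually need. I have not verified this against the fastai source.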
We can also look at the problem from another side: what is the point of saving the very same parameter set hundreds of times?
I’m afraid I have misunderstood something about how hooks work…
Any help would be very welcome!
Thanks in advance