ActivationStats in a language model only records statistics for the last layer

Hi everyone,

Does anyone know why, if I train my language model using the following code:

learner = language_model_learner(text_data, AWD_LSTM, cbs=[ActivationStats(with_hist=True)])

learner.fit(1)

the following line of code:

learner.activation_stats.stats[0]

results in:

[None,None,None,None,None,None,None,None,{'mean': -1.8462823629379272, 'std': 1.2596983909606934, 'near_zero': 0.9194439246831189, 'hist': tensor([1.2411e+06, 1.0021e+06, 8.0230e+05, 6.4066e+05, 5.0985e+05, 4.0494e+05,
        3.2018e+05, 2.5357e+05, 2.0129e+05, 1.5805e+05, 1.2631e+05, 1.0001e+05,
        8.0982e+04, 6.4612e+04, 5.2268e+04, 4.2857e+04, 3.4609e+04, 2.8632e+04,
        2.3751e+04, 1.9741e+04, 1.6459e+04, 1.3934e+04, 1.1901e+04, 1.0163e+04,
        8.5460e+03, 7.2550e+03, 6.3900e+03, 5.4860e+03, 4.8030e+03, 4.0000e+03,
        3.4060e+03, 2.9320e+03, 2.6210e+03, 2.2900e+03, 1.9420e+03, 1.6760e+03,
        1.4160e+03, 1.2110e+03, 1.0560e+03, 8.4500e+02])}]

Why do the first eight layers of the model get no statistics at all (they are simply None)? Is it because they are embeddings and RNNs?
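One mechanism I suspect (a hedged guess, not fastai's actual implementation): modules like nn.LSTM return a tuple (output, (h_n, c_n)) rather than a plain tensor, so a forward hook that only computes statistics on Tensor outputs would record None for them. Here is a minimal plain-PyTorch sketch of that behaviour, with a hypothetical stats_hook standing in for whatever ActivationStats registers:

```python
import torch
import torch.nn as nn

stats = []

def stats_hook(module, inp, out):
    # Only record stats when the module's output is a plain tensor;
    # an nn.LSTM forward returns a tuple, so it falls through to None.
    if isinstance(out, torch.Tensor):
        out = out.float()
        stats.append({'mean': out.mean().item(), 'std': out.std().item()})
    else:
        stats.append(None)

emb = nn.Embedding(10, 4)                    # returns a Tensor
lstm = nn.LSTM(4, 8, batch_first=True)       # returns a tuple
lin = nn.Linear(8, 10)                       # returns a Tensor

for m in (emb, lstm, lin):
    m.register_forward_hook(stats_hook)

x = torch.randint(0, 10, (2, 5))
e = emb(x)
o, _ = lstm(e)
y = lin(o)

print(stats)  # the LSTM entry is None; the Embedding and Linear entries have stats
```

In this toy model only the LSTM entry comes out as None, so it doesn't fully explain why the embedding layers are None too in my AWD_LSTM run; maybe the dropout wrappers around them (EmbeddingDropout, WeightDropout) are involved as well?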

Thanks a lot!