OK, I was going through the fastai code for AWD-LSTM as described in notebook 12a_awd_lstm.
The forward function is written something like this:
if bs != self.bs:
    self.bs = bs
    self.reset()
where self.reset creates new hidden and cell states for the new batch size, initialized to zero:
def _one_hidden(self, l):
    "Return one hidden state."
    nh = self.n_hid if l != self.n_layers - 1 else self.emb_sz
    return next(self.parameters()).new(1, self.bs, nh).zero_()

def reset(self):
    "Reset the hidden states."
    self.hidden = [(self._one_hidden(l), self._one_hidden(l)) for l in range(self.n_layers)]
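For context, here is a minimal, runnable sketch of just this mechanism. It is not the full AWD-LSTM (no embedding/weight/hidden dropouts, and AWDLSTMSketch plus its constructor arguments are my own names); it only reproduces the reset-on-batch-size-change logic and the carry-over of hidden state between batches:

import torch
import torch.nn as nn

class AWDLSTMSketch(nn.Module):
    "Sketch of the AWD-LSTM hidden-state handling, without the dropouts."
    def __init__(self, emb_sz, n_hid, n_layers, vocab_sz):
        super().__init__()
        self.bs, self.emb_sz, self.n_hid, self.n_layers = 1, emb_sz, n_hid, n_layers
        self.emb = nn.Embedding(vocab_sz, emb_sz)
        # The last layer projects back to emb_sz, as in the notebook.
        self.rnns = nn.ModuleList([
            nn.LSTM(emb_sz if l == 0 else n_hid,
                    n_hid if l != n_layers - 1 else emb_sz,
                    batch_first=True)
            for l in range(n_layers)])
        self.reset()

    def _one_hidden(self, l):
        "Return one zeroed hidden state for layer l at the current batch size."
        nh = self.n_hid if l != self.n_layers - 1 else self.emb_sz
        return next(self.parameters()).new(1, self.bs, nh).zero_()

    def reset(self):
        "Reset the hidden states."
        self.hidden = [(self._one_hidden(l), self._one_hidden(l))
                       for l in range(self.n_layers)]

    def forward(self, x):
        bs, sl = x.size()
        if bs != self.bs:  # batch size changed -> rebuild states at zero
            self.bs = bs
            self.reset()
        out = self.emb(x)
        new_hidden = []
        for l, rnn in enumerate(self.rnns):
            out, h = rnn(out, self.hidden[l])
            # Detach so gradients don't flow across batch boundaries
            # (truncated backpropagation through time).
            new_hidden.append((h[0].detach(), h[1].detach()))
        self.hidden = new_hidden
        return out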
Now self.bs is initially set to 1, so the code will call self.reset on the first batch (unless someone actually trains with a batch size of 1).
The last batch of an epoch is usually smaller than the rest (whenever the dataset size isn't divisible by the batch size), so self.reset would be called again on that last batch, and then once more when the full batch size comes back on the first batch of the next epoch.
For all the other batches, the model is trained so that the hidden state is carried over from one batch to the next. That is not the case for the last batch, which starts again from a zeroed state. Am I reading the code right?
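To check that reading, here is a small simulation using the sketch above. The batch-size schedule [4, 4, 4, 3] and the counting_reset wrapper are hypothetical, just to count when reset fires:

model = AWDLSTMSketch(emb_sz=8, n_hid=16, n_layers=2, vocab_sz=100)

resets = []
orig_reset = model.reset
def counting_reset():
    resets.append(model.bs)  # record the batch size that triggered the reset
    orig_reset()
model.reset = counting_reset

seq_len = 5
for epoch in range(2):
    for bs in [4, 4, 4, 3]:  # last batch of each "epoch" is smaller
        x = torch.randint(0, 100, (bs, seq_len))
        model(x)

print(resets)  # [4, 3, 4, 3]: reset fires on the first batch, on the short
               # last batch, and again when the full batch size returns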