OK, I was going through the fastai code for AWD-LSTM as described in notebook 12a_awd_lstm.
The forward function is written something like this:
```python
if bs != self.bs:
    self.bs = bs
    self.reset()
```
where self.reset creates new hidden and cell states, initialized to zero, sized for the new batch size:
```python
def _one_hidden(self, l):
    "Return one hidden state."
    nh = self.n_hid if l != self.n_layers - 1 else self.emb_sz
    return next(self.parameters()).new(1, self.bs, nh).zero_()

def reset(self):
    "Reset the hidden states."
    self.hidden = [(self._one_hidden(l), self._one_hidden(l))
                   for l in range(self.n_layers)]
```
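To make this logic concrete, here is a minimal pure-Python sketch (no PyTorch; the ToyModel class and its plain nested-list "tensors" are my own stand-ins, not fastai code) of the batch-size check and reset:

```python
class ToyModel:
    """Toy stand-in for AWD-LSTM state handling (hypothetical, not fastai code)."""

    def __init__(self, n_layers=2, n_hid=4, emb_sz=3, bs=1):
        self.n_layers, self.n_hid, self.emb_sz, self.bs = n_layers, n_hid, emb_sz, bs
        self.reset_count = 0  # track how often reset fires, for illustration
        self.reset()

    def _one_hidden(self, l):
        # Last layer's hidden size is emb_sz; inner layers use n_hid.
        nh = self.n_hid if l != self.n_layers - 1 else self.emb_sz
        return [[0.0] * nh for _ in range(self.bs)]  # zeros of shape (bs, nh)

    def reset(self):
        # New zero-valued (hidden, cell) pair per layer, sized for current bs.
        self.reset_count += 1
        self.hidden = [(self._one_hidden(l), self._one_hidden(l))
                       for l in range(self.n_layers)]

    def forward(self, bs):
        # The check from the notebook: only reset when the batch size changes.
        if bs != self.bs:
            self.bs = bs
            self.reset()


m = ToyModel()
m.forward(64)  # bs changes 1 -> 64: reset fires
m.forward(64)  # same bs: hidden state is kept
m.forward(32)  # bs changes 64 -> 32: reset fires again
```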
Now, since self.bs is initially set to 1, the code will call self.reset on the first batch (as long as no one trains with a batch size of 1).
The last batch of an epoch is usually smaller than the others, so self.reset would be called again on the last batch.
So for all other batches, the hidden state is carried over from one batch to the next, but it is not carried into the last batch. Am I reading the code right?
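To check this reading, here is a small self-contained simulation (the batch sizes are made up for illustration, not taken from the notebook) that counts how often the `bs != self.bs` branch would fire across one epoch:

```python
def count_resets(batch_sizes, initial_bs=1):
    """Return the indices of batches where `if bs != self.bs: reset()` fires."""
    current_bs, resets = initial_bs, []
    for i, bs in enumerate(batch_sizes):
        if bs != current_bs:
            current_bs = bs
            resets.append(i)
    return resets

# Five full batches of 64 followed by a smaller final batch of 32:
# reset fires on the first batch (1 -> 64) and on the last (64 -> 32).
print(count_resets([64, 64, 64, 64, 64, 32]))  # → [0, 5]
```

Under this sketch the hidden state is indeed zeroed on the first batch and again on the smaller last batch, and carried over everywhere in between.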