OK, I was going through the fastai code for AWD-LSTM as described in notebook 12a_awd_lstm.
The forward function is written something like this:
if bs != self.bs:
    self.bs = bs
    self.reset()
where self.reset creates new hidden and cell states for the new batch size, initialized to zero:
def _one_hidden(self, l):
    "Return one hidden state."
    nh = self.n_hid if l != self.n_layers - 1 else self.emb_sz
    return next(self.parameters()).new(1, self.bs, nh).zero_()

def reset(self):
    "Reset the hidden states."
    self.hidden = [(self._one_hidden(l), self._one_hidden(l)) for l in range(self.n_layers)]
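For context, here is a minimal, runnable sketch of just this mechanism. It is not the full AWD-LSTM (no embedding/weight/hidden dropouts, and AWDLSTMSketch plus its constructor arguments are my own names); it only reproduces the reset-on-batch-size-change logic and the carry-over of hidden state between batches:

import torch
import torch.nn as nn

class AWDLSTMSketch(nn.Module):
    "Sketch of the AWD-LSTM hidden-state handling, without the dropouts."
    def __init__(self, emb_sz, n_hid, n_layers, vocab_sz):
        super().__init__()
        self.bs, self.emb_sz, self.n_hid, self.n_layers = 1, emb_sz, n_hid, n_layers
        self.emb = nn.Embedding(vocab_sz, emb_sz)
        # The last layer projects back to emb_sz, as in the notebook.
        self.rnns = nn.ModuleList([
            nn.LSTM(emb_sz if l == 0 else n_hid,
                    n_hid if l != n_layers - 1 else emb_sz,
                    batch_first=True)
            for l in range(n_layers)])
        self.reset()

    def _one_hidden(self, l):
        "Return one zeroed hidden state for layer l at the current batch size."
        nh = self.n_hid if l != self.n_layers - 1 else self.emb_sz
        return next(self.parameters()).new(1, self.bs, nh).zero_()

    def reset(self):
        "Reset the hidden states."
        self.hidden = [(self._one_hidden(l), self._one_hidden(l))
                       for l in range(self.n_layers)]

    def forward(self, x):
        bs, sl = x.size()
        if bs != self.bs:  # batch size changed -> rebuild states at zero
            self.bs = bs
            self.reset()
        out = self.emb(x)
        new_hidden = []
        for l, rnn in enumerate(self.rnns):
            out, h = rnn(out, self.hidden[l])
            # Detach so gradients don't flow across batch boundaries
            # (truncated backpropagation through time).
            new_hidden.append((h[0].detach(), h[1].detach()))
        self.hidden = new_hidden
        return out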
Now self.bs is initially set to 1, so the code will call self.reset on the first batch (unless someone actually trains with a batch size of 1).
The last batch of an epoch is usually smaller than the rest (whenever the dataset size isn't divisible by the batch size), so self.reset would be called again on that last batch, and then once more when the full batch size comes back on the first batch of the next epoch.
For all the other batches, the model is trained so that the hidden state is carried over from one batch to the next. That is not the case for the last batch, which starts again from a zeroed state. Am I reading the code right?
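To check that reading, here is a small simulation using the sketch above. The batch-size schedule [4, 4, 4, 3] and the counting_reset wrapper are hypothetical, just to count when reset fires:

model = AWDLSTMSketch(emb_sz=8, n_hid=16, n_layers=2, vocab_sz=100)

resets = []
orig_reset = model.reset
def counting_reset():
    resets.append(model.bs)  # record the batch size that triggered the reset
    orig_reset()
model.reset = counting_reset

seq_len = 5
for epoch in range(2):
    for bs in [4, 4, 4, 3]:  # last batch of each "epoch" is smaller
        x = torch.randint(0, 100, (bs, seq_len))
        model(x)

print(resets)  # [4, 3, 4, 3]: reset fires on the first batch, on the short
               # last batch, and again when the full batch size returns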