Understanding bptt, batchsize, and what is getting fed into the forward function per mini-batch?

Even · February 2, 2018, 5:32pm

It’s not quite the same thing. RNNs can process variable-length inputs because they consume a sequence one step at a time, carrying a hidden state from step to step, and produce a sequence of outputs. You feed the network one token (a character, a word, or whatever unit you chose) at a time, and at every step you get an output, whether or not you choose to look at it. This is true for all RNN models.
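To make the "one step at a time, one output per step" idea concrete, here is a minimal sketch of a toy RNN with a single scalar hidden state. It is not any library's implementation: the function name `rnn_forward` and the weights `w_in`, `w_h`, `w_out` are hypothetical values picked for illustration. The point is that the same loop handles a 3-step and a 10-step input, and it always emits exactly one output per input step.

```python
import math


def rnn_forward(inputs, w_in, w_h, w_out, h0=0.0):
    """Run a toy scalar RNN over a sequence, one element at a time.

    Hypothetical weights; returns one output per input step plus the
    final hidden state.
    """
    h = h0
    outputs = []
    for x in inputs:
        # The hidden state mixes the current input with everything seen so far.
        h = math.tanh(w_in * x + w_h * h)
        # An output is produced at every step, whether or not we use it.
        outputs.append(w_out * h)
    return outputs, h


# The same weights work for sequences of any length.
short_seq = [0.1, 0.5, -0.3]
long_seq = [0.2] * 10
out_short, _ = rnn_forward(short_seq, 0.8, 0.5, 1.2)
out_long, _ = rnn_forward(long_seq, 0.8, 0.5, 1.2)
print(len(out_short), len(out_long))  # 3 10
```

Nothing in the loop depends on the sequence length. That is why variable-length input is natural for an RNN, while a model that expects a fixed-size input vector cannot do the same.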

For language modelling, we’re trying to learn to predict the next character or word from the preceding sequence. To make that more robust and to help prevent overfitting, the sequence length is varied randomly. Instead of always learning to predict the 11th word from a sequence of 10 words, the data is split into sequences whose length is random but roughly normally distributed around the bptt hyperparameter.
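Since the original question asks what actually reaches the forward function per mini-batch, here is a minimal stdlib-only sketch of the batching scheme described above. It is an illustration, not the fastai source. The helper names `batchify` and `lm_batches` are hypothetical, and so are the standard deviation of 5 and the minimum length of 5 used when sampling the sequence length. The token stream is cut into `batch_size` parallel columns. Each mini-batch is then `seq_len` consecutive rows (inputs) plus the same rows shifted down by one (targets), with `seq_len` drawn around `bptt`.

```python
import random


def batchify(tokens, batch_size):
    """Split a token stream into batch_size parallel columns.

    Column j holds the contiguous chunk tokens[j*n_rows:(j+1)*n_rows];
    leftover tokens that don't fill a full row are dropped.
    Returns a list of rows, each with one token per column.
    """
    n_rows = len(tokens) // batch_size
    return [[tokens[j * n_rows + i] for j in range(batch_size)]
            for i in range(n_rows)]


def lm_batches(tokens, batch_size, bptt, seed=0):
    """Yield (inputs, targets) pairs of shape seq_len x batch_size."""
    rng = random.Random(seed)
    data = batchify(tokens, batch_size)
    i = 0
    # Stop while at least one row remains to serve as the final target.
    while i < len(data) - 1:
        # Random length centred on bptt (std 5 and min 5 are assumptions).
        seq_len = max(5, int(rng.gauss(bptt, 5)))
        # Don't run past the end of the data.
        seq_len = min(seq_len, len(data) - 1 - i)
        x = data[i:i + seq_len]          # inputs fed to forward()
        y = data[i + 1:i + 1 + seq_len]  # targets: next token at each position
        yield x, y
        i += seq_len


tokens = list(range(100))
for x, y in lm_batches(tokens, batch_size=4, bptt=10):
    print(f"inputs {len(x)} x {len(x[0])}, first row {x[0]}, target row {y[0]}")
```

In this sketch, `batch_size` controls how many independent streams are processed in parallel (the columns). `bptt` only sets the typical number of time steps (rows) per call. Every position in a batch has a target, so the model learns to predict the next token at each step, not only at the end of the sequence.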

Hopefully that helps a little. I recommend watching a few more videos on RNNs and rewatching the related lessons. It’s a complex topic that took me a number of tries to understand, and I’m still learning it.