I don't understand how the language model network is trained on variable sequence lengths. The output of the line
is a tensor whose length varies: sometimes a 63x64 matrix and sometimes 72x64. So sometimes I get a sentence of length 63 and sometimes 72. Are these tensors padded to equal length before they are fed to the neural network?
The sizes vary a lot more when we move to predicting sentiment.
When predicting the sentiment of a movie review, how is the network adjusted to expect a larger input? I was under the impression that the network expects an input of length bptt, but movie reviews are much longer.
I come from a Keras background, and it's difficult for me to understand how the input length can vary between the two tasks.
For example, in Keras, if I create a network and train it as a language model, the input length of one row equals the bptt size. I can't then reuse the same network for an input of a different length, such as a movie review, which is longer than the bptt size.
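To make the question concrete, here is a minimal sketch of my current guess, written in plain Python rather than with fastai's or Keras's actual code. It assumes a simple tanh recurrent cell, and the names make_rnn_weights and run_rnn are hypothetical. The weight shapes depend only on the input and hidden sizes, so the same weights seem to work for a sequence of 63 steps and one of 72 steps:

```python
import math
import random


def make_rnn_weights(input_size, hidden_size, seed=0):
    """Create input-to-hidden and hidden-to-hidden weights (hypothetical helper).

    Note that neither shape depends on the sequence length.
    """
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.1, 0.1) for _ in range(input_size)]
            for _ in range(hidden_size)]
    w_hh = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
            for _ in range(hidden_size)]
    return w_in, w_hh


def run_rnn(sequence, w_in, w_hh):
    """Apply the same cell at every time step and return the final hidden state."""
    hidden = [0.0] * len(w_hh)
    for x in sequence:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_hh[j], hidden))
            )
            for j in range(len(w_hh))
        ]
    return hidden


w_in, w_hh = make_rnn_weights(input_size=4, hidden_size=3)

# Two toy sequences with the lengths I observed (63 and 72 time steps).
data_rng = random.Random(1)
short_seq = [[data_rng.random() for _ in range(4)] for _ in range(63)]
long_seq = [[data_rng.random() for _ in range(4)] for _ in range(72)]

h_short = run_rnn(short_seq, w_in, w_hh)
h_long = run_rnn(long_seq, w_in, w_hh)
print(len(h_short), len(h_long))  # both hidden states have hidden_size entries
```

If this guess is right, only the number of loop iterations changes with the input length, not the network itself. I'm not sure whether this is what actually happens inside the library.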
Is this feature specific to PyTorch/fastai?