# Problem in understanding BPTT

Hi, I was trying to do the IMDB sentiment analysis lesson. My problem is this: we divide the text into 64 equal chunks (the batch size) and then transpose it, so the matrix becomes roughly 1 million x 64. If bptt is 70, then each mini-batch takes 70 rows, so the matrix becomes 70 x 64.
Now I can't work out how the data is fed into the neural net:
a) If the data is fed in row by row, then each row goes through a dense layer of size 64, but the problem is that a row doesn't contain related words.
b) If a column is fed into the dense layer, then the problem is that the length of the column keeps changing, all thanks to PyTorch. If the column length keeps changing, how do we set the size of the weights in the network's layer?
Please help.

P.S. Above is the screenshot of the problem.

I think he explained it again in Lesson 7 here:

Maybe an example:

1. Original text (one long string of 100 words)
```
It is a truth universally acknowledged that a single man in possession of a good fortune must be in want of a wife However little known the feelings or views of such a man may be on his first entering a neighbourhood this truth is so well fixed in the minds of the surrounding families that he is considered the rightful property of some one or other of their daughters My dear Mr Bennet said his lady to him one day have you heard that Netherfield Park is let at last Mr Bennet replied that he had not But it
```
2. Split into a batch size of 5 (5 shorter strings of 20 words each)
```
i) It is a truth universally acknowledged that a single man in possession of a good fortune must be in want

ii) of a wife However little known the feelings or views of such a man may be on his first entering

iii) a neighbourhood this truth is so well fixed in the minds of the surrounding families that he is considered the

iv) rightful property of some one or other of their daughters My dear Mr Bennet said his lady to him one

v) day have you heard that Netherfield Park is let at last Mr Bennet replied that he had not But it
```
3. Split again into sequences of length 5. So there will be 4 blocks, each 5 by 5
```
It is a truth universally       acknowledged that a single man     in possession of a good               fortune must be in want
of a wife However little        known the feelings or views        of such a man may                     be on his first entering
a neighbourhood this truth is   so well fixed in the               minds of the surrounding families     that he is considered the
rightful property of some one   or other of their daughters        My dear Mr Bennet said                his lady to him one
day have you heard that         Netherfield Park is let at         last Mr Bennet replied that           he had not But it
```

So in the first block, the RNN first receives the 1st word of each row (batch size = 5):
`It of a rightful day`
followed by the 2nd word of each row:
`is a neighbourhood property have`
then the 3rd:
`a wife this of you`
then the 4th:
`truth However truth some heard`
and finally the 5th:
`universally little is one that`
If your bptt were set longer, say 10, you would just keep feeding the next 5 words of each row into the RNN.
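
For anyone who wants to reproduce the steps above, here is a minimal sketch in plain Python. The helper names `batchify` and `time_steps` are made up for this post and are not the library's API; the real data loader works on token ids rather than words. The list `block_1` is the "transposed" view: one entry per time step, each holding one word from every row.

```python
# The same 100 words as in step 1, punctuation removed.
text = (
    "It is a truth universally acknowledged that a single man in possession "
    "of a good fortune must be in want of a wife However little known the "
    "feelings or views of such a man may be on his first entering a "
    "neighbourhood this truth is so well fixed in the minds of the "
    "surrounding families that he is considered the rightful property of "
    "some one or other of their daughters My dear Mr Bennet said his lady "
    "to him one day have you heard that Netherfield Park is let at last Mr "
    "Bennet replied that he had not But it"
)
words = text.split()


def batchify(tokens, batch_size):
    """Cut the token list into `batch_size` equal, contiguous rows (hypothetical helper)."""
    row_len = len(tokens) // batch_size
    return [tokens[r * row_len:(r + 1) * row_len] for r in range(batch_size)]


def time_steps(rows, start, bptt):
    """Yield, for each position in a block, the words fed to the RNN together."""
    for t in range(start, min(start + bptt, len(rows[0]))):
        yield [row[t] for row in rows]


rows = batchify(words, batch_size=5)                # 5 rows of 20 words
block_1 = list(time_steps(rows, start=0, bptt=5))   # 5 time steps x 5 rows
for step in block_1:
    print(" ".join(step))
```

Calling `time_steps(rows, start=0, bptt=10)` instead would simply yield 10 time steps from the same rows.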

@wyquek I think you missed the transposing part after step two. Please look into this.

Transposing aside, does it help to answer a) and b)? I tried my best to explain it as I understood it from the lecture.

@wyquek but here, instead of "It of a rightful day", shouldn't we feed "It is a truth universally", because those words are related to each other?
Please help.

Don't worry about this for now. In lesson 6 the reason will become clear why this crazy way of arranging the text is the most efficient. It has to do with stateful RNNs. It got me confused when I did the IMDB lesson too. Jeremy will pop open the hood and help you look inside in lesson 6.

Think of `It of a rightful day` as 5 different images in a minibatch, fed in with a batch size of 5. `is a neighbourhood property have` is the next minibatch of 5 images. So `It` is not really related to `of`.

Think of `It` as the first image frame in a video, and `is` the next image frame in the video, and `a` the 3rd frame in the video.
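
To connect this to question b), here is a rough sketch assuming a plain tanh RNN cell. It is not the lesson's actual model; the sizes `EMB` and `HID` are made up, and random vectors stand in for word embeddings. It is only meant to show that the same weights are reused for every "frame", so their shapes do not depend on how many words (bptt) are fed in. The batch dimension works the same way: each sequence in the batch goes through the same step, so the 64 is not the width of a layer.

```python
import math
import random

random.seed(0)
EMB, HID = 3, 4  # hypothetical embedding and hidden sizes

# One set of weights, created once. Nothing here depends on bptt or batch size.
W_in = [[random.uniform(-1, 1) for _ in range(EMB)] for _ in range(HID)]
W_hh = [[random.uniform(-1, 1) for _ in range(HID)] for _ in range(HID)]


def rnn_step(x, h):
    """One time step: combine one word vector with the previous hidden state."""
    return [
        math.tanh(
            sum(W_in[i][j] * x[j] for j in range(EMB))
            + sum(W_hh[i][k] * h[k] for k in range(HID))
        )
        for i in range(HID)
    ]


def run(sequence):
    """Feed word vectors one at a time, reusing the same weights at each step."""
    h = [0.0] * HID
    for x in sequence:
        h = rnn_step(x, h)
    return h


def random_words(n):
    """Stand-in for n embedded words (random vectors, for illustration only)."""
    return [[random.uniform(-1, 1) for _ in range(EMB)] for _ in range(n)]


short = random_words(5)             # a sequence of 5 words
longer = short + random_words(65)   # a sequence of 70 words
print(len(run(short)), len(run(longer)))  # hidden state has HID entries both times
```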

Thank you, I'll be looking into chapter 6 again. I think I missed something.

Thanks a lot, I think I understand it now.
What I have learned is that first the embedding of `It` is passed to the neural net and the expected outcome is the embedding of `is`. If that is not what comes out, we adjust the weights as needed. Then `of` is passed and we expect the outcome to be `a`. I hope my understanding is right. Can you please confirm it, @wyquek?

Very close, but just slightly off. The weights are adjusted only after the entire block below has passed through the RNN

```
It is a truth universally
of a wife However little
a neighbourhood this truth is
rightful property of some one
day have you heard that
```

Then with the new weights, the next block below is passed through the RNN

```
acknowledged that a single man
known the feelings or views
so well fixed in the
or other of their daughters
Netherfield Park is let at
```

`It` is used to predict the outcome `is`
`of` is also used to predict `a`
`a` is used to predict `neighbourhood`, all at the same time, as they are in a batch.
It's a bit of a mindbender, I know
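
Here is a small sketch of that in plain Python, using only the first 10 words of each row so it fits in a post. `get_block` is a made-up helper, not the library's loader, and the "weight update" is just a counter. I am assuming the targets are the inputs shifted one word to the right, which is how I understand it from the lecture.

```python
# First 10 words of each of the 5 rows from the example above.
rows = [
    "It is a truth universally acknowledged that a single man".split(),
    "of a wife However little known the feelings or views".split(),
    "a neighbourhood this truth is so well fixed in the".split(),
    "rightful property of some one or other of their daughters".split(),
    "day have you heard that Netherfield Park is let at".split(),
]


def get_block(rows, start, bptt):
    """Return (inputs, targets); targets are the inputs shifted by one word."""
    # The last word of a row has no next word, so stop one short of the end.
    end = min(start + bptt, len(rows[0]) - 1)
    inputs = [row[start:end] for row in rows]
    targets = [row[start + 1:end + 1] for row in rows]
    return inputs, targets


bptt = 5
n_updates = 0
for start in range(0, len(rows[0]) - 1, bptt):
    x, y = get_block(rows, start, bptt)
    # A real training loop would run the forward pass over all of x here,
    # compare every prediction with y, and then adjust the weights once.
    n_updates += 1
    print("inputs :", x[0])
    print("targets:", y[0])

print("weight updates:", n_updates)
```

With only 10 words per row, the second block is one word shorter than the first, because the very last word has nothing after it to predict.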

@wyquek thanks a ton for explaining this to me. Thank you very much, now I understand it.

So does it mean that `It of a rightful day` is X and `is a neighbourhood property have` is Y (the target value) for the network?

In retrospect, mine was not the best way to explain RNNs. Jeremy's way is much clearer.
This might be helpful: training the RNN using one target

and this less wasteful way (the way you've described), where one trains the RNN with many targets
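
A tiny sketch of the difference between the two, as I understand it, using a single row of text. The function names `one_target` and `many_targets` are made up for illustration.

```python
row = "It is a truth universally acknowledged".split()


def one_target(tokens, n):
    """Use the first n words as input and only word n+1 as the target."""
    return tokens[:n], tokens[n]


def many_targets(tokens, n):
    """Use the first n words as input and the next word at every position as targets."""
    return tokens[:n], tokens[1:n + 1]


print(one_target(row, 5))
print(many_targets(row, 5))
```

In both cases the inputs and targets run along one row of text. The batch just does the same thing for several rows at once.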

Hi @wyquek, thanks a lot for your explanations, they have helped me a lot so far.
However, I am still confused about the input and target words.

So the language model is trained on e.g.

`'It of a rightful'` sequence to predict the word `'day'`
`'is a neighbourhood property'` to predict `'have'`

Is this correct?

Thanks a lot!

I believe it's the following 5 input words

```
It            of                 a                 rightful            day
```

each predicting its corresponding next word below

```
is            a          neighbourhood             property            have
```

The video links above have good explanations.