Problem in understanding BPTT

(Shivansh Mishra) #1

Hi, I was trying to do IMDB sentiment analysis. What confused me is this: we divide the text into 64 equal parts (the batch size) and then transpose it, so the matrix becomes roughly 1 million × 64. If bptt is 70, we then take 70 rows at a time, so each chunk is 70 × 64.
Now I can't work out how the data is fed into the neural net:
a) If the data is fed in row by row, each row goes through a dense layer of size 64, but the words in a row are not related to each other.
b) If each column is fed into the dense layer, the length of the column keeps changing (thanks to PyTorch), so how would we set the size of the layer's weights?
Please help.


P.S. Above is the screenshot of the problem.

0 Likes

(魏璎珞) #2

I think he explained it again in Lesson 7 here:

Maybe an example:

  1. original text (one long string of 100 words)
It is a truth universally acknowledged that a single man in possession of a good fortune must be in want of a wife However little known the feelings or views of such a man may be on his first entering a neighbourhood this truth is so well fixed in the minds of the surrounding families that he is considered the rightful property of some one or other of their daughters My dear Mr Bennet said his lady to him one day have you heard that Netherfield Park is let at last Mr Bennet replied that he had not But it 
  2. split into a batch size of 5 (5 shorter strings, each 20 words long)
i) It is a truth universally acknowledged that a single man in possession of a good fortune must be in want

ii) of a wife However little known the feelings or views of such a man may be on his first entering

iii) a neighbourhood this truth is so well fixed in the minds of the surrounding families that he is considered the

iv) rightful property of some one or other of their daughters My dear Mr Bennet said his lady to him one

v) day have you heard that Netherfield Park is let at last Mr Bennet replied that he had not But it
  3. split again into a sequence length of 5. So there will be 4 blocks, each block 5 by 5
It is a truth universally       acknowledged that a single man     in possession of a good               fortune must be in want
of a wife However little        known the feelings or views        of such a man may                     be on his first entering
a neighbourhood this truth is   so well fixed in the               minds of the surrounding families     that he is considered the
rightful property of some one   or other of their daughters        My dear Mr Bennet said                his lady to him one
day have you heard that         Netherfield Park is let at         last Mr Bennet replied that           he had not But it 
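The three steps above can be sketched in a few lines of plain Python. This is only an illustration of the arrangement, not the library's actual loader; the helper names `batchify` and `blocks` are made up for this example.

```python
# The 100-word text from step 1, written one 20-word row per line for readability.
text = (
    "It is a truth universally acknowledged that a single man in possession of a good fortune must be in want "
    "of a wife However little known the feelings or views of such a man may be on his first entering "
    "a neighbourhood this truth is so well fixed in the minds of the surrounding families that he is considered the "
    "rightful property of some one or other of their daughters My dear Mr Bennet said his lady to him one "
    "day have you heard that Netherfield Park is let at last Mr Bennet replied that he had not But it"
)
tokens = text.split()


def batchify(tokens, bs):
    """Step 2: cut the token stream into `bs` contiguous rows of equal length."""
    n = len(tokens) // bs  # tokens per row; any remainder would be dropped
    return [tokens[i * n:(i + 1) * n] for i in range(bs)]


def blocks(rows, bptt):
    """Step 3: yield consecutive bs x bptt blocks, moving left to right."""
    for start in range(0, len(rows[0]), bptt):
        yield [row[start:start + bptt] for row in rows]


rows = batchify(tokens, bs=5)
all_blocks = list(blocks(rows, bptt=5))

print(len(rows), len(rows[0]), len(all_blocks))  # 5 20 4
for row in all_blocks[0]:
    print(" ".join(row))
```

Each row stays a contiguous piece of the original text, so reading one row across successive blocks gives the text back in order.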

So in batch 1 (the first block), the RNN will receive the 1st word of each of the 5 rows (batch size = 5)
It of a rightful day
followed by the 2nd word of each row
is a neighbourhood property have
then the 3rd word
a wife this of you
then the 4th word
truth However truth some heard
and finally
universally little is one that
If your bptt was set longer, say 10, just keep feeding 5 more words into the RNN.
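A small sketch of that feeding order, assuming the first 5 by 5 block is held as a list of rows: transposing it turns each time step into one row of 5 words, one word per sequence in the batch.

```python
# First block from the example: one row per sequence in the batch (bs x bptt).
block = [
    "It is a truth universally",
    "of a wife However little",
    "a neighbourhood this truth is",
    "rightful property of some one",
    "day have you heard that",
]
rows = [line.split() for line in block]

# Transpose to bptt x bs: row t now holds the t-th word of every sequence.
steps = [list(step) for step in zip(*rows)]

for t, step in enumerate(steps, start=1):
    print(t, " ".join(step))
# 1 It of a rightful day
# 2 is a neighbourhood property have
# 3 a wife this of you
# 4 truth However truth some heard
# 5 universally little is one that
```

The related words ("It is a truth universally") are still fed in order; they just arrive one per time step, down a column of the transposed block.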

0 Likes

(Shivansh Mishra) #3

@wyquek I think you missed the transposing part after step two. Please look into this.

0 Likes

(魏璎珞) #4

Transposing aside, does it help to answer a) and b)? I tried my best to explain it as I understood it from the lecture.

0 Likes

(Shivansh Mishra) #5

@wyquek But here, instead of “It of a rightful day”, shouldn’t we feed “It is a truth universally”, because those words are related to each other?
Please help.

0 Likes

(魏璎珞) #6

Don’t worry about this for now. In lesson 6 it will become clear why this crazy way of arranging the text is the most efficient; it has to do with stateful RNNs. It got me confused when I did the IMDB lesson too. Jeremy will pop open the hood and help you look inside in lesson 6.

0 Likes

(魏璎珞) #7

Think of “It of a rightful day” as 5 different images in a minibatch, fed in with a batch size of 5. “is a neighbourhood property have” is the next minibatch of 5 images. So “It” is not really related to “of”.

Think of “It” as the first image frame in a video, “is” as the next frame in the same video, and “a” as the 3rd frame in that video.
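To make the "separate videos" picture concrete, here is a toy sketch, not a real language model: a scalar RNN where plain numbers stand in for word embeddings, and the function names and the weights 0.5 and 0.9 are arbitrary choices for illustration. Each sequence in the batch keeps its own hidden state, so running the batch gives the same result as running every sequence alone.

```python
import math


def rnn_step(x, h, w_x=0.5, w_h=0.9):
    """One toy RNN step: the same two weights are reused at every time step."""
    return math.tanh(w_x * x + w_h * h)


def run_sequence(xs):
    """Feed a single sequence through the RNN, one element at a time."""
    h = 0.0
    for x in xs:
        h = rnn_step(x, h)
    return h


def run_batch(batch):
    """Feed several sequences in lockstep, one hidden state per sequence."""
    hs = [0.0] * len(batch)
    for step in zip(*batch):  # one time step: one input from each sequence
        hs = [rnn_step(x, h) for x, h in zip(step, hs)]
    return hs


batch = [
    [0.1, 0.2, 0.3],  # stands in for "It is a"
    [0.9, 0.8, 0.7],  # stands in for "of a wife"
    [0.5, 0.5, 0.5],  # stands in for "a neighbourhood this"
]
batched = run_batch(batch)
alone = [run_sequence(seq) for seq in batch]
print(all(math.isclose(b, a) for b, a in zip(batched, alone)))  # True
```

Because the weights are applied per time step, their shapes do not depend on how many time steps (bptt) are fed in.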

1 Like

(Shivansh Mishra) #8

Thank you, I’ll be looking into chapter 6 again. I think I missed something. :grinning:

0 Likes

(Shivansh Mishra) #9

Thanks a lot, I think I understand it now.
What I have learned is that first the embedding of “It” is passed to the neural net and the expected outcome is the embedding of “is”; if that is not what comes out, we adjust the weights as needed. Then “of” is passed and we expect the outcome to be “a”. I hope my understanding is right. Can you please confirm it,
@wyquek

0 Likes

(魏璎珞) #10

Very close, but just slightly off. The weights are adjusted after the entire block below has passed through the RNN

It is a truth universally
of a wife However little
a neighbourhood this truth is
rightful property of some one
day have you heard that

Then with the new weights, the next block below is passed through the RNN

acknowledged that a single man 
known the feelings or views
so well fixed in the
or other of their daughters
Netherfield Park is let at 

“It” is used to predict the outcome “is”
“of” is also used to predict “a”
“a” is used to predict “neighbourhood”, all at the same time, as they are in a batch.
It’s a bit of a mindbender, I know.
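A rough sketch of the input/target pairing described here, assuming each row is extended by one extra word so the last position of the block also has a target. The variable names are only illustrative.

```python
# First block of each row plus one extra word (the first word of the next block).
rows = [
    "It is a truth universally acknowledged",
    "of a wife However little known",
    "a neighbourhood this truth is so",
    "rightful property of some one or",
    "day have you heard that Netherfield",
]
rows = [r.split() for r in rows]
bptt = 5

x = [r[:bptt] for r in rows]       # inputs:  bs x bptt
y = [r[1:bptt + 1] for r in rows]  # targets: the same rows shifted by one word

# At the first time step, every row makes its own prediction in parallel.
for x_row, y_row in zip(x, y):
    print(x_row[0], "->", y_row[0])
# It -> is
# of -> a
# a -> neighbourhood
# rightful -> property
# day -> have

# All bs * bptt predictions in the block feed into a single weight update.
n_predictions = sum(len(row) for row in y)
print(n_predictions)  # 25
```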

1 Like

(Shivansh Mishra) #11

@wyquek Thanks a ton for explaining this to me. Now I understand it.

0 Likes

#12

So does it mean that “It of a rightful day” is X and “is a neighbourhood property have” is Y (the target value) for the network?

0 Likes

(魏璎珞) #13

In retrospect, not the best way to explain RNNs :frowning: Jeremy’s way is way clearer.
This might be helpful: training the RNN using one target

and this less wasteful way (the way you’ve described), where one trains the RNN with many targets
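The difference between the two setups can be sketched on a single row of the example. This is a simplified illustration under my own naming, not the lecture's code.

```python
words = "It is a truth universally acknowledged".split()
n = 5  # number of input words fed to the RNN

# One target: the whole context is used to predict only the word that follows it.
one_x, one_y = words[:n], words[n]

# Many targets: every input position predicts the word that comes right after it.
many_x, many_y = words[:n], words[1:n + 1]

print(one_x, "->", one_y)
for inp, target in zip(many_x, many_y):
    print(inp, "->", target)
```

With one target, 5 input words produce a single training signal; with many targets, the same 5 words produce 5, which is why the second way is less wasteful.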

0 Likes