I have a small question on RNNs but couldn’t figure out the answer.
In the video, we tried two approaches: 1) create an RNN whose hidden state is reset in every epoch, and 2) keep the hidden state as a private variable (which then gets updated in every epoch).
So my question is: in an RNN, what are the main parameters we want to train? The weights inside the RNN network, the hidden state, or both? Which one is more important?
The hidden state isn’t trained the way the weights are. It’s a thing that temporarily keeps track of what is going on in the current sequence. The model doesn’t actually store the hidden state anywhere when it is saved to a file, only the weights. Any time you start a new sequence, the hidden state is reset to all zeros.
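You can see this directly in PyTorch. A minimal sketch (assuming `torch` is installed; the layer sizes are just illustrative): the model’s `state_dict`, which is what gets saved to disk, contains only weights and biases, and the hidden state is something you pass in fresh (typically zeros) for each new sequence.

```python
import torch
import torch.nn as nn

# A single-layer vanilla RNN; sizes are arbitrary for illustration.
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

# Only weights and biases live in the state_dict -- no hidden state.
print(sorted(rnn.state_dict().keys()))
# ['bias_hh_l0', 'bias_ih_l0', 'weight_hh_l0', 'weight_ih_l0']

# For a new sequence, the hidden state starts from all zeros.
x = torch.randn(1, 5, 8)        # (batch, seq_len, input_size)
h0 = torch.zeros(1, 1, 16)      # (num_layers, batch, hidden_size)
output, hn = rnn(x, h0)
print(output.shape, hn.shape)   # torch.Size([1, 5, 16]) torch.Size([1, 1, 16])
```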
Thanks for your prompt reply. But if the hidden state is not trained at all, what value or information is gained by reusing the same hidden state in every epoch?
Are we saying that reusing the same hidden state allows the weights to be trained better?
When I said it is not trained, what I meant was that it isn’t trained using SGD like the rest of the model. But the hidden state does get updated during the training process: it is updated by the output from the forward pass.
Each RNN/LSTM/GRU unit takes the input and the current hidden state, and computes an output and the new hidden state. So the hidden state remembers things about what it has seen so far.
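To make that concrete, here’s a hand-rolled sketch of one vanilla RNN step in NumPy (the weight names `W_ih`, `W_hh`, `b` are my own labels, not from the video). The weights are what SGD trains; the hidden state is just a value that each step computes from the input and the previous hidden state:

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 6

# These weights are the trainable parameters (illustrative initialization).
W_ih = rng.standard_normal((hidden_size, input_size)) * 0.1   # input-to-hidden
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # hidden-to-hidden
b = np.zeros(hidden_size)

def rnn_step(x, h_prev):
    """One step: combine the input and previous hidden state into the new one."""
    return np.tanh(W_ih @ x + W_hh @ h_prev + b)

# Run a short sequence: the hidden state starts at zeros and carries
# information forward from step to step.
h = np.zeros(hidden_size)
for t in range(3):
    x_t = rng.standard_normal(input_size)
    h = rnn_step(x_t, h)

print(h.shape)  # (6,)
```

Note that `h` after the loop depends on every input seen so far, which is exactly the “memory” being described.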
Exactly what it remembers depends on the input but also on the weights of the RNN/LSTM/GRU unit. (So initially, what the hidden state remembers is pretty meaningless. It only becomes more useful as the model actually starts learning the weights.)
Without the hidden state, the RNN/LSTM/GRU units wouldn’t know anything about the inputs that came before, and so they wouldn’t be able to learn anything about the sequence.
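The second approach from the original question (keeping the hidden state as a private variable) might be sketched like this in PyTorch. This is my own illustration, not code from the video: the state carries over between forward passes, but is detached so gradients don’t flow back through earlier batches (truncated backpropagation through time).

```python
import torch
import torch.nn as nn

class StatefulRNN(nn.Module):
    """Keeps the hidden state as an attribute between forward passes."""

    def __init__(self, input_size=8, hidden_size=16):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.hidden = None  # carried over between calls

    def forward(self, x):
        if self.hidden is None:
            # Fresh sequence: start from all zeros.
            self.hidden = torch.zeros(1, x.size(0), self.rnn.hidden_size)
        out, h = self.rnn(x, self.hidden)
        # Keep the values but cut the gradient history (truncated BPTT).
        self.hidden = h.detach()
        return out

    def reset(self):
        """Call at the start of a new sequence."""
        self.hidden = None

model = StatefulRNN()
out1 = model(torch.randn(2, 5, 8))  # hidden state is now stored on the model
out2 = model(torch.randn(2, 5, 8))  # continues from where out1 left off
```

The `detach()` is the key design choice: without it, the graph of gradient history would grow with every batch and eventually exhaust memory.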