I was going through the classes for LMModel and had a few questions about how the hidden states are initialized.
```python
## Simplified language model
class LMModel4(Module):
    def __init__(self, vocab_sz, n_hidden):
        self.i_h = nn.Embedding(vocab_sz, n_hidden)
        self.h_h = nn.Linear(n_hidden, n_hidden)
        self.h_o = nn.Linear(n_hidden, vocab_sz)
        self.h = 0

    def forward(self, x):
        outs = []
        ## Repeat for each of the seq_length positions (seq_length = 3)
        for i in range(seq_length):
            self.h = self.h + self.i_h(x[:, i])
            self.h = self.h_h(self.h)        # (bs, n_hidden) -> (bs, n_hidden)
            self.h = F.relu(self.h)          # activation function
            outs.append(self.h_o(self.h))    ## appending the output at each step
        # Note: we detach h only after the outputs are computed, which means that for
        # this iteration gradients are still calculated through all seq_length steps.
        self.h = self.h.detach()  ## drop the gradient history of h and keep only its value
                                  ## (it then acts like a fixed starting state for the next batch)
        return torch.stack(outs, dim=1)

    def reset(self):
        self.h = 0
```
```python
## Multilayered language model
class LMModel5(Module):
    def __init__(self, vocab_sz, n_hidden, n_layers):
        self.i_h = nn.Embedding(vocab_sz, n_hidden)
        self.rnn = nn.RNN(n_hidden, n_hidden, n_layers, batch_first=True)
        self.h_o = nn.Linear(n_hidden, vocab_sz)
        self.h = torch.zeros(n_layers, bs, n_hidden)

    def forward(self, x):
        res, h = self.rnn(self.i_h(x), self.h)
        self.h = h.detach()
        return self.h_o(res)

    def reset(self):
        self.h.zero_()   # reset in place, keeping the (n_layers, bs, n_hidden) shape
```
In LMModel4, I understand that self.h is reset to 0 after every epoch (via a callback), but within the seq_length loop the hidden state is carried forward and used at the next step.
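To make sure I understand the detach part, here is a tiny standalone sketch I wrote (the tensors and values are made up for illustration, not from the book):

```python
import torch

# Made-up one-parameter example, just to illustrate what detach does
w = torch.ones(1, requires_grad=True)
h = torch.zeros(1)

# "batch 1": h now has a history that reaches back to w
h = torch.relu(h + w)
h.sum().backward()
print(w.grad)          # tensor([1.])

h = h.detach()         # keep h's value, drop its computation history

# "batch 2": backward only traverses the ops created after the detach
h = torch.relu(h + w)
h.sum().backward()
print(w.grad)          # tensor([2.])  (grads accumulate; only 1 more was added by this step)
```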
In LMModel5, I see self.h is defined with shape (n_layers, bs, n_hidden). Does this mean the hidden state for every layer and every batch element is stored here?
Why is this needed? Can't we just use it the way we did in LMModel4, by simply assigning h = 0 (a scalar)? Is it because of the way nn.RNN works?
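For reference, this is the shape behaviour I am asking about, with a minimal nn.RNN call and made-up sizes:

```python
import torch
import torch.nn as nn

# Made-up sizes just for illustration
bs, seq_len, n_hidden, n_layers = 64, 3, 64, 2

rnn = nn.RNN(n_hidden, n_hidden, n_layers, batch_first=True)
x  = torch.randn(bs, seq_len, n_hidden)     # embedded input: (bs, seq_len, n_hidden)
h0 = torch.zeros(n_layers, bs, n_hidden)    # one hidden state per layer, per batch element

res, h = rnn(x, h0)
print(res.shape)   # torch.Size([64, 3, 64]) -> last layer's output at every time step
print(h.shape)     # torch.Size([2, 64, 64]) -> final hidden state of every layer
```

I assume that passing a plain 0 here instead of h0 would fail, since nn.RNN expects a tensor of exactly this shape (or None) as the initial hidden state, but I would like to confirm that this is the reason.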