Training an LSTM model issues

tcapelle · April 5, 2020, 7:35pm

I am trying to train a Conv Lstm model with the fastai API.
After starting training I get the error:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

Do I need to do something special with the LSTM layer to be able to train it, such as a callback that resets the hidden state on each loop?
I am very new to LSTMs and recurrent nets in general.
By the way, this is my model:

class ConvLSTM(Module):
    def __init__(
        self, future_steps, latent_dim=512, lstm_layers=1, hidden_dim=1024, bidirectional=True, attention=True
    ):
        self.encoder = Encoder(3, latent_dim)
        self.lstm = LSTM(latent_dim, lstm_layers, hidden_dim, bidirectional)
        self.output_layers = nn.Sequential(
            nn.Linear(2 * hidden_dim if bidirectional else hidden_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim, momentum=0.01),
            nn.ReLU(),
            nn.Linear(hidden_dim, future_steps))
        self.attention = attention
        self.attention_layer = nn.Linear(2 * hidden_dim if bidirectional else hidden_dim, 1)
    
    def reset(self): self.lstm.reset()
    def forward(self, x):
        batch_size, seq_length, c, h, w = x.shape
        x = x.view(batch_size * seq_length, c, h, w)
        x = self.encoder(x)
        x = x.view(batch_size, seq_length, -1)
        x = self.lstm(x)
        if self.attention:
            attention_w = F.softmax(self.attention_layer(x).squeeze(-1), dim=-1)
            x = torch.sum(attention_w.unsqueeze(-1) * x, dim=1)
        else:
            x = x[:, -1]
        return self.output_layers(x)

I am encoding the frames of a video with the Encoder and passing the resulting features through the LSTM layer.
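For readers following the reshapes in forward, here is a minimal shape walkthrough. It is only a sketch: the small `encoder` below is a hypothetical stand-in for the post's Encoder (which is not shown in the thread), and the sizes are made up to keep it fast.

```python
import torch
from torch import nn
import torch.nn.functional as F

batch_size, seq_length, c, h, w = 2, 5, 3, 16, 16
latent_dim, hidden_dim = 8, 4

# Hypothetical stand-in for Encoder: maps (N, 3, H, W) -> (N, latent_dim).
encoder = nn.Sequential(
    nn.Conv2d(c, latent_dim, 3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
lstm = nn.LSTM(latent_dim, hidden_dim, batch_first=True, bidirectional=True)
attention_layer = nn.Linear(2 * hidden_dim, 1)

x = torch.randn(batch_size, seq_length, c, h, w)

# Fold time into the batch axis so the 2D encoder sees one frame per row.
frames = x.view(batch_size * seq_length, c, h, w)          # (10, 3, 16, 16)
# Unfold back into sequences of per-frame feature vectors.
feats = encoder(frames).view(batch_size, seq_length, -1)   # (2, 5, 8)
out, _ = lstm(feats)                                       # (2, 5, 2 * hidden_dim)

# Attention pooling: one weight per time step, normalised over the sequence.
attention_w = F.softmax(attention_layer(out).squeeze(-1), dim=-1)  # (2, 5)
pooled = torch.sum(attention_w.unsqueeze(-1) * out, dim=1)         # (2, 8)
```

The pooled tensor has one vector per video, which is what the output layers in the model consume.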

tcapelle · April 9, 2020, 8:26am

I will answer my own question:

You need to detach the hidden state from the graph to be able to train the LSTM:

class LSTM(Module):
    def __init__(self, latent_dim, num_layers, hidden_dim, bidirectional=False):
        self.lstm = nn.LSTM(latent_dim, hidden_dim, num_layers, batch_first=True, bidirectional=bidirectional)
        self.h = None
        
    def reset(self): 
        self.h = None

    def forward(self, x):
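        x, h = self.lstm(x, self.h)
        # Keep the state values but cut them out of the autograd graph, so the
        # next backward pass does not reach into the previous batch's graph.
        self.h = tuple(s.detach() for s in h)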
        return x
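To see why the detach matters, the error can be reproduced with a bare nn.LSTM outside fastai. This is a minimal sketch with made-up sizes and a hypothetical helper `run_two_steps`: the hidden state returned for one batch is fed into the next, so without detach the second backward call tries to walk back into the first batch's graph, whose buffers were already freed.

```python
import torch
from torch import nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=3, batch_first=True)

def run_two_steps(detach):
    """Run two batches, carrying the hidden state from the first into the second."""
    h = None
    for _ in range(2):
        x = torch.randn(2, 5, 4)
        out, h = lstm(x, h)
        if detach:
            # Same values, but no longer connected to this batch's graph.
            h = tuple(s.detach() for s in h)
        lstm.zero_grad()
        out.sum().backward()

try:
    run_two_steps(detach=False)
    failed = False
except RuntimeError:
    # "Trying to backward through the graph a second time ..."
    failed = True

# With the state detached, both backward passes succeed.
run_two_steps(detach=True)
```

Detaching means gradients stop at the batch boundary (truncated backpropagation through time), while the state values themselves still carry over.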

You also need to add cbs=[ModelReseter()] to the learner, to reset the hidden state before each epoch.
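For anyone not using the callback, the same idea can be written as a hand-rolled loop. This is only a sketch under assumptions: `StatefulLSTM` is a plain nn.Module rewrite of the wrapper above, the data and loss are dummies, and calling reset() by hand is not fastai's implementation of the callback.

```python
import torch
from torch import nn

class StatefulLSTM(nn.Module):
    """Plain-PyTorch version of the wrapper: keeps a detached state between batches."""
    def __init__(self, latent_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(latent_dim, hidden_dim, batch_first=True)
        self.h = None

    def reset(self):
        self.h = None

    def forward(self, x):
        x, h = self.lstm(x, self.h)
        self.h = tuple(s.detach() for s in h)
        return x

model = StatefulLSTM(4, 3)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

state_at_epoch_start = []
for epoch in range(2):
    model.reset()                      # forget the state left over from the last epoch
    state_at_epoch_start.append(model.h)
    for _ in range(3):                 # dummy batches, all with the same batch size
        x = torch.randn(2, 5, 4)
        loss = model(x).pow(2).mean()  # dummy loss, just to drive backward
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Note that the carried-over state has the batch size baked into its shape, so every batch within an epoch must have the same size.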

I have another question, about zeroing the hidden state before each epoch.
Some code reinitializes it to None, and other code does self.h = [torch.zeros(n_layers, bs, n_hidden) for _ in range(2)]
@jeremy does the second in the 12_nlp_dive.ipynb notebook from the fastbook (kudos for this awesome resource).
Are the two equivalent?
I am mostly interested because using None does not require hardcoding the batch size.
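As far as nn.LSTM itself goes, the two should give the same result on a forward pass: when no state is passed, PyTorch documents that h_0 and c_0 default to zeros. A quick check, with made-up sizes:

```python
import torch
from torch import nn

torch.manual_seed(0)
n_layers, bs, n_hidden, seq_len = 2, 3, 4, 5
lstm = nn.LSTM(n_hidden, n_hidden, n_layers, batch_first=True)
x = torch.randn(bs, seq_len, n_hidden)

# No state passed: PyTorch builds zero h_0 and c_0 with the right shape itself.
out_none, (h_none, c_none) = lstm(x)

# Explicit zeros: one tensor for h_0 and one for c_0,
# each of shape (num_layers * num_directions, batch, hidden_size).
zeros = (torch.zeros(n_layers, bs, n_hidden), torch.zeros(n_layers, bs, n_hidden))
out_zeros, (h_zeros, c_zeros) = lstm(x, zeros)

same = torch.allclose(out_none, out_zeros) and torch.allclose(h_none, h_zeros)
```

The practical difference is that None lets PyTorch infer the batch size, device and dtype from the input, while explicit zeros have to be created with matching values by hand.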

tcapelle · April 10, 2020, 2:54pm

This is my model @takotab