Help constructing a language model from a custom PyTorch nn.Module

Hi folks! I’m having trouble creating a language model from scratch in PyTorch and then wrapping it in a fastai LanguageLearner. Here’s a notebook showing how I arrive at the error.

In short, I’m creating this PyTorch module:

import torch
import torch.nn as nn
import torch.nn.functional as F

# nv, nh, and bs (vocab size, hidden size, and batch size) are defined
# earlier in the notebook
class BasicLanguageModel(nn.Module):

    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(nv, nh)   # input -> hidden (embedding)
        self.h_h = nn.Linear(nh, nh)      # hidden -> hidden
        self.h_o = nn.Linear(nh, nv)      # hidden -> output (vocab logits)
        self.bn = nn.BatchNorm1d(nh)
        self.reset()

    def forward(self, x):
        print("input size: ", x.size())
        res = []
        h = self.h
        for i in range(x.shape[1]):       # loop over the time steps
            h = h + self.i_h(x[:, i])
            h = F.relu(self.h_h(h))
            res.append(self.bn(h))
        print("hidden layer size: ", h.size())
        print("res size: ", res[0].size())
        self.h = h.detach()               # truncate backprop between batches
        res = torch.stack(res, dim=1)
        print("stacked res size: ", res.size())
        print("output size: ", self.h_o(res).size())
        return self.h_o(res)

    def reset(self):
        self.h = torch.zeros(bs, nh).cuda()  # fresh hidden state
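
As a quick smoke test of the shapes (a sketch, assuming bptt is 70 and random token ids standing in for a real batch):

model = BasicLanguageModel().cuda()         # move weights to the GPU, like the hidden state
xb = torch.randint(0, nv, (bs, 70)).cuda()  # fake batch: bs sequences of 70 token ids
out = model(xb)                             # logits of shape (bs, 70, nv)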

My data comes from the freely available text of H.G. Wells’ The War of the Worlds (hence the repo name). I download the text, write it to a one-column CSV file, and then create a TextLMDataBunch:

data = TextLMDataBunch.from_csv('.', 'book_text.csv', text_cols=0)
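
For completeness, the download-and-write step looks roughly like this (the Gutenberg URL and the paragraph splitting are placeholders for what the notebook actually does):

import csv
import urllib.request

url = "https://www.gutenberg.org/files/36/36-0.txt"  # placeholder URL for the book text
raw = urllib.request.urlopen(url).read().decode("utf-8")

with open("book_text.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["text"])          # single text column
    for para in raw.split("\n\n"):     # one paragraph per row
        if para.strip():
            writer.writerow([para.strip()])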

Finally, I try creating and fitting a model:

learn = LanguageLearner(data, BasicLanguageModel(), metrics=accuracy)
learn.fit_one_cycle(10, max_lr=3e-2)

And I’m getting this error:

ValueError: Expected input batch_size (70) to match target batch_size (4480).

I’m not sure exactly what’s going on; any feedback is appreciated!

One more update: I tried another approach that still didn’t work, so I’m definitely missing something. Code here.

tl;dr: Instead of using LanguageLearner, I just used a plain Learner. This time I was able to get the model to train, but when I called predict on the Learner, I got this error:

TypeError: only integer tensors of a single element can be converted to an index
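
For context, the plain-Learner setup was roughly this (the exact code is in the linked notebook):

learn = Learner(data, BasicLanguageModel(), metrics=accuracy)  # falls back to data.loss_func
learn.fit_one_cycle(10, max_lr=3e-2)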

So, I tried creating a LanguageLearner separately and loading the weights from the original model:

p = LanguageLearner(data, BasicLanguageModel(), metrics=accuracy)
learn.save("basic_model")
p.load("basic_model")

When I run

p.predict("The ", n_words=10)

I get this error:

IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

My sense is that I probably need to dig through the fastai internals, but if anyone has had a similar issue or can point me in the right direction, I’d much appreciate it!

Hi Rob, I used %debug after this error:

ValueError: Expected input batch_size (70) to match target batch_size (4480).

It shows the target size is 64 * 70 = 4480.

Then, when I set_trace(), I see that you are producing output of size 64 * 70 * 3152.
Shouldn’t you produce output of size 64 * 70 if we are training a language model?

Oh, I forgot that we need the probabilities?

Hi Bao Tin!

Thanks for taking a look! So I believe that 64 refers to the batch size, 70 to the time steps (i.e. the bptt setting), and 3152 to the vocabulary size (one probability per token).

It’s a good catch that 4480 = 64 * 70. I’m still trying to figure out why my target batch size is being interpreted as my input batch size times bptt.
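
One way to sanity-check those shapes straight from the DataBunch (a sketch; exact sizes depend on the bs and bptt settings):

x, y = next(iter(data.train_dl))
print(x.shape, y.shape)  # e.g. torch.Size([64, 70]) torch.Size([64, 70])
# fastai's flattened cross-entropy views the target as 64 * 70 = 4480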

Hi!

I think I figured out what was going on. After some digging, I realized a few things:

  • When writing raw PyTorch like this, your model only knows how to deal with the fixed batch size its hidden state was created with. At prediction time, you need to broadcast your single prediction example so that it matches the batch size the model expects. In my model class, I literally added a function that repeats the prediction example batch_size times (see the first sketch after this list), and I was able to start making single predictions.
  • LanguageLearner has some useful arguments in its predict method, especially n_words, which tells the model how many words to generate. I wanted to be able to use these when making predictions.
  • When I changed it to a LanguageLearner and tried to start training, the model failed: it errored out almost immediately due to dimension mismatches (you can see the error in my post above).
  • I was able to get the predict behavior I wanted from a plain Learner by setting learn.__class__ = LanguageLearner. Making predictions then worked as expected. I was still curious about why this was happening, though, so I decided to dig a little deeper.
  • I stepped through the code and ultimately realized that the issue comes down to callbacks. The RNNTrainer callback (which the LanguageLearner class uses) expects the output of forward to be a 3-element tuple of the form (predicted_output, hidden_outputs_without_dropout, hidden_outputs_with_dropout); if you’re not using dropout (as is the case for my simple model), the last two elements are just the same hidden output. As an example, you can see here that the AWD_LSTM outputs this format (for details of how LinearDecoder is used, see here). This also explains the original ValueError: when forward returns a bare tensor, the callback’s last_output[0] indexing grabs the first batch element (shape 70 x 3152) instead of the predictions, so the loss sees an input batch of 70 against a flattened target batch of 64 * 70 = 4480. Once the output format matches (second sketch below), everything works as expected, even when instantiating a LanguageLearner!

This is quite a small/simple model, and I haven’t spent any time optimizing or incorporating more advanced tricks yet, but I’m glad I have a working end-to-end example that I can iterate on. If you’re interested, you can see the solution in practice here.
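
First, the batch-repeat trick for single-example prediction. This is only a sketch: repeat_batch is a name I’m using for illustration, and it assumes the same global bs from the model definition.

def repeat_batch(self, x):
    # A single prediction example arrives with batch size 1; tile it so it
    # matches the fixed batch size that self.h was initialized with
    if x.shape[0] == 1:
        x = x.repeat(bs, 1)
    return x

# ...called at the top of forward:
#     x = self.repeat_batch(x)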
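
Second, the output-format fix. Roughly, forward now returns the tuple that RNNTrainer expects; since there’s no dropout here, the raw and dropped-out hidden states are the same tensor (wrapped in lists to mirror AWD_LSTM’s per-layer outputs):

def forward(self, x):
    res = []
    h = self.h
    for i in range(x.shape[1]):
        h = h + self.i_h(x[:, i])
        h = F.relu(self.h_h(h))
        res.append(self.bn(h))
    self.h = h.detach()
    res = torch.stack(res, dim=1)
    decoded = self.h_o(res)
    # (predictions, raw hidden outputs, hidden outputs after dropout);
    # identical here since there is no dropout
    return decoded, [res], [res]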