Lesson 8 - Official topic

In Chapter 12 we create an LSTMCell from scratch like this:

class LSTMCell(Module):
    def __init__(self, ni, nh):
        self.ih = nn.Linear(ni,4*nh)
        self.hh = nn.Linear(nh,4*nh)

    def forward(self, input, state):
        h,c = state
        #One big multiplication for all the gates is better than 4 smaller ones
        gates = (self.ih(input) + self.hh(h)).chunk(4, 1)
        ingate,forgetgate,outgate = map(torch.sigmoid, gates[:3])
        cellgate = gates[3].tanh()

        c = (forgetgate*c) + (ingate*cellgate)
        h = outgate * c.tanh()
        return h, (h,c)

How do I create a single-layer LSTM model that uses this cell? This isn’t implemented in the notebook. I have tried it on my own, but I’m getting the error RuntimeError: size mismatch, m1: [1024 x 64], m2: [2 x 256] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:41 and I’m not sure where the mistake is. Here’s the code for the model and training:

class LMModel6(Module):
    def __init__(self, vocab_sz, n_hidden, n_layers):
        self.i_h = nn.Embedding(vocab_sz, n_hidden)
        self.lstm = LSTMCell(n_layers, n_hidden)
        self.h_o = nn.Linear(n_hidden, vocab_sz)
        self.h = torch.zeros(n_layers, bs, n_hidden)
        
    def forward(self, x):
        h, res = self.lstm(self.i_h(x), self.h)
        self.h = h.detach()
        return self.h_o(res)
    
    def reset(self):
        for h in self.h: h.zero_()

learn = Learner(dls, LMModel6(len(vocab), 64, 2), 
                loss_func=CrossEntropyLossFlat(), 
                metrics=accuracy, cbs=ModelReseter)
learn.fit_one_cycle(15, 1e-2)

Any idea how to fix this?
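For anyone reading along, the shapes in the traceback can be checked with plain arithmetic. This is a sketch only: the batch size and sequence length below are guesses chosen to reproduce m1: [1024 x 64], and the suggested fix (passing the embedding size, not n_layers, as the cell’s input size) is an inference from the error, not a confirmed answer.

```python
# Plain-Python shape check (no torch needed). bs and sl are guesses that
# reproduce m1: [1024 x 64] from the traceback.
bs, sl = 64, 16
n_hidden, n_layers = 64, 2

# nn.Embedding(vocab_sz, n_hidden) -> activations whose last dim is n_hidden;
# the linear layer sees them flattened to (bs*sl, n_hidden)
m1 = (bs * sl, n_hidden)              # (1024, 64), as in the error

# LSTMCell(n_layers, n_hidden) builds nn.Linear(ni=n_layers, 4*n_hidden),
# whose weight participates as a (ni, 4*nh) matrix in the matmul
m2 = (n_layers, 4 * n_hidden)         # (2, 256), as in the error

# The matmul m1 @ m2 needs the inner dims to agree -- here they don't:
assert m1[1] != m2[0]                 # 64 != 2 -> RuntimeError: size mismatch

# Constructing the cell with the embedding size as its input size lines up,
# i.e. LSTMCell(n_hidden, n_hidden) instead of LSTMCell(n_layers, n_hidden):
m2_fixed = (n_hidden, 4 * n_hidden)   # (64, 256)
assert m1[1] == m2_fixed[0]
```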

Can you share the output of learn.summary?

learn.summary won’t work, since there is a runtime error.

But you should be able to see the inputs and outputs and then delve into the code to check. This is just a suggestion.

Good morning catanza, hope you are having a wonderful day.

I am receiving the exact same error!

I use the following setup at the start of my notebook, as this is what has worked for me.

https://raw.githubusercontent.com/WittmannF/course-v4/master/utils/colab_utils.py

from colab_utils import setup_fastai_colab
setup_fastai_colab()

I had to run !pip install sentencepiece this morning and have had the error since then. It is my first attempt at NLP, so I am a little stuck :confused:.

Have you managed to find a resolution for this error?

Kind regards mrfabulous1 :grinning: :grinning:

Hi jcatanza, hope you’re having a fun day!

I was trying to resolve this problem on another thread!

The above solution worked for me if you still have the issue!

Cheers mrfabulous! :grinning: :grinning:

1 Like

Thanks @mrfabulous1! I installed an earlier version of sentencepiece with % pip install sentencepiece==0.1.86, and now there is no problem.

2 Likes

@Salazar we’ll start next week May 26th Tuesday, 6-9pm PST. We will silently work on notebooks, then discuss as a group. You don’t have to complete anything in advance. URL to join next week’s meeting: https://meet.google.com/hgw-itjd-hep. Hope to see some of you there!

2 Likes

I am using Paperspace and I get the following error when running this line of code in notebook 10. I am not sure why it is looking for a .pkl file.

dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),
    get_items=get_imdb, splitter=RandomSplitter(0.1)
).dataloaders(path, path=path, bs=128, seq_len=80)

It’s due to the tokenized text information being saved away. This may be similar to the issue with the old course and Paperspace, where the system path would cause problems. A better place for this may be the Paperspace platform discussion (where I know mods from Paperspace are on often):

Thanks. I will post there.

Is this working for you? I haven’t been able to get it to work in fastai2.

How are you passing it in? On Learner or on your call to fit?

Does fastai2 support multi-GPU training?

Yes it does. Check out the distributed docs: https://dev.fast.ai/distributed @harish3110
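For anyone looking for a starting point, here is a minimal launch-fragment sketch of the pattern from those docs. The script name train.py is hypothetical, and the helper names (fastai.launch, distrib_ctx) should be double-checked against the distributed docs for your fastai version.

```shell
# Sketch of fastai2 multi-GPU launch (verify against the distributed docs
# linked above for your version; train.py is a hypothetical training script).
#
# Inside train.py, fitting is wrapped in fastai's distributed context:
#     from fastai.distributed import *
#     with learn.distrib_ctx():
#         learn.fit_one_cycle(1, 2e-2)
#
# Then launch one process per GPU with fastai's launcher module:
python -m fastai.launch train.py
```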

1 Like

Finally got around catching up with this lesson. :sweat_smile:

Thank you again @jeremy @rachel @sgugger, for making this edition possible despite the unexpected covid-19 happenings all around the world. I personally have had a really hard time concentrating during the last couple of months, I cannot even imagine how you’ve kept this going on without hiccups while taking on more challenges on the side. Hats off and much respect to the fastai team. :bowing_man:

Looking forward to making use of :fast_forward::robot::two:(fastai2 :grin:) during the upcoming months, and hopefully join again for next part.

3 Likes

I have a question on the language model from scratch. At around 59:50, Jeremy mentions that we can concatenate the words together, separated by a full stop ([‘one \n’, ‘two \n’, …] -> ‘one . two . …’), in the 12_nlp_dive notebook right after reading in the data.

  • Why is the joiner character (.) necessary?
  • Can’t we just use a space instead, since a space doesn’t represent anything else in this dataset?
  • Is it only for this particular dataset, or would we introduce a joiner for natural-language sentences as well when training from scratch? I’d assume not.

I’m working on a language model from scratch for a non-language text dataset, so I’m trying to understand why we add a joiner and whether it might be relevant to what I’m doing.

The . is the token that separates the written numbers, and you are right: you could also choose a space or any other token that’s not used in the dataset.

This special token is only necessary in this dataset, but xxbos, xxmaj, etc. are special tokens used in “real” text datasets (beginning of sentence, uppercase letter).

I would call the . token a splitter rather than a joiner. If you tokenize your data at a “sub-word” level, you might need a similar token to mark that a new “word” begins.

Hope that helps.
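To make the separator’s role concrete, here is a pure-Python sketch of the joining step discussed above (the three-line sample is made up; the real notebook reads the human numbers files):

```python
# Made-up stand-in for the lines read from the human numbers dataset
lines = ['one \n', 'two \n', 'three \n']

# Join with ' . ' so a separator token marks where each number ends
text = ' . '.join(l.strip() for l in lines)
# -> 'one . two . three'

# Word-level tokenization then keeps '.' as its own token
tokens = text.split(' ')
# -> ['one', '.', 'two', '.', 'three']
```

Any token not already used in the dataset would work the same way; '.' is just a readable choice.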

1 Like

Thanks for the reply.

After thinking about this for a while, I realize that the . is used here to represent the end of a ‘sentence’ (only a word in this case), hence representing the idea of a pause from one thing to another.

Has anyone worked on or read about abstractive summarization in NLP using fastai or PyTorch? Any resources, i.e. papers, blog posts, etc., would also be helpful! :slight_smile: