Lesson 2: The LSTM, More on Tokenizers, Ensembling
Lesson 3: Other State-of-the-Art NLP Models
Lesson 4: Multi-Lingual Data, DeViSe
Closing notes
This will be my first time live-streaming, so this will be an experiment for everyone, but I have high hopes that it will turn out to be a successful study group with your help! Please use this thread for any questions and for starting discussions about the material; we’re all learning fastai (and especially the second version) together! I will update this post with YouTube links to the livestream and post them in this thread as well. Looking forward to seeing everyone next month!!!
(Also, a minor PSA: this is in no way for any credit whatsoever. I am just an undergraduate student wanting to help others learn how to use this amazing library to its fullest potential. Instead of worrying about credit, try applying what you’ve learned in a project or two and some blog posts; in many cases this is much better evidence that you know the material than a slip of paper.)
We’ll be covering a lot in the first lesson, so we’ll mostly be looking at the API and a hint of the major architecture fastai uses; next lecture we’ll go fully in-depth on the LSTM.
I am trying to replicate this notebook in a Kaggle kernel on a different dataset, and the TextDataLoaders generation tends to run on the CPU even when the GPU is enabled. Is CPU the default setup for the text API?
Additionally, the kernel dies because the TextDataLoaders generation tries to use too much memory. Is there a way to limit memory and core usage in fastai2?
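One workaround I am experimenting with, in case it is relevant: tokenization always runs on the CPU (the GPU is only used once training starts), and fastai parallelizes it over `defaults.cpus` worker processes, so lowering that might cap core and memory usage. This is just a guess from skimming the source, and the path below is a placeholder:

```python
from fastai.text.all import *

# Tokenization is CPU-only; fastai fans it out over defaults.cpus worker
# processes, so lowering this may reduce peak memory in a small kernel.
defaults.cpus = 2

# Placeholder path -- substitute your own Kaggle dataset.
path = Path('../input/my-text-dataset')

# num_workers is a standard DataLoader argument and caps the workers
# used when serving batches during training.
dls = TextDataLoaders.from_folder(path, valid='test', num_workers=2)
```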
Thanks a lot @muellerzr! It was again another awesome lecture!!
Some questions from my side:
What does seq_len mean? I have checked, and the original movie reviews all have different lengths in words. How does the model handle this?
Where can we specify the vocab size in the code? In this example data_lm.vocab is 7080
Nevertheless, I noticed that when running data_lm.o2i.items() there are only 7050 items (not 7080!); the rest of the words are mapped to the unknown token. I find this very weird.
At some point we do learn.load_encoder('fine_tuned_enc'); what exactly are we re-using? The learned embeddings for our words, or also some part of the LSTM that was used to predict the next word in the sentence?
Finally… what does the decoder look like? I.e., the model that does the actual classification.
Great questions @mgloria, I’ll be going much more in depth on these on Wednesday. I wanted to today but didn’t have enough time to prepare for that.
We pad the sequences so they all fit. It is indeed 72; check the raw tensor values.
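As a rough sketch of that padding in plain Python (pad_id=1 mirrors fastai’s default vocab, where xxpad sits at index 1):

```python
# Minimal sketch of batch padding: every sequence in a batch is padded
# to the length of the longest one so they can stack into one tensor.
# pad_id=1 mirrors fastai's default, where xxpad sits at index 1.
def pad_batch(seqs, pad_id=1):
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]

batch = pad_batch([[2, 10, 11], [2, 10, 11, 12, 13]])
# Both rows now have length 5; the short one ends in two pad ids.
```

In the actual notebook, `x, y = dls.one_batch()` returns the padded tensor, so you can inspect its shape and values directly.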
For instance, from our TextBlock we can specify it as max_vocab, which defaults to 60000 (though it can end up less). Not sure about the 30 missing tokens.
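For reference, a hedged sketch of where that knob lives (the path and splitter here are placeholders, not the lesson’s exact code):

```python
from fastai.text.all import *

path = Path('path/to/texts')  # placeholder

# TextBlock exposes both knobs: max_vocab caps the vocabulary size
# (default 60000) and min_freq drops tokens seen too rarely (default 3).
dblock = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True, max_vocab=60000, min_freq=3),
    get_items=get_text_files,
    splitter=RandomSplitter(valid_pct=0.1),
)
dls = dblock.dataloaders(path, bs=64)
```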
Think of it as all but the last layer in our original language model, exactly the same way we transfer via our resnet34: everything but that last layer.
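A hedged sketch of that transfer, assuming the `dls_lm` and `dls_clas` loaders built earlier in the notebook:

```python
from fastai.text.all import *

# dls_lm / dls_clas are assumed to be the language-model and classifier
# DataLoaders built earlier in the lesson notebook.
learn_lm = language_model_learner(dls_lm, AWD_LSTM)
learn_lm.fine_tune(1)
# save_encoder stores everything except the LM's final decoder layer...
learn_lm.save_encoder('fine_tuned_enc')

# ...and load_encoder drops those weights into the classifier's AWD-LSTM
# body -- the same pattern as reusing a resnet34 body in vision.
learn_clas = text_classifier_learner(dls_clas, AWD_LSTM)
learn_clas.load_encoder('fine_tuned_enc')
```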
I’ll go into that a bit more this week, great question! The classifier is a PoolingLinearClassifier, similar to the head from our vision models but specific to language models.
You can find it here:
And for the “how do we build the classifier itself”, its full code is here:
You can see the encoder is our language model (arch), whose weights we then load in, and the PoolingLinearClassifier is our “head”.
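As a toy illustration of the concat pooling that head starts with (plain Python, not the real implementation): it concatenates the last hidden state with a max pool and an average pool over the whole sequence before the linear layers.

```python
# Rough sketch of "concat pooling": for a sequence of hidden states,
# concatenate the last state, the per-dimension max over time, and the
# per-dimension mean over time, giving a 3 * hidden_size feature vector.
def concat_pool(hidden_states):
    n = len(hidden_states[0])  # hidden size
    last = hidden_states[-1]
    mx = [max(h[i] for h in hidden_states) for i in range(n)]
    avg = [sum(h[i] for h in hidden_states) / len(hidden_states) for i in range(n)]
    return last + mx + avg

feats = concat_pool([[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]])
# last=[2.0, 4.0], max=[3.0, 4.0], mean=[2.0, 2.0]
```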
I see…! This clarifies a few things. Thanks a lot @muellerzr!
How can I check the raw tensor values? I was looking for something like dls.train_ds or similar, but I cannot find how to get the actual data back. Moreover, looking at show_batch it seems to me as if the original reviews had been cut (maybe into sequences of length 72), is this correct?
Regarding max_vocab, I do not see where we are specifying it in the code. Shouldn’t it take the default value then, i.e. 60000 terms?
My best guess (before I have a chance to dig into the notebook and supporting code) at the difference between data_lm.vocab being 7080 and data_lm.o2i.items() being 7050, with the rest of the words mapped to the unk token, is this: typically, when a word occurs fewer than a certain number of times (say 5) in the text, it is mapped to unk. That is probably what is going on here, but I will need to dig through the notebook and code to confirm.
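To illustrate the mechanism I’m guessing at, a toy plain-Python version (fastai’s real vocab also reserves other special tokens, and its min_freq default is 3):

```python
from collections import Counter

# Toy version of the min_freq idea: tokens seen fewer than min_freq
# times never make it into the index, so lookups for them fall back to
# the unknown token xxunk at index 0.
def make_vocab(tokens, min_freq=2):
    counts = Counter(tokens)
    itos = ['xxunk'] + [t for t, c in counts.most_common() if c >= min_freq]
    o2i = {t: i for i, t in enumerate(itos)}
    return itos, o2i

itos, o2i = make_vocab(['movie', 'movie', 'great', 'great', 'obscureword'])
o2i.get('obscureword', o2i['xxunk'])  # rare word falls back to xxunk -> 0
```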
I’m afraid there won’t be one; after starting this I realized my knowledge of NLP needs a bit more work before I dive into this, and my time constraints limit it. I’d recommend Rachel’s course instead; perhaps a few of you following this could combine your efforts to try to recreate the notebooks in fastai2! Apologies.