Part 2 Lesson 10 wiki

  1. Noticed that @jeremy uses Mendeley to work with papers. Could you share any tips/workflows related to that?
  2. Regarding sub-words: has anyone tried using phonetic transcription instead of (or in addition to) text?

Any time I find an interesting paper on my PC I save it to a folder. I have that folder set as a ‘watch folder’ in Mendeley and it auto-adds it to my library. If on my phone/tablet, I ‘share’ the PDF to Mendeley, which adds it to my library. I highlight interesting passages as I read. It’s all synced across my machines automatically.


Rewatching the videos and wondering if it would be beneficial to do something similar to t_up, but for the first letter of a word? I’m planning on trying it out unless somebody else already has and found it wasn’t helpful.

There’s some code commented out in Tokenizer that does that - I commented it out because it seemed a little complex and I wasn’t sure if it would help. If you try it, let us know if you get better results!
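For anyone who wants to try this, here is a minimal sketch of the idea in plain Python. It is not the commented-out code from Tokenizer: the `t_cap` marker and the `mark_caps` helper are hypothetical names made up for illustration, and only `t_up` comes from the lesson.

```python
import re

TOK_UP = "t_up"    # marker the lesson uses for ALL-CAPS words
TOK_CAP = "t_cap"  # hypothetical marker for a capitalized first letter

def mark_caps(text):
    """Lowercase every word, inserting a marker that records its original casing."""
    out = []
    # Split into alternating runs of word characters and non-word characters.
    for piece in re.findall(r"\w+|\W+", text):
        if piece.isupper() and len(piece) > 2:
            # Whole word in capitals, e.g. "GREAT"
            out += [f" {TOK_UP} ", piece.lower()]
        elif piece[0].isupper() and piece[1:].islower():
            # Only the first letter is a capital, e.g. "The"
            out += [f" {TOK_CAP} ", piece.lower()]
        else:
            out.append(piece.lower())
    return "".join(out)

print(mark_caps("The movie was GREAT").split())
```

One thing to watch: every sentence-initial word gets a marker too, so the sequence grows noticeably. That may be part of why it is unclear whether this helps.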


Am I completely mistaken in thinking that RNNs are not strictly limited to characters or language per se? Conceptually, any sequence of events with a preset “vocabulary” may be amenable to similar treatment. A cursory Google search shows people using LSTMs for, say, time series prediction!

If anyone has experience in this domain, I’d love to chat with you.

If anybody else is having the annoying Can't find model 'en' issue, here is a link to last year's thread where the issue was fixed: Lesson 4 - OSError: Can’t find model ‘en’. It showed up for me when I ran the get_all line in imdb.


After the input and the first embedding layer, it’s all tensors…English text, French text, stock market data, video frames…it’s all just real numbers.
RNN doesn’t discriminate.
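To make the “preset vocabulary” point concrete, here is a rough sketch of turning an arbitrary event sequence into the integer ids an embedding layer expects. The event names and the `build_vocab` and `numericalize` helpers are made up for illustration, and `_unk_` is just an assumed name for the unknown-token slot.

```python
from collections import Counter

def build_vocab(sequences, min_freq=1):
    """Map each distinct event to an integer id; id 0 is reserved for unknown events."""
    counts = Counter(tok for seq in sequences for tok in seq)
    itos = ["_unk_"] + [tok for tok, c in counts.most_common() if c >= min_freq]
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi

def numericalize(seq, stoi):
    """Replace each event by its id, falling back to 0 for events not in the vocab."""
    return [stoi.get(tok, 0) for tok in seq]

# Hypothetical click-stream sessions standing in for "sentences".
sessions = [["login", "search", "view", "buy"],
            ["login", "view", "logout"]]
itos, stoi = build_vocab(sessions)
ids = numericalize(["login", "refund", "buy"], stoi)
print(ids, [itos[i] for i in ids])
```

Once the events are ids, the rest of the pipeline (embedding, RNN, softmax over the vocab) does not need to know whether they were words or clicks.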


And the en refers to the small English model provided by spaCy 2.0.

spaCy also has medium and large models, and the documentation covering each of those English models is excellent.


Each epoch for the language model takes around 17 minutes on the p3.2xlarge AWS instance (Volta V100 GPU).

Does it really take this long? Could anyone confirm?

Nice! Do you have a favorite paper/ref to read up more on this?

In the video, Jeremy says, “as per usual, we do a single epoch on the last layer”, which he explains is the embedding weights, and he does this because they’re the thing that will be the most wrong.
But in the code, he calls learner.unfreeze() just a few cells above his epoch run, which unfreezes all layers. I’m left to assume there’s some missing code, something like learner.freeze(-1), right before he starts training the language model, no? Any insights here?


Models documentation:
Tokenizer documentation:
Language processing pipelines:


Would you mind sharing?

I think so too…it should have been learner.freeze_to(-1) instead of unfreeze.
But going by the learning rate that lr_find found, it seems like it is not going to impact the weights much if you unfreeze completely and train the whole LM.

Let me look at it carefully again…


Yeah it takes forever. While I’m playing around with it, I just did

trn_dl = LanguageModelLoader(trn_tokens[:len(trn_tokens)//10], bs, bptt)
val_dl = LanguageModelLoader(val_tokens[:len(val_tokens)//10], bs, bptt)
md = LanguageModelData(PATH, 1, vs, trn_dl, val_dl, bs=bs, bptt=bptt)

Notice that I just took the first tenth of the training and validation tokens. With the default Paperspace FastAI setup, it still took 7 minutes to get through 1 epoch. Ouch! Might need to upgrade machines…

Can you please also add English subtitles to the lesson 10 video? Thanks.

We sure may have to!

~36 min/epoch on a 1070 in an Ubuntu box; it is slow indeed.

@vibhorsood Terminate the other terminals and remove ‘unsup’ from CLASSES. It’ll reduce your training set from 100k to 50k, but it’ll take care of the memory errors.

I am getting the following error:

tok_trn, trn_labels = get_all(df_trn, 1)