lm2_enc, missing steps?

In the lecture 10 IMDB notebook, it saves a model to lm1_enc, and then later tries to read a model from the file lm2_enc, which fails because the file does not exist.

Figuring it was a typo, I simply changed the latter to read lm1_enc instead. That proceeds, but the resulting model overfits drastically to the training set.

So I’m wondering whether there are missing steps in the notebook; should there be code that generates lm2_enc and uses it?

Same here; I’d like to know this as well.

I think Jeremy tried a few more models and deleted some of his attempts. I changed lm2_enc to lm1_enc and it worked fine for me in my downstream classification task.
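Concretely, the fix is just to make the save and load names agree. A minimal sketch against the notebook's objects (fastai 0.7 API; `learner` is the language-model learner and `learn` the classifier learner from the notebook):

```python
# After fine-tuning the language model, the notebook saves the encoder:
learner.save_encoder('lm1_enc')

# Later, when building the classifier, load the same name
# (the notebook had 'lm2_enc' here, which was never written):
learn.load_encoder('lm1_enc')
```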

In case your model is overfitting, you could try some of the following:

  1. Increase the fractions in the drops variable.
  2. Add gradient clipping to the learner object; this proved useful for a bunch of us. (A sketch of points 1 and 2 follows this list.)
  3. Check whether the language model generates meaningful text by writing the inference piece, just to sanity-check that the LM is doing fine.
  4. Try incorporating the cache pointer from AWD-LSTM. @sgugger found it super useful for his models (especially the LM). You might want to read this: https://sgugger.github.io/pointer-cache-for-language-model.html#pointer-cache-for-language-model (a self-contained sketch of the idea also follows below).
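If it helps, here is a minimal sketch of points 1 and 2 against the lecture 10 notebook (fastai 0.7 API; `md`, `opt_fn`, `em_sz`, `nh`, `nl` come from earlier cells, and the dropout multiplier and clip value are knobs to tune, not recommendations):

```python
import numpy as np

# 1. Stronger regularisation: the notebook scales the AWD-LSTM dropouts
#    by 0.7; raising that multiplier increases all five dropouts at once.
drops = np.array([0.25, 0.1, 0.2, 0.02, 0.15]) * 1.0

learner = md.get_model(opt_fn, em_sz, nh, nl,
                       dropouti=drops[0], dropout=drops[1], wdrop=drops[2],
                       dropoute=drops[3], dropouth=drops[4])

# 2. Gradient clipping on the learner object.
learner.clip = 0.3
```

And for point 4, a self-contained sketch of the continuous cache idea from the linked post (the function name, default hyperparameters, and tensor shapes are my own illustration, not code from the notebook):

```python
import torch
import torch.nn.functional as F

def cache_pointer_probs(model_probs, hidden, cache_hiddens, cache_targets,
                        theta=0.3, lambd=0.1):
    """Mix the LM's softmax with a cache distribution built from recent history.

    model_probs:   (vocab,) softmax output of the LM at the current step
    hidden:        (nhid,)  current top-layer hidden state
    cache_hiddens: (window, nhid) hidden states stored over the last window steps
    cache_targets: (window,) LongTensor of the word that followed each state
    """
    sims = torch.mv(cache_hiddens, hidden)             # match current state to history
    weights = F.softmax(theta * sims, dim=0)           # attention over the cache
    cache_probs = torch.zeros_like(model_probs)
    cache_probs.index_add_(0, cache_targets, weights)  # scatter mass onto past words
    return (1 - lambd) * model_probs + lambd * cache_probs
```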

Does anyone have a cleaned up version of the imdb notebook that yields similarly high accuracies?

I noticed the same path issues as noted above, plus there are a couple of places where it looks like Jeremy was trying different hyperparameter settings, and it's not clear which ones yield the reported results. (Running the notebook as-is gave me accuracy = 0.941.)

That won’t (AFAIK) help the classifier.

You could check the ULMFiT paper for hyperparam details. Also @sebastianruder may have some more details on what worked well for IMDb there.

Thanks – I’m in the process of doing that, but wanted to see if anyone wanted to help me be lazy 🙂

Yeah, I’m just working on simplifying the scripts in imdb_scripts and making them easier to use. I’ll add a PR soon that makes it easier to pre-train an LM.

For the ablation experiments in the paper, we used --cl 50 and --lr 4e-3 for LM fine-tuning and --cl 50 for classifier fine-tuning.
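With the scripts in imdb_scripts, that corresponds roughly to the following (treat the paths and positional arguments as placeholders for wherever your data and the wt103 weights live):

```
python finetune_lm.py data/nlp_clas/imdb data/nlp_clas/wt103 --cl 50 --lr 4e-3
python train_clas.py data/nlp_clas/imdb 0 --cl 50
```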

@sebastianruder, thanks, that’s the code I should be looking at anyway. My primary goal at first is to replicate Jeremy’s published results. I was trying to use the Lecture 10 notebook for that purpose, but I see now that the imdb_scripts are really the code to use.

I was going to build the full wikitext103 model later and, in the meantime, just confirm the results from the pre-built starting point, but it is more thorough to go through the scripts and do it right.

Thanks for the additional detail on the parameters your publication is based on; I eagerly look forward to your PR and any improvements.

The results you report are exciting indeed, and for more reasons than the final classifier accuracy. I'm looking to replicate the final classifier training and observe the stated absence of catastrophic forgetting that the paper describes.
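For what it's worth, the mechanism the paper uses against catastrophic forgetting is gradual unfreezing with discriminative learning rates. In the notebook's fastai 0.7 API that stage looks roughly like this (`learn`, `lrs`, and `wd` are the classifier learner, per-layer learning rates, and weight decay from earlier cells; the cycle and `use_clr` values are what I believe the notebook ships with, not something I've tuned):

```python
# Train only the final classifier layer first, then progressively
# unfreeze earlier layer groups so the pre-trained weights change slowly.
learn.load_encoder('lm1_enc')

learn.freeze_to(-1)                                  # last layer group only
learn.fit(lrs, 1, wds=wd, cycle_len=1, use_clr=(8, 3))

learn.freeze_to(-2)                                  # last two layer groups
learn.fit(lrs, 1, wds=wd, cycle_len=1, use_clr=(8, 3))

learn.unfreeze()                                     # everything
learn.fit(lrs, 1, wds=wd, cycle_len=14, use_clr=(32, 10))
```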
