Changes in language_model_learner (release 1.0.43)

I have built a text classifier using the approach taken for IMDB in the MOOC. I updated fastai today and found some changes to language_model_learner. If I understood correctly, the API now requires arch to be specified, which (for AWD_LSTM at least) pulls in pre-trained weights by default.

I updated my code to reflect the change:

learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
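To make the API change concrete, here is a minimal stub-based sketch of the signature difference (the stand-in function below is illustrative, not fastai's actual implementation): the old API took an explicit pretrained_model argument, while the new one takes arch and loads pretrained weights by default.

```python
# Illustrative stub of the language_model_learner signature change
# (NOT the real fastai code; return values are placeholders):
def language_model_learner(data, arch=None, pretrained_model=None,
                           pretrained=True, drop_mult=1.0):
    if arch is not None:
        # New API: the architecture is passed, pretrained weights on by default.
        return ('new-api', pretrained)
    # Old, pre-1.0.43 style: the pretrained model URL was passed explicitly.
    return ('old-api', pretrained_model)

# New-style call mirrors learn_lm = language_model_learner(data_lm, AWD_LSTM, ...):
print(language_model_learner('data_lm', arch='AWD_LSTM'))  # ('new-api', True)
```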

Running learn_lm.lr_find seemed to give results similar to those I got in the past.


However, when I start to train (I have tried different learning rates) I don’t seem to get any improvement in accuracy at all. Previously, I could train the final layers (i.e. prior to unfreezing) for around 15 epochs and see significant improvement.

learn_lm.fit_one_cycle(5, 1e-1, moms=(0.8,0.7))
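As a side note on what fit_one_cycle does with those arguments, here is a rough, self-contained sketch of the one-cycle schedule (function names and the exact annealing shape here are assumptions; fastai's implementation may differ in detail): the learning rate warms up to the given maximum and then anneals back down, while momentum moves in the opposite direction between the two moms values.

```python
import math

def one_cycle_lr(step, total_steps, lr_max=1e-1, div=25.0, pct_start=0.3):
    """Sketch of a one-cycle LR schedule: cosine warm-up from lr_max/div
    to lr_max over the first pct_start of training, then cosine anneal down."""
    warm = int(total_steps * pct_start)
    if step < warm:
        pct = step / warm
        return lr_max / div + (lr_max - lr_max / div) * (1 - math.cos(math.pi * pct)) / 2
    pct = (step - warm) / (total_steps - warm)
    return lr_max * (1 + math.cos(math.pi * pct)) / 2

def one_cycle_mom(step, total_steps, moms=(0.8, 0.7), pct_start=0.3):
    """Momentum mirrors the LR: it falls from moms[0] to moms[1] during
    warm-up, then rises back while the LR anneals down."""
    warm = int(total_steps * pct_start)
    if step < warm:
        pct = step / warm
        return moms[0] + (moms[1] - moms[0]) * (1 - math.cos(math.pi * pct)) / 2
    pct = (step - warm) / (total_steps - warm)
    return moms[1] + (moms[0] - moms[1]) * (1 - math.cos(math.pi * pct)) / 2
```

With lr_max=1e-1 and moms=(0.8, 0.7) as in the call above, the LR starts at 1e-1/25, peaks at 1e-1, and momentum dips from 0.8 to 0.7 at the peak.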


Has anyone else seen issues with this latest release? Am I missing something in these changes? The predictions I am getting from the language model are now quite poor, whereas I was getting sensible output (in the context of my data) with the previous release.


Your code is correct for the new API. Are you using the same batch size / learning rate as before? I ask because 1e-1 seems very high.

Thanks for responding. I have not changed batch size (it is default). I am not sure if the learning rate got changed but it doesn’t seem to make much difference what I use now.

Here, for example, with 1e-3:


Now that is weird, because even the IMDB sample example trains better than this when unfrozen. Are you sure you are using the default tokenization of fastai? Were you using URLs.WT103_1 before?

Yeah I’m trying to run the same camvid stuff I was running yesterday and something is… different. When I train accuracy is near zero and doesn’t improve. Tried zapping everything and rerunning notebook from scratch but the same thing is happening.

Yes, I am using default tokenization with TextLMDataBunch.from_csv and was previously using pretrained_model=URLs.WT103_1 with language_model_learner.

I am running my notebook on Windows (and was before too).

I am also having doubts about both learner accuracy and beam_search prediction. This is what I get on IMDB after unfreeze (using train and test folders, 5e-3 LR). The beam_search prediction was kind of repetitive. This kernel was run before 1.0.43 was released (dev version), but I believe the issue is still the same. I am currently running the same IMDB with 1.0.37 to double check.

|epoch|train_loss|valid_loss|accuracy|
|---|---|---|---|
|1|4.890095|4.716460|0.252501|
|2|4.896220|4.716479|0.252500|
|3|4.874429|4.716483|0.252489|
|4|4.881887|4.716461|0.252492|
|5|4.869444|4.716483|0.252494|

Let’s start simple. Can you train the text example to a similar accuracy/loss?

I’m assuming you mean the example on

If so, I will try this and see what I get.

I actually meant the notebook in the examples folder, but the notebook behind this page works too.

We can reproduce this one and are trying to find the reason.


Ok, there was a problem in lr_find that has been fixed in the hotfix 1.0.43.post1, just update fastai again and you should be good.
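For anyone finding this later, upgrading is just the usual pip command (assuming you installed fastai from PyPI):

```shell
pip install --upgrade fastai
# Then confirm you are on the hotfix or later:
python -c "import fastai; print(fastai.__version__)"
```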

Sorry about that.


What is the new link to URLs.WT103_1? When I change my code to arch=URLs.WT103_1, it does not work.
I see accuracy is at 0.25 levels with AWD_LSTM vs 0.34 with URLs.WT103_1 (in the notebook).

@Chandrak , if I understand the code correctly, when arch=AWD_LSTM is specified, the URL for the pretrained model (i.e. URLs.WT103_1) is picked up from the metadata settings specified for that model.
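To illustrate the mechanism (this is a simplified stand-in for fastai's internal model metadata table, not its actual code), the lookup amounts to a dict keyed by architecture:

```python
# Simplified sketch of an arch -> metadata lookup, in the spirit of
# fastai's internal model metadata (names here are illustrative):
class AWD_LSTM:
    """Stand-in for the real architecture class."""

model_meta = {AWD_LSTM: {'url': 'URLs.WT103_1'}}  # pretrained-weights location

def pretrained_url(arch):
    # When a learner is built with this arch, the default weights URL
    # comes from the metadata entry rather than an explicit argument.
    return model_meta[arch]['url']

print(pretrained_url(AWD_LSTM))  # URLs.WT103_1
```

This is why arch=URLs.WT103_1 fails: the argument must be an architecture (a key in the table), not a URL.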

OK great thanks, the language model now seems to be fine tuning much as it was before. I have a separate issue with TextClasDataBunch.from_csv but need to play around with that a bit more.

If you are referring to the values in my reply above, that’s because the 1.0.43 release had an issue that @sgugger just fixed (hotfix 1.0.43.post1), and I am not including the ‘unsup’ folder (just to speed up training and fit the corpus in available GPU RAM). For the case I referenced above, accuracy after unfreezing starts around 0.295 in the first epoch, which sounds reasonable (tested with 1.0.37, did not try 1.0.43.post1 yet).

Thanks for posting this @jbuzza!! I think I had the same problem with a vision dataset and was racking my brain yesterday trying to figure out what I had done wrong. It is a great relief to see that it has been solved.

Thanks @sgugger. Did I understand correctly that some issue with lr_find caused fit_one_cycle to behave differently?

The issue was actually in load (specifically the purge part that is there to free some GPU memory).

Thank you @sgugger 🙂