I have built a text classifier using the approach taken for IMDB in the MOOC. I updated fastai today and found some changes to
language_model_learner. If I understood correctly, the API now requires arch to be defined, which then (for AWD_LSTM at least) pulls in pre-trained weights by default.
I updated my code to reflect the change:
learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
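For reference, I believe the call before this release looked roughly like this (writing from memory, so the exact keyword name may differ):
learn_lm = language_model_learner(data_lm, pretrained_model=URLs.WT103_1, drop_mult=0.3)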
learn_lm.lr_find seemed to give results similar to those I got in the past.
However, when I start to train (I have tried different learning rates) I don't seem to get any improvement in accuracy at all. Previously, I could train the final layers (i.e. prior to unfreezing) for around 15 epochs and see significant improvement.
learn_lm.fit_one_cycle(5, 1e-1, moms=(0.8,0.7))
Has anyone else seen issues with this latest release? Am I missing something in these changes? The predictions I am getting via the language model are now quite poor, whereas I was getting sensible output (in the context of my data) with the previous release.
Your code is correct for the new API. Are you using the same batch size / learning rate as before? Asking because 1e-1 seems very big.
Thanks for responding. I have not changed the batch size (it is the default). I am not sure if the learning rate changed, but it doesn't seem to make much difference what I use now.
Here, for example, with 1e-3:
Now that is weird, because even the IMDB sample example trains better than this when unfrozen. Are you sure you are using the default tokenization of fastai? Were you using URLs.WT103_1 before?
Yeah, I'm trying to run the same camvid stuff I was running yesterday and something is… different. When I train, accuracy is near zero and doesn't improve. I tried zapping everything and rerunning the notebook from scratch, but the same thing is happening.
Yes, I am using default tokenization with
TextLMDataBunch.from_csv and was previously using URLs.WT103_1.
I am running my notebook on Windows (and was before too).
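Roughly, my data loading looks like this (the csv name and text_cols value here are just placeholders, not my actual ones):
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv', text_cols='text')  # path points to the folder containing the csv
learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)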
I am also having doubts about both the learner accuracy and the beam_search predictions. This is what I get on IMDB after unfreezing (using the train and test folders, 5e-3 LR). The beam_search prediction was kind of repetitive (see e.g. https://www.kaggle.com/abedkhooli/textgen-fastai143-2). This kernel was run before 1.0.43 was released (dev version), but I believe the issue is still the same. I am currently running the same IMDB with 1.0.37 to double check.
|epoch|train_loss|valid_loss|accuracy|
|---|---|---|---|
|1|4.890095|4.716460|0.252501|
|2|4.896220|4.716479|0.252500|
|3|4.874429|4.716483|0.252489|
|4|4.881887|4.716461|0.252492|
|5|4.869444|4.716483|0.252494|
Let's start simple. Can you train the text example to a similar accuracy/loss?
I’m assuming you mean the example on https://docs.fast.ai/text.html.
If so, I will try this and see what I get.
I actually meant the notebook in the example folder, but the notebook behind this page works too.
We can reproduce this one and are trying to find the reason.
Ok, there was a problem in
lr_find that has been fixed in hotfix 1.0.43.post1; just update fastai again and you should be good.
Sorry about that.
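Depending on how you installed fastai, something like one of these should pull in the hotfix:
pip install fastai --upgrade
conda install -c fastai fastai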
What is the new link to URLs.WT103_1? When I change my code to arch=URLs.WT103_1, it does not work.
I see accuracy is around 0.25 with AWD_LSTM vs 0.34 with URLs.WT103_1 (in the notebook).
@Chandrak, if I understand the code correctly, when arch=AWD_LSTM is specified, the URL for the pretrained model (i.e. URLs.WT103_1) is picked up from the metadata settings specified for the model.
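So something like the following should be enough to pull in the WT103 weights (just a sketch; I believe pretrained defaults to True anyway):
# arch=AWD_LSTM brings its own URL (URLs.WT103_1) via the model metadata
learn_lm = language_model_learner(data_lm, AWD_LSTM, pretrained=True, drop_mult=0.3)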
OK, great, thanks. The language model now seems to be fine-tuning much as it did before. I have a separate issue with
TextClasDataBunch.from_csv but need to play around with that a bit more.
If you are referring to the values in my reply above, that's because the 1.0.43 release had an issue that @sgugger just fixed (hotfix 1.0.43.post1), and I am not including the 'unsup' folder (just to speed up training and fit the corpus in the available GPU RAM). For the case I referenced above, accuracy after unfreezing starts around 0.295 in the first epoch, which sounds reasonable (tested with 1.0.37; have not tried 1.0.43.post1 yet).
Thanks for posting this @jbuzza!! I think I had the same problem with a vision dataset and was racking my brain yesterday trying to figure out what I had done wrong. It is a great relief to see that it has been solved.
Thanks @sgugger. Did I understand correctly that it was an issue with lr_find that caused fit_one_cycle to behave differently?
The issue was actually in
load (specifically the purge part that is there to free some GPU memory).