Language_model_learner not working as before?

Hi,
I am re-running some code that used to work less than a month ago, but now I get an error.
I have not been able to find a solution… any help will be welcome!

data_lm = (TextList.from_folder(PATH_POEMS)
           # Inputs: all the text files in path
           .split_by_rand_pct(0.1)
           # We randomly hold out 10% of the texts for validation
           .label_for_lm()
           # We want to train a language model, so we label accordingly
           .databunch(bs=bs))

FILE_LM_ENCODER = '/home/jupyter/.fastai/data/p_gen/models/model-noqrnn'
FILE_ITOS = '/home/jupyter/.fastai/data/p_gen/models/itos_pretrained'

learn = language_model_learner(data_lm, AWD_LSTM,
                               pretrained_fnames=[FILE_LM_ENCODER, FILE_ITOS], 
                               drop_mult=0.3)

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-8-1544d2335cf5> in <module>
      8                                pretrained_fnames=[FILE_LM_ENCODER, FILE_ITOS],
----> 9                                drop_mult=0.3)

/opt/anaconda3/lib/python3.7/site-packages/fastai/text/learner.py in language_model_learner(data, arch, config, drop_mult, pretrained, pretrained_fnames, **learn_kwargs)
    216             model_path = untar_data(meta[url] , data=False)
    217             fnames = [list(model_path.glob(f'*.{ext}'))[0] for ext in ['pth', 'pkl']]
--> 218         learn.load_pretrained(*fnames)
    219         learn.freeze()
    220     return learn

/opt/anaconda3/lib/python3.7/site-packages/fastai/text/learner.py in load_pretrained(self, wgts_fname, itos_fname, strict)
     78         if 'model' in wgts: wgts = wgts['model']
     79         wgts = convert_weights(wgts, old_stoi, self.data.train_ds.vocab.itos)
---> 80         self.model.load_state_dict(wgts, strict=strict)
     81 
     82     def get_preds(self, ds_type:DatasetType=DatasetType.Valid, with_loss:bool=False, n_batch:Optional[int]=None, pbar:Optional[PBar]=None,

/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
    775         if len(error_msgs) > 0:
    776             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 777                                self.__class__.__name__, "\n\t".join(error_msgs)))
    778         return _IncompatibleKeys(missing_keys, unexpected_keys)
    779 

RuntimeError: Error(s) in loading state_dict for SequentialRNN:
	size mismatch for 0.rnns.0.weight_hh_l0_raw: copying a param with shape torch.Size([4600, 1150]) from checkpoint, the shape in current model is torch.Size([4608, 1152]).
	size mismatch for 0.rnns.0.module.weight_ih_l0: copying a param with shape torch.Size([4600, 400]) from checkpoint, the shape in current model is torch.Size([4608, 400]).
	size mismatch for 0.rnns.0.module.weight_hh_l0: copying a param with shape torch.Size([4600, 1150]) from checkpoint, the shape in current model is torch.Size([4608, 1152]).
	size mismatch for 0.rnns.0.module.bias_ih_l0: copying a param with shape torch.Size([4600]) from checkpoint, the shape in current model is torch.Size([4608]).
	size mismatch for 0.rnns.0.module.bias_hh_l0: copying a param with shape torch.Size([4600]) from checkpoint, the shape in current model is torch.Size([4608]).
	size mismatch for 0.rnns.1.weight_hh_l0_raw: copying a param with shape torch.Size([4600, 1150]) from checkpoint, the shape in current model is torch.Size([4608, 1152]).
	size mismatch for 0.rnns.1.module.weight_ih_l0: copying a param with shape torch.Size([4600, 1150]) from checkpoint, the shape in current model is torch.Size([4608, 1152]).
	size mismatch for 0.rnns.1.module.weight_hh_l0: copying a param with shape torch.Size([4600, 1150]) from checkpoint, the shape in current model is torch.Size([4608, 1152]).
	size mismatch for 0.rnns.1.module.bias_ih_l0: copying a param with shape torch.Size([4600]) from checkpoint, the shape in current model is torch.Size([4608]).
	size mismatch for 0.rnns.1.module.bias_hh_l0: copying a param with shape torch.Size([4600]) from checkpoint, the shape in current model is torch.Size([4608]).
	size mismatch for 0.rnns.2.module.weight_ih_l0: copying a param with shape torch.Size([1600, 1150]) from checkpoint, the shape in current model is torch.Size([1600, 1152]).

Check whether the itos of your data is the same as that of the data with which you fine-tuned the language model. After training a language model, you have to save its itos in order to use it on different data.

Code to save the itos of the old language data:

import pickle

with open(MODEL_PATH + '/itos.pkl', 'wb') as f:
    pickle.dump(data_lm.vocab.itos, f)
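To sanity-check that the saved vocab can be read back, here is a minimal round-trip sketch. The toy list stands in for `data_lm.vocab.itos`, and a temporary directory stands in for `MODEL_PATH`; both are assumptions for illustration only.

```python
import os
import pickle
import tempfile

# Toy vocabulary standing in for data_lm.vocab.itos
itos = ['xxunk', 'xxpad', 'the', 'poem']

model_path = tempfile.mkdtemp()
itos_file = os.path.join(model_path, 'itos.pkl')

# Save the vocab alongside the fine-tuned weights
with open(itos_file, 'wb') as f:
    pickle.dump(itos, f)

# Later, load it back and confirm it matches what the weights expect
with open(itos_file, 'rb') as f:
    loaded = pickle.load(f)

assert loaded == itos
```

The same `pickle.load` call is what you would use to inspect an itos file you got from someone else.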

Thanks for your reply.
However, the issue is that I have not done any training yet.
I mean, the first thing I am doing is loading these pretrained files… and that has worked many times in the past, until a couple of weeks ago. (Maybe I am missing something, but I am surprised that it no longer works with no code changes?)

If it is of any help, here is the show_install output:

=== Software === 
python        : 3.7.1
fastai        : 1.0.53.post2
fastprogress  : 0.1.19
torch         : 1.1.0
nvidia driver : 410.72
torch cuda    : 10.0.130 / is available
torch cudnn   : 7501 / is enabled

=== Hardware === 
nvidia gpus   : 1
torch devices : 1
  - gpu0      : 7611MB | Tesla P4

=== Environment === 
platform      : Linux-4.9.0-8-amd64-x86_64-with-debian-9.7
distro        : #1 SMP Debian 4.9.130-2 (2018-10-27)
conda env     : base
python        : /opt/anaconda3/bin/python
sys.path      : /home/jupyter/tutorials/fastai/course-v3/nbs/dl1
/opt/anaconda3/lib/python37.zip
/opt/anaconda3/lib/python3.7
/opt/anaconda3/lib/python3.7/lib-dynload

/opt/anaconda3/lib/python3.7/site-packages
/opt/anaconda3/lib/python3.7/site-packages/IPython/extensions
/home/jupyter/.ipython

If you are starting from scratch, you need not give the pretrained_fnames parameter; the model directly loads the pretrained WikiText weights. It is required only if you are using your own custom pretrained weights.

Maybe some context was missing :wink:

These are not “my” pretrained weights. This is a Spanish text-generation project, so I start with encoder and itos files already produced by someone else.

And the problem is that this has stopped working… with the very same files.

Oh okay, got it. But the error means that the itos and encoder come from different data. I faced the same issue, and it was resolved when I matched them. I think they might have been overwritten somehow. Maybe take them again from the original source and try again!

I just double-checked… but they are still the same original files from December 2018 that have been working until now. So the change must be in the way the fastai library loads them?

Hi, is your problem solved? I tried your code, and it works when I load my custom pretrained model. Please check loading your own model (maybe for English) and try to replicate the error. You could also place the encoder and itos in a different path and try loading them from there.

It seems to work for me with the default English model…

The one I am trying to use is the one found here (used by other people and by myself several times).
Would you be so kind as to try loading it? Thanks!

Okay I will try!

Your issue comes from the breaking change in fastai v1.0.53, which made the hidden size a multiple of 8 (1152 instead of 1150), while your pretrained weights have the old size (1150). Just pass along this config to deal with it:

config = awd_lstm_lm_config.copy()
config['n_hid'] = 1150
learn = language_model_learner(data_lm, AWD_LSTM, config=config,
                               pretrained_fnames=[FILE_LM_ENCODER, FILE_ITOS], 
                               drop_mult=0.3)
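If you are not sure which hidden size your checkpoint was trained with, you can inspect the weights directly. A hedged sketch: the state-dict key `0.rnns.0.module.weight_hh_l0` is taken from the traceback above, and `checkpoint_hidden_size` is a made-up helper name, not a fastai API.

```python
import torch

def checkpoint_hidden_size(wgts_path):
    """Read the hidden size an AWD-LSTM checkpoint was trained with.

    An LSTM's recurrent weight has shape (4 * n_hid, n_hid), so the
    second dimension of the first layer's weight_hh is the hidden size
    the pretrained weights expect.
    """
    wgts = torch.load(wgts_path, map_location='cpu')
    if 'model' in wgts:  # fastai sometimes wraps the state dict
        wgts = wgts['model']
    return wgts['0.rnns.0.module.weight_hh_l0'].shape[1]
```

For the weights in the traceback this would return 1150 (from the `[4600, 1150]` shapes, since 4 × 1150 = 4600), in which case `config['n_hid'] = 1150` is the right fix.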

@sgugger ! Thanks A LOT!
You have saved my day… I am in the middle of a project and I was entering panic mode lol.
I was even creating a new GCP instance because I thought I had broken something in my current one…
I cannot thank you enough, and I can only click the heart once :heart:

BTW, what would be the best way to stay up to date on these changes so my heart does not jump next time?


Changes are posted here and there. In this case, I forgot to post the workaround for people with their own pretrained weights; sorry about that.


Hi @sgugger, I tried the same but it did not work. The same error as above is thrown when I run my learn.load() call, while the language_model_learner() line ran without issues.
Is there a way I can go back to a previous version and then run my code?
I am using the Gujarati and Hindi ULMFiT pretrained language models.

Hey, were you able to resolve this? I am facing the same problem using a pretrained ULMFiT model for German. Configuring the hidden size to 1150 also did not help.

I went back to a previous version of fastai: uninstall fastai, then install the version that was working for you.

That also worked for me on Colab, thank you!

@aditya8952: using the flag pretrained=False in language_model_learner() solved the issue for me.

Hey, I had the same problem, and this worked for me:
make sure to pass the parameter config=config in the learn definition; maybe you missed that.