Load_enc size mismatch error


#1

Hey folks,

When running the classification part of the ULMFiT training, I’m hitting an error when I load my encoder into the classification learner:

RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
----> 1 learn.load_encoder('ft_enc')

~/fastai/fastai/text/learner.py in load_encoder(self, name)
     61     def load_encoder(self, name:str):
     62         "Load the encoder `name` from the model directory."
---> 63         self.model[0].load_state_dict(torch.load(self.path/self.model_dir/f'{name}.pth'))
     64         self.freeze()
     65

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
    767         if len(error_msgs) > 0:
    768             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 769                 self.__class__.__name__, "\n\t".join(error_msgs)))
    770
    771     def _named_members(self, get_members_fn, prefix='', recurse=True):

RuntimeError: Error(s) in loading state_dict for MultiBatchRNNCore:
size mismatch for encoder.weight: copying a param with shape torch.Size([13860, 400]) from checkpoint, the shape in current model is torch.Size([31482, 400]).
size mismatch for encoder_dp.emb.weight: copying a param with shape torch.Size([13860, 400]) from checkpoint, the shape in current model is torch.Size([31482, 400]).

I’ve been trying to alter the classifier to do regression, so at first I thought I had stuffed up something with the databunch but I tried the lines exactly from the ‘quick start’ doc and ran into the same problem.

I found a similar error on an older thread, but I’m not quite sure how that fix would work in v1.

Anyone else having this error? I’m going to look into how the text classification learner is created and where that ‘13860’ size is coming from, but thought I’d raise my hand first.

Thanks!


#2

Just found this thread that may show what I’ve done wrong: somehow I’ve got my vocabulary mixed up, maybe.


#3

So I’ve figured out the mistake I made. In case anyone else hits this same problem, here’s what happened:

I had been playing around with a smaller dataset (30K examples) to experiment with, while the encoder I was trying to load had been trained on a larger dataset (80K examples).

Once I trained a new LM encoder with the 80K dataset, the classifier learner loaded it just fine.
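For anyone else decoding this error: the two numbers in the size mismatch are vocabulary sizes. The encoder’s embedding matrix has shape [vocab_size, embedding_dim] (e.g. [13860, 400] vs [31482, 400] in the trace above), so an encoder saved against one vocab can’t be loaded into a model built against a different one. Here’s a toy sketch of why two datasets produce different vocab sizes — note that `build_vocab` is a made-up stand-in mimicking fastai’s frequency-filtered Vocab, not the real API:

```python
from collections import Counter

def build_vocab(texts, min_freq=2):
    """Toy stand-in for fastai's Vocab: keep tokens seen >= min_freq times."""
    counts = Counter(tok for t in texts for tok in t.split())
    itos = ["xxunk", "xxpad"] + sorted(t for t, c in counts.items() if c >= min_freq)
    return itos

small = ["the cat sat", "the cat ran", "a dog ran"]
large = small + ["the bird flew", "the bird sang", "a fish swam swam"]

vocab_small = build_vocab(small)
vocab_large = build_vocab(large)

# Different corpora -> different vocab sizes -> different embedding shapes,
# i.e. [len(vocab_small), 400] vs [len(vocab_large), 400] in the traceback.
print(len(vocab_small), len(vocab_large))  # → 5 8
```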


(chandan) #4

I am having a similar issue despite training both the encoder and the classifier with the same dataset. I do not understand what is missing. My input CSV has about 12k rows.

trace:

RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>()
      1 learn = text_classifier_learner(data_clas, drop_mult=0.5)
----> 2 learn.load_encoder('fine_tuned_enc')
      3 learn.freeze()
      4 learn.fit_one_cycle(1, slice(5e-3/2., 5e-3))

/usr/local/lib/python3.6/dist-packages/fastai/text/learner.py in load_encoder(self, name)
     61     def load_encoder(self, name:str):
     62         "Load the encoder `name` from the model directory."
---> 63         get_model(self.model)[0].load_state_dict(torch.load(self.path/self.model_dir/f'{name}.pth'))
     64         self.freeze()
     65

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
    767         if len(error_msgs) > 0:
    768             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 769                 self.__class__.__name__, "\n\t".join(error_msgs)))
    770
    771     def _named_members(self, get_members_fn, prefix='', recurse=True):

RuntimeError: Error(s) in loading state_dict for MultiBatchRNNCore:
size mismatch for encoder.weight: copying a param with shape torch.Size([7122, 400]) from checkpoint, the shape in current model is torch.Size([7140, 400]).
size mismatch for encoder_dp.emb.weight: copying a param with shape torch.Size([7122, 400]) from checkpoint, the shape in current model is torch.Size([7140, 400]).

Reading data for the classifier:

data_clas = (TextList.from_df(file, '', cols='original_doc')
           #Where are the inputs? Column 'original_doc' of this csv
                   .random_split_by_pct()
           #How to split it? Randomly with the default 20%
                   .label_from_df(cols='groundtruth')
           #Label it from the 'groundtruth' column
                   .databunch())

Reading data for the LM:

data_lm = (TextList.from_df(file, '', cols='original_doc')
           #Where are the inputs? Column 'original_doc' of this csv
                   .random_split_by_pct()
                   .label_for_lm(cols='groundtruth')
           #Label it for a language model
                   .databunch())

#5

I’ve run into the same problem; your vocab size is only slightly different (7122 vs 7140), which is what happens to me as well. As long as you loaded the LM databunch’s vocab when you initialized your data_clas databunch, I don’t know why your vocab sizes would differ.
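In data-block terms, sharing the vocab would look something like this. This is only a sketch against the fastai v1 data block API; `file` and the column names are taken from the snippet above, so adjust to your setup:

```python
# Build the LM databunch first; it creates the vocabulary.
data_lm = (TextList.from_df(file, '', cols='original_doc')
                   .random_split_by_pct()
                   .label_for_lm()
                   .databunch())

# Reuse that exact vocab when building the classifier databunch,
# so the embedding matrices line up with the saved encoder.
data_clas = (TextList.from_df(file, '', cols='original_doc',
                              vocab=data_lm.train_ds.vocab)
                     .random_split_by_pct()
                     .label_from_df(cols='groundtruth')
                     .databunch())
```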

It’s not a real solution, but I have managed to get around this problem by running the whole process in one session, rather than saving the fine-tuned LM and then reloading everything in a later session to do classification.
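One thing that should make the split-session route work is persisting the LM vocab alongside the saved encoder, then rebuilding the classifier data against that exact vocab. A minimal pure-Python sketch of the idea — the `itos` list here is a stand-in for `data_lm.vocab.itos` (the exact attribute may vary by fastai version):

```python
import os
import pickle
import tempfile

# Session 1: after fine-tuning the LM, persist the vocab next to the encoder.
itos = ["xxunk", "xxpad", "the", "cat", "ran"]  # stand-in for data_lm.vocab.itos
path = os.path.join(tempfile.mkdtemp(), "itos.pkl")
with open(path, "wb") as f:
    pickle.dump(itos, f)

# Session 2: reload the saved vocab and build the classifier data against it,
# so the new model's embedding has exactly len(itos) rows, matching the encoder.
with open(path, "rb") as f:
    itos_reloaded = pickle.load(f)

assert itos_reloaded == itos
print(len(itos_reloaded))  # same vocab size -> same encoder.weight shape
```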

I wonder if there’s something in the load process that can alter how the vocab is loaded?

Sorry that I am not more help!


(chandan) #6

Thanks for replying; it’s good to have someone at least hear your problem.
I am doing everything in one session of Google Colab. My guess is that the language model databunch is missing a few rows which are present in the classification databunch, and this is ultimately resulting in the vocab size mismatch.

Is there a way to visualize the databunch, apart from show_batch, after it has done all the hidden steps of tokenization, replacement, and so on?

The only difference between my data_lm and data_clas is that I use label_for_lm for the LM and label_from_df for data_clas. Does that make any difference?

I tried with another dataset and still ended up with:
size mismatch for encoder.weight: copying a param with shape torch.Size([23824, 400]) from checkpoint, the shape in current model is torch.Size([24010, 400]).


#7

Sorry I am just getting back to this thread! Maybe you have already found the solution, in which case great.

Looking at the problem you’re describing, I am not sure what could be going wrong. I created my data_lm and data_clas objects using the factory methods like in the tutorial:

TextLMDataBunch.from_csv(path, 'file.csv')
and
TextClasDataBunch.from_csv(path, 'file.csv', vocab=data_lm.train_ds.vocab, bs=32).

I’m not quite sure what label_for_lm and label_from_df do in terms of the databunch. Might be worth checking the source code for those.

The only other help I can think of is to simply show you my code as it stands now; I’m sorry it’s so messy, but perhaps something in there will help. I should clarify that I’m working on tweaking the classifier to do regression, so not exactly the original use case. But as far as getting data_lm and data_clas sorted, that part is the same as far as I understand.