Load_enc size mismatch error


#1

Hey folks,

When running the classification part of the ULMFiT training, I’m hitting an error when I load my encoder into the classification learner:

RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
----> 1 learn.load_encoder('ft_enc')

~/fastai/fastai/text/learner.py in load_encoder(self, name)
     61     def load_encoder(self, name:str):
     62         "Load the encoder `name` from the model directory."
---> 63         self.model[0].load_state_dict(torch.load(self.path/self.model_dir/f'{name}.pth'))
     64         self.freeze()
     65

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
    767         if len(error_msgs) > 0:
    768             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 769                 self.__class__.__name__, "\n\t".join(error_msgs)))
    770
    771     def _named_members(self, get_members_fn, prefix='', recurse=True):

RuntimeError: Error(s) in loading state_dict for MultiBatchRNNCore:
size mismatch for encoder.weight: copying a param with shape torch.Size([13860, 400]) from checkpoint, the shape in current model is torch.Size([31482, 400]).
size mismatch for encoder_dp.emb.weight: copying a param with shape torch.Size([13860, 400]) from checkpoint, the shape in current model is torch.Size([31482, 400]).

I’ve been trying to alter the classifier to do regression, so at first I thought I had stuffed up something with the databunch but I tried the lines exactly from the ‘quick start’ doc and ran into the same problem.

I found a similar error on an older thread, but I’m not quite sure how that fix would work in v1.

Anyone else having this error? I’m going to look into how the text classification learner is created and where that ‘13860’ size is coming from, but thought I’d raise my hand first.

Thanks!


#2

Just found this thread that may show what I’ve done wrong: somehow I’ve got my vocabulary mixed up, maybe.


#3

So I’ve figured out the mistake I made. In case anyone else hits this same problem, here’s what happened:

I had been playing around with a smaller dataset (30K examples) to experiment with, while the encoder I was trying to load had been trained on a larger dataset (80K examples).

Once I trained a new LM encoder with the 80K dataset, the classifier learner loaded it just fine.
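For anyone else decoding this error: the two numbers in the size mismatch are vocabulary sizes. The encoder’s embedding matrix has shape [vocab_size, embedding_dim] (e.g. [13860, 400] vs [31482, 400] in the trace above), so an encoder saved against one vocab can’t be loaded into a model built against a different one. Here’s a toy sketch of why two datasets produce different vocab sizes — note that `build_vocab` is a made-up stand-in mimicking fastai’s frequency-filtered Vocab, not the real API:

```python
from collections import Counter

def build_vocab(texts, min_freq=2):
    """Toy stand-in for fastai's Vocab: keep tokens seen >= min_freq times."""
    counts = Counter(tok for t in texts for tok in t.split())
    itos = ["xxunk", "xxpad"] + sorted(t for t, c in counts.items() if c >= min_freq)
    return itos

small = ["the cat sat", "the cat ran", "a dog ran"]
large = small + ["the bird flew", "the bird sang", "a fish swam swam"]

vocab_small = build_vocab(small)
vocab_large = build_vocab(large)

# Different corpora -> different vocab sizes -> different embedding shapes,
# i.e. [len(vocab_small), 400] vs [len(vocab_large), 400] in the traceback.
print(len(vocab_small), len(vocab_large))  # → 5 8
```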


(chandan) #4

I am having a similar issue despite training both the encoder and the classifier with the same dataset. I do not understand what is missing. My input CSV has about 12k rows.

trace:

RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>()
      1 learn = text_classifier_learner(data_clas, drop_mult=0.5)
----> 2 learn.load_encoder('fine_tuned_enc')
      3 learn.freeze()
      4 learn.fit_one_cycle(1, slice(5e-3/2., 5e-3))

/usr/local/lib/python3.6/dist-packages/fastai/text/learner.py in load_encoder(self, name)
     61     def load_encoder(self, name:str):
     62         "Load the encoder `name` from the model directory."
---> 63         get_model(self.model)[0].load_state_dict(torch.load(self.path/self.model_dir/f'{name}.pth'))
     64         self.freeze()
     65

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
    767         if len(error_msgs) > 0:
    768             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 769                 self.__class__.__name__, "\n\t".join(error_msgs)))
    770
    771     def _named_members(self, get_members_fn, prefix='', recurse=True):

RuntimeError: Error(s) in loading state_dict for MultiBatchRNNCore:
size mismatch for encoder.weight: copying a param with shape torch.Size([7122, 400]) from checkpoint, the shape in current model is torch.Size([7140, 400]).
size mismatch for encoder_dp.emb.weight: copying a param with shape torch.Size([7122, 400]) from checkpoint, the shape in current model is torch.Size([7140, 400]).

Reading data for the classifier:

data_clas = (TextList.from_df(file, '', cols='original_doc')
           #Where are the inputs? Column 'original_doc' of this csv
                   .random_split_by_pct()
           #How to split it? Randomly with the default 20%
                   .label_from_df(cols='groundtruth')
           #Label it from the 'groundtruth' column
                   .databunch())

Reading data for the LM:

data_lm = (TextList.from_df(file, '', cols='original_doc')
           #Where are the inputs? Column 'original_doc' of this csv
                   .random_split_by_pct()
                   .label_for_lm(cols='groundtruth')
           #Label it for a language model
                   .databunch())

#5

I’ve run into the same problem; your vocab size is only slightly different (7122 vs 7140), which is what happens to me as well. As long as you loaded the LM databunch’s vocab when you initialized your data_clas databunch, I don’t know why your vocab sizes would differ.
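In data-block terms, sharing the vocab would look something like this. This is only a sketch against the fastai v1 data block API; `file` and the column names are taken from the snippet above, so adjust to your setup:

```python
# Build the LM databunch first; it creates the vocabulary.
data_lm = (TextList.from_df(file, '', cols='original_doc')
                   .random_split_by_pct()
                   .label_for_lm()
                   .databunch())

# Reuse that exact vocab when building the classifier databunch,
# so the embedding matrices line up with the saved encoder.
data_clas = (TextList.from_df(file, '', cols='original_doc',
                              vocab=data_lm.train_ds.vocab)
                     .random_split_by_pct()
                     .label_from_df(cols='groundtruth')
                     .databunch())
```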

It’s not a real solution, but I have managed to get around this problem by running the whole process in one session, rather than saving the fine-tuned LM and then reloading everything in a later session to do classification.
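One thing that should make the split-session route work is persisting the LM vocab alongside the saved encoder, then rebuilding the classifier data against that exact vocab. A minimal pure-Python sketch of the idea — the `itos` list here is a stand-in for `data_lm.vocab.itos` (the exact attribute may vary by fastai version):

```python
import os
import pickle
import tempfile

# Session 1: after fine-tuning the LM, persist the vocab next to the encoder.
itos = ["xxunk", "xxpad", "the", "cat", "ran"]  # stand-in for data_lm.vocab.itos
path = os.path.join(tempfile.mkdtemp(), "itos.pkl")
with open(path, "wb") as f:
    pickle.dump(itos, f)

# Session 2: reload the saved vocab and build the classifier data against it,
# so the new model's embedding has exactly len(itos) rows, matching the encoder.
with open(path, "rb") as f:
    itos_reloaded = pickle.load(f)

assert itos_reloaded == itos
print(len(itos_reloaded))  # same vocab size -> same encoder.weight shape
```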

I wonder if there’s something in the load process that can alter how the vocab is loaded?

Sorry that I am not more help!


(chandan) #6

Thanks for replying; it’s good to have someone at least hear your problem.
I am doing everything in one session of Google Colab. My guess is that the language model databunch is missing a few rows which are present in the classification databunch, and this is ultimately resulting in the vocab size mismatch.

Is there a way to visualize the databunch, apart from show_batch, after it has done all the hidden steps of tokenization, replacement, and so on?

The only difference between my data_lm and data_clas is that I use label_for_lm for the LM and label_from_df for data_clas. Does that make any difference?

I tried with another dataset and still ended up with:
size mismatch for encoder.weight: copying a param with shape torch.Size([23824, 400]) from checkpoint, the shape in current model is torch.Size([24010, 400]).


#7

Sorry I am just getting back to this thread! Maybe you have already found the solution, in which case great.

Looking at the problem you’re describing, I am not sure what could be going wrong. I created my data_lm and data_clas objects using the factory methods like in the tutorial:

TextLMDataBunch.from_csv(path, 'file.csv')
and
TextClasDataBunch.from_csv(path, 'file.csv', vocab=data_lm.train_ds.vocab, bs=32).

I’m not quite sure what label_for_lm and label_from_df do in terms of the databunch. Might be worth checking the source code for those.

The only other help I can think of is to simply show you my code as it stands now; I’m sorry it’s so messy, but perhaps something in there will help. I should clarify that I’m working on tweaking the classifier to do regression, so not exactly the original use case. But as far as getting data_lm and data_clas sorted, that part is the same as far as I understand.