Issue with loading model

Hi,

I can't understand why the save/load procedure does not work.

In the following test code, I simply create the model, save the weights, then create another similar one and load the previously saved weights:

data = ImageDataBunch.from_folder(path_img, train='train', valid_pct = 0.2, size = 224, ds_tfms=get_transforms(flip_vert=True))
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.save('prova')

data = ImageDataBunch.from_folder(path_img, train='train', valid_pct = 0.2, size = 224, ds_tfms=get_transforms(flip_vert=True))
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.load('prova')

As output, I get the error:

RuntimeError: Error(s) in loading state_dict for Sequential:
size mismatch for 1.8.weight: copying a param with shape torch.Size([618, 512]) from checkpoint, the shape in current model is torch.Size([616, 512]).
size mismatch for 1.8.bias: copying a param with shape torch.Size([618]) from checkpoint, the shape in current model is torch.Size([616]).

What am I missing here?

Thank you

Since you are using a random validation set, the vocabulary (the list of classes) differs between your two data objects, because it's computed on the training set only. That's why you get the size mismatch. You should save your data object or the vocabulary you're using.
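To illustrate the explanation above, here is a minimal pure-Python sketch (not fastai code) of how a random train/validation split can change the set of classes seen in the training part. The helper `train_classes` and the toy labels are hypothetical, made up only for this illustration; the assumption is that some classes have very few images, so they can fall entirely into the validation part.

```python
import random

def train_classes(labels, valid_pct=0.2, seed=None):
    """Return the sorted classes present in the training part of a random split."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    n_valid = int(len(labels) * valid_pct)
    train_idx = idx[n_valid:]  # everything after the validation slice
    return sorted({labels[i] for i in train_idx})

# Toy data (hypothetical): three common classes plus five classes with one image each.
labels = ["a"] * 10 + ["b"] * 10 + ["c"] * 10 + [f"rare{i}" for i in range(5)]

# Without a fixed seed, the class count can differ from one split to the next.
counts = {len(train_classes(labels, seed=s)) for s in range(50)}
print("distinct class counts over 50 splits:", sorted(counts))

# With a fixed seed, the split (and so the class list) is reproducible.
print(train_classes(labels, seed=42) == train_classes(labels, seed=42))
```

If the number of classes differs between the two data objects, the final layer of the two models has a different number of outputs, which would match the 618 vs 616 mismatch in the error.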

I am having the exact same problem as @VLavorini.

I have spent a long time training a model and saved it using learn.save('model_name'). Since I am using a random validation set and have not saved the data object, is there no option but to start training again from scratch?

Many thanks.


How did you solve it?
I have a similar issue.

Can you please explain how to save the data object or the vocabulary, and how to load it into the new learner?
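One possible way to keep the vocabulary, sketched with the standard library only: write the class list to a JSON file next to the weights and read it back before rebuilding the data. The helper names `save_classes` and `load_classes` and the file name are hypothetical. How the list gets back into fastai is an assumption to check against your installed version: in fastai v1 I believe the list is available as `data.classes` and that `ImageDataBunch.from_folder` accepts `classes=` and `seed=` arguments.

```python
import json
import tempfile
from pathlib import Path

def save_classes(classes, path):
    """Write the class list to a JSON file, preserving its order."""
    Path(path).write_text(json.dumps(list(classes)))

def load_classes(path):
    """Read the class list back in the same order."""
    return json.loads(Path(path).read_text())

# Round-trip demo with a made-up class list.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "classes.json"
    save_classes(["cat", "dog", "zebra"], target)
    restored = load_classes(target)

print(restored)

# Hypothetical usage with fastai v1 (assumption, not verified here):
#   save_classes(data.classes, 'classes.json')
#   data = ImageDataBunch.from_folder(path_img, ..., classes=load_classes('classes.json'))
```

The order matters as well as the content, since each output of the last layer corresponds to a position in the class list.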

Oh please answer this... I've spent days here... what exactly does "save your dataset" mean? I have my dataset in folders... There is nothing in the docs or in the forum... how are we the few who struggle with this issue, should be pretty basic...

edit: okay I may have gotten somewhere with this in this topic. I don't actually have a clue what's going on though.


Hi, I have encountered the same issue. I trained a large model in Google Colab and am trying to continue where I left off, but I am not able to load the model. Any solution?

I found this solution on another page of the fast.ai forum. It seems the issue can be solved by adding "remove_module=True". It seems there is a mismatch between the tensor sizes of the new model I create from resnet18 and the one that I saved. The following code might help:

learn.load('YOUR_SAVED_MODEL',strict=False,remove_module=True)

In my case it ended up being wrong data loading: I was loading unlabeled test data, and the loader assumed one class.
When I initialized the model, it assumed there was only one class in the data and therefore initialized the last layer with the wrong dimension.
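A quick way to check for this kind of problem is to compare parameter shapes between the saved weights and the freshly built model. The sketch below uses plain dicts of name to shape so it runs without PyTorch; the helper `shape_mismatches` is hypothetical and the example shapes are illustrative, with the `1.8.*` entries taken from the error message at the top of the thread. With PyTorch, such a dict could be built as `{k: tuple(v.shape) for k, v in model.state_dict().items()}`; how the saved file wraps the weights (for example under a key such as `'model'`) depends on the fastai version and is an assumption to verify.

```python
def shape_mismatches(saved_shapes, model_shapes):
    """Return {name: (saved, current)} for shared parameters whose shapes differ."""
    return {
        name: (saved_shapes[name], model_shapes[name])
        for name in saved_shapes
        if name in model_shapes and saved_shapes[name] != model_shapes[name]
    }

# Illustrative shapes: the last linear layer differs, the first conv layer does not.
saved = {"0.0.weight": (64, 3, 7, 7), "1.8.weight": (618, 512), "1.8.bias": (618,)}
current = {"0.0.weight": (64, 3, 7, 7), "1.8.weight": (616, 512), "1.8.bias": (616,)}

mismatches = shape_mismatches(saved, current)
for name, (old, new) in mismatches.items():
    print(f"{name}: checkpoint {old} vs current model {new}")
```

If only the last layer differs, the number of classes in the data is the first thing to check.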

I faced a similar issue and can confirm that removing the module and marking the load as not strict resolved it for me, as indicated by @AIML2.

For others' reference:

I was facing a similar size mismatch problem when loading weights (from the same model, learning & dataset specs from the same notebook, just trained on a different machine), but remove_module is no longer a recognized kwarg, and just adding strict=False had no effect.

I could go into details of how I traced the error, but TL;DR: what fixed it in this case was just restarting the kernel on Colab and trying again. :man_shrugging:
