Can't load checkpoint saved by SaveModelCallback

virilo · April 11, 2019, 6:18am

Hi,

I’m trying to load with torch.load a model .pth saved via SaveModelCallback.

And I’m getting the following error:

RuntimeError: Error(s) in loading state_dict for ResNet:

Missing key(s) in state_dict: "conv1.weight", ...

…Unexpected key(s) in state_dict: "model", "opt".

I’ve made sure that the same versions of fastai and PyTorch are used both to save the .pth checkpoint and to load it later:

fastai version: 1.0.51
numpy version: 1.16.2
pandas version: 0.24.2
torch version: 1.0.1.post2
torchvision Version: 0.2.2

I’m using the following code:

def load_url(*args, **kwargs):
    model_dir = Path('models')
    if not (model_dir/MODEL_FILENAME).is_file(): raise FileNotFoundError
    return torch.load(model_dir/MODEL_FILENAME)
model_zoo.load_url = load_url

learn = cnn_learner(data, base_arch=models.resnet152, loss_func=FocalLoss(), metrics=metrics)

I’m using the code above to load both the initial ImageNet pretrained weights and my .pth checkpoint.

The ImageNet pretrained weights provided by PyTorch load fine, but my checkpoint gives this error.

What am I doing wrong?

TIA,

Virilo

P.S. I have read posts on this forum and on StackOverflow from other people with the same problem, but those cases were caused by a fastai version mismatch or by an inconsistent base_arch parameter.

sgugger · April 11, 2019, 12:58pm

fastai usually saves model and optimizer, which is why you have those unexpected keys. Select the model and you should be good:

state = torch.load(bla)
model.load_state_dict(state['model'])
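A minimal sketch of that selection step, assuming the checkpoint layout reported in the error above (a dict with "model" and "opt" entries). The helper name extract_model_state is hypothetical, not part of fastai or PyTorch, and a file that already holds a bare state_dict is passed through unchanged:

```python
def extract_model_state(checkpoint):
    """Return the model weights from a loaded checkpoint.

    Assumes a dict holding both 'model' and 'opt' entries, as in the
    error message above. Anything else is treated as a bare state_dict
    and returned as-is.
    """
    if isinstance(checkpoint, dict) and {"model", "opt"} <= set(checkpoint):
        return checkpoint["model"]
    return checkpoint


# Intended use with PyTorch (not executed here):
#   state = extract_model_state(torch.load(path))
#   model.load_state_dict(state)
```

The weights will still only load if the model is built with the same architecture that produced the checkpoint.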

virilo · April 13, 2019, 12:02pm

thanks a lot @sgugger

I tried it this way. Now it reads the ‘model’ entry inside the state dictionary.

But it seems that the layer names were changed when the checkpoint was saved:

File "imet-fastai-starter.py", line 180, in load_checkpoint
    model.load_state_dict(state['model'])
  File "/opt/anaconda/anaconda3/envs/fp16/lib/python3.6/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ResNet:
    Missing key(s) in state_dict: "conv1.weight", "bn1.weight", "bn1.bias", "bn1.running_mean", "bn1.running_var", "layer1.0.conv1.weight", "layer1.0.bn1.weight", "layer1.0.bn1.bias", ...

...     Unexpected key(s) in state_dict: "0.0.weight", "0.1.weight", "0.1.bias", "0.1.running_mean", "0.1.running_var", "0.1.num_batches_tracked", ...

BTW, I’d also like to load the optimizer state. It must hold important values, such as the ones related to momentum, and I’d like to continue training.

Thanks in advance

sgugger · April 13, 2019, 12:36pm

If the keys are different, it’s not the same model you are trying to load.
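To see how the two models differ, it can help to compare the key names directly. This is a rough sketch with a hypothetical helper, diff_state_dict_keys, that reproduces the "missing" and "unexpected" lists from the error using plain Python sets. The sample keys are taken from the traceback above. Keys such as "0.0.weight" are what nested nn.Sequential containers produce, which suggests the checkpoint came from the wrapped model that cnn_learner builds rather than from a plain ResNet:

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, as load_state_dict reports them.

    missing:    keys the model expects but the checkpoint lacks
    unexpected: keys the checkpoint has but the model does not know
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Sample keys copied from the traceback above.
missing, unexpected = diff_state_dict_keys(
    ["conv1.weight", "bn1.weight"],   # e.g. model.state_dict().keys()
    ["0.0.weight", "0.1.weight"],     # e.g. state['model'].keys()
)
print("missing:", missing)
print("unexpected:", unexpected)
```

If both lists come back empty, the architectures match and load_state_dict should succeed.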

virilo · April 13, 2019, 12:55pm

I solved it by loading the torch pretrained models and the fastai saved checkpoints in different ways.

For the torch pretrained models, cnn_learner does something very close to your code: just a constructor + torch.load + model.load_state_dict

For the checkpoints, Learner.load handles both the ‘opt’ and the ‘model’ entries in the dictionary and calls load_state_dict for each.

learn = cnn_learner(data, base_arch=base_arch, loss_func=FocalLoss(), metrics=metrics, opt_func=optimizer,
                    pretrained=not MODEL_IS_SAVED_CHECKPOINT)
if MODEL_IS_SAVED_CHECKPOINT:
    learn.load(MODEL_FILENAME.replace('.pth',''))
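One small caveat about the snippet above: str.replace removes every occurrence of '.pth', not only the trailing extension. As far as I can tell, Learner.load appends '.pth' itself when given a plain name, so only the suffix needs to go. A minimal sketch of a safer variant (checkpoint_name is a hypothetical helper, not a fastai function):

```python
def checkpoint_name(filename, ext=".pth"):
    """Strip a trailing extension only, leaving the rest of the name intact."""
    if filename.endswith(ext):
        return filename[: -len(ext)]
    return filename


# Intended use (not executed here):
#   learn.load(checkpoint_name(MODEL_FILENAME))
```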

Perhaps I forgot to give you some details or I should have attached my code.

Thanks a lot for your help @sgugger !

Iron4dam · July 29, 2019, 3:31pm

Hi, did you manage to load a saved fastai learner into a PyTorch model in the end? I’m trying to do the same, but I’m facing the same problem: the keys are different.

learner = cnn_learner(data, models.resnet34, metrics=accuracy, bn_final=True)
learner.fit(...)
learner.save('model_path')

model = models.resnet34()    #  this is a pytorch model object implemented in fastai
state_dict = torch.load('model_path')
model.load_state_dict(state_dict)

virilo · September 17, 2019, 8:37am

@Iron4dam

Sorry, I hadn’t seen your post. Did you solve it?

I think you should:

learner=cnn_learner(…)

and then:

learner.load(model_path)