Unexpected key(s) in state_dict: "model", "opt"

My first post, and my first error: I hit it while trying to deploy a model to Heroku with the Python buildpack.

Error log

2019-03-03T10:52:03.895369+00:00 app[web.1]: 	Unexpected key(s) in state_dict: "model", "opt".
2019-03-03T10:52:03.896990+00:00 app[web.1]: [2019-03-03 10:52:03 +0000] [35] [INFO] Worker exiting (pid: 35)
2019-03-03T10:52:04.988168+00:00 app[web.1]: [2019-03-03 10:52:04 +0000] [34] [ERROR] Exception in worker process
2019-03-03T10:52:04.988176+00:00 app[web.1]: Traceback (most recent call last):
2019-03-03T10:52:04.988179+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
2019-03-03T10:52:04.988181+00:00 app[web.1]:     worker.init_process()
2019-03-03T10:52:04.988182+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/workers/base.py", line 129, in init_process
2019-03-03T10:52:04.988184+00:00 app[web.1]:     self.load_wsgi()
2019-03-03T10:52:04.988186+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/workers/base.py", line 138, in load_wsgi
2019-03-03T10:52:04.988208+00:00 app[web.1]:     self.wsgi = self.app.wsgi()
2019-03-03T10:52:04.988211+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/app/base.py", line 67, in wsgi
2019-03-03T10:52:04.988213+00:00 app[web.1]:     self.callable = self.load()
2019-03-03T10:52:04.988215+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 52, in load
2019-03-03T10:52:04.988216+00:00 app[web.1]:     return self.load_wsgiapp()
2019-03-03T10:52:04.988218+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 41, in load_wsgiapp
2019-03-03T10:52:04.988220+00:00 app[web.1]:     return util.import_app(self.app_uri)
2019-03-03T10:52:04.988222+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/util.py", line 350, in import_app
2019-03-03T10:52:04.988223+00:00 app[web.1]:     __import__(module)
2019-03-03T10:52:04.988225+00:00 app[web.1]:   File "/app/app.py", line 31, in <module>
2019-03-03T10:52:04.988226+00:00 app[web.1]:     learn.load('stage-34resnet-1_0.063')
2019-03-03T10:52:04.988228+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/fastai/basic_train.py", line 204, in load
2019-03-03T10:52:04.988230+00:00 app[web.1]:     self.model.load_state_dict(torch.load(self.path/self.model_dir/f'{name}.pth', map_location=device))
2019-03-03T10:52:04.988232+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
2019-03-03T10:52:04.988234+00:00 app[web.1]:     self.__class__.__name__, "\n\t".join(error_msgs)))
2019-03-03T10:52:04.988235+00:00 app[web.1]: RuntimeError: Error(s) in loading state_dict for Sequential:
2019-03-03T10:52:04.988264+00:00 app[web.1]: 	Missing key(s) in state_dict: "0.0.weight", "0.1.weight", "0.1.bias", "0.1.running_mean", "0.1.running_var", "0.4.0.conv1.weight", "0.4.0.bn1.weight", "0.4.0.bn1.bias", "0.4.0.bn1.running_mean", "0.4.0.bn1.running_var", "0.4.0.conv2.weight", "0.4.0.bn2.weight", "0.4.0.bn2.bias", "0.4.0.bn2.running_mean", "0.4.0.bn2.running_var", "0.4.1.conv1.weight", "0.4.1.bn1.weight", "0.4.1.bn1.bias", "0.4.1.bn1.running_mean", "0.4.1.bn1.running_var", "0.4.1.conv2.weight", "0.4.1.bn2.weight", "0.4.1.bn2.bias", "0.4.1.bn2.running_mean", "0.4.1.bn2.running_var", "0.4.2.conv1.weight", "0.4.2.bn1.weight", "0.4.2.bn1.bias", "0.4.2.bn1.running_mean", "0.4.2.bn1.running_var", "0.4.2.conv2.weight", "0.4.2.bn2.weight", "0.4.2.bn2.bias", "0.4.2.bn2.running_mean", "0.4.2.bn2.running_var", "0.5.0.conv1.weight", "0.5.0.bn1.weight", "0.5.0.bn1.bias", "0.5.0.bn1.running_mean", "0.5.0.bn1.running_var", "0.5.0.conv2.weight", "0.5.0.bn2.weight", "0.5.0.bn2.bias", "0.5.0.bn2.running_mean", "0.5.0.bn2.running_var", "0.5.0.downsample.0.weight", "0.5.0.downsample.1.weight", "0.5.0.downsample.1.bias", "0.5.0.downsample.1.running_mean", "0.5.0.downsample.1.running_var", "0.5.1.conv1.weight", "0.5.1.bn1.weight", "0.5.1.bn1.bias", "0.5.1.bn1.running_mean", "0.5.1.bn1.running_var", "0.5.1.conv2.weight", "0.5.1.bn2.weight", "0.5.1.bn2.bias", "0.5.1.bn2.running_mean", "0.5.1.bn2.running_var", "0.5.2.conv1.weight", "0.5.2.bn1.weight", "0.5.2.bn1.bias", "0.5.2.bn1.running_mean", "0.5.2.bn1.running_var", "0.5.2.conv2.weight", "0.5.2.bn2.weight", "0.5.2.bn2.bias", "0.5.2.bn2.running_mean", "0.5.2.bn2.running_var", "0.5.3.conv1.weight", "0.5.3.bn1.weight", "0.5.3.bn1.bias", "0.5.3.bn1.running_mean", "0.5.3.bn1.running_var", "0.5.3.conv2.weight", "0.5.3.bn2.weight", "0.5.3.bn2.bias", "0.5.3.bn2.running_mean", "0.5.3.bn2.running_var", "0.6.0.conv1.weight", "0.6.0.bn1.weight", "0.6.0.bn1.bias", "0.6.0.bn1.running_mean", "0.6.0.bn1.running_var", "0.6.0.conv2.weight", "0.6.0.bn2.weight", "0.6.0.bn2.bias", "0.6.0.bn2.running_mean", "0.6.0.bn2.running_var", "0.6.0.downsample.0.weight", "0.6.0.downsample.1.weight", "0.6.0.downsample.1.bias", "0.6.0.downsample.1.running_mean", "0.6.0.downsample.1.running_var", "0.6.1.conv1.weight", "0.6.1.bn1.weight", "0.6.1.bn1.bias", "0.6.1.bn1.running_mean", "0.6.1.bn1.running_var", "0.6.1.conv2.weight", "0.6.1.bn2.weight", "0.6.1.bn2.bias", "0.6.1.bn2.running_mean", "0.6.1.bn2.running_var", "0.6.2.conv1.weight", "0.6.2.bn1.weight", "0.6.2.bn1.bias", "0.6.2.bn1.running_mean", "0.6.2.bn1.running_var", "0.6.2.conv2.weight", "0.6.2.bn2.weight", "0.6.2.bn2.bias", "0.6.2.bn2.running_mean", "0.6.2.bn2.running_var", "0.6.3.conv1.weight", "0.6.3.bn1.weight", "0.6.3.bn1.bias", "0.6.3.bn1.running_mean", "0.6.3.bn1.running_var", "0.6.3.conv2.weight", "0.6.3.bn2.weight", "0.6.3.bn2.bias", "0.6.3.bn2.running_mean", "0.6.3.bn2.running_var", "0.6.4.conv1.weight", "0.6.4.bn1.weight", "0.6.4.bn1.bias", "0.6.4.bn1.running_mean", "0.6.4.bn1.running_var", "0.6.4.conv2.weight", "0.6.4.bn2.weight", "0.6.4.bn2.bias", "0.6.4.bn2.running_mean", "0.6.4.bn2.running_var", "0.6.5.conv1.weight", "0.6.5.bn1.weight", "0.6.5.bn1.bias", "0.6.5.bn1.running_mean", "0.6.5.bn1.running_var", "0.6.5.conv2.weight", "0.6.5.bn2.weight", "0.6.5.bn2.bias", "0.6.5.bn2.running_mean", "0.6.5.bn2.running_var", "0.7.0.conv1.weight", "0.7.0.bn1.weight", "0.7.0.bn1.bias", "0.7.0.bn1.running_mean", "0.7.0.bn1.running_var", "0.7.0.conv2.weight", "0.7.0.bn2.weight", "0.7.0.bn2.bias", "0.7.0.bn2.running_mean", 
"0.7.0.bn2.running_var", "0.7.0.downsample.0.weight", "0.7.0.downsample.1.weight", "0.7.0.downsample.1.bias", "0.7.0.downsample.1.running_mean", "0.7.0.downsample.1.running_var", "0.7.1.conv1.weight", "0.7.1.bn1.weight", "0.7.1.bn1.bias", "0.7.1.bn1.running_mean", "0.7.1.bn1.running_var", "0.7.1.conv2.weight", "0.7.1.bn2.weight", "0.7.1.bn2.bias", "0.7.1.bn2.running_mean", "0.7.1.bn2.running_var", "0.7.2.conv1.weight", "0.7.2.bn1.weight", "0.7.2.bn1.bias", "0.7.2.bn1.running_mean", "0.7.2.bn1.running_var", "0.7.2.conv2.weight", "0.7.2.bn2.weight", "0.7.2.bn2.bias", "0.7.2.bn2.running_mean", "0.7.2.bn2.running_var", "1.2.weight", "1.2.bias", "1.2.running_mean", "1.2.running_var", "1.4.weight", "1.4.bias", "1.6.weight", "1.6.bias", "1.6.running_mean", "1.6.running_var", "1.8.weight", "1.8.bias". 
2019-03-03T10:52:04.988398+00:00 app[web.1]: 	Unexpected key(s) in state_dict: "model", "opt". 
2019-03-03T10:52:04.989812+00:00 app[web.1]: [2019-03-03 10:52:04 +0000] [34] [INFO] Worker exiting (pid: 34)
2019-03-03T10:52:07.571918+00:00 heroku[router]: at=error code=H13 desc="Connection closed without response" method=GET path="/" host=waterclassify.herokuapp.com request_id=aaa6a40d-4494-4ef2-b239-f72cd4741789 fwd="106.180.12.228" dyno=web.1 connect=0ms service=24043ms status=503 bytes=0 protocol=https
2019-03-03T10:52:07.574313+00:00 heroku[router]: at=error code=H13 desc="Connection closed without response" method=GET path="/" host=waterclassify.herokuapp.com request_id=691300cb-0c93-45f8-9790-088617cdef3a fwd="106.180.12.228" dyno=web.1 connect=0ms service=16257ms status=503 bytes=0 protocol=https
2019-03-03T10:52:07.695314+00:00 app[web.1]: Traceback (most recent call last):
2019-03-03T10:52:07.695609+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/arbiter.py", line 210, in run
2019-03-03T10:52:07.737363+00:00 app[web.1]:     self.sleep()
2019-03-03T10:52:07.737455+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/arbiter.py", line 360, in sleep
2019-03-03T10:52:07.739091+00:00 app[web.1]:     ready = select.select([self.PIPE[0]], [], [], 1.0)
2019-03-03T10:52:07.739239+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/arbiter.py", line 245, in handle_chld
2019-03-03T10:52:07.740067+00:00 app[web.1]:     self.reap_workers()
2019-03-03T10:52:07.740232+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/arbiter.py", line 525, in reap_workers
2019-03-03T10:52:07.741596+00:00 app[web.1]:     raise HaltServer(reason, self.WORKER_BOOT_ERROR)
2019-03-03T10:52:07.742674+00:00 app[web.1]: gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>
2019-03-03T10:52:07.742710+00:00 app[web.1]: 
2019-03-03T10:52:07.742712+00:00 app[web.1]: During handling of the above exception, another exception occurred:
2019-03-03T10:52:07.742714+00:00 app[web.1]: 
2019-03-03T10:52:07.742797+00:00 app[web.1]: Traceback (most recent call last):
2019-03-03T10:52:07.743149+00:00 app[web.1]:   File "/app/.heroku/python/bin/gunicorn", line 11, in <module>
2019-03-03T10:52:07.743872+00:00 app[web.1]:     sys.exit(run())
2019-03-03T10:52:07.744173+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 61, in run
2019-03-03T10:52:07.744729+00:00 app[web.1]:     WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
2019-03-03T10:52:07.744773+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/app/base.py", line 223, in run
2019-03-03T10:52:07.745653+00:00 app[web.1]:     super(Application, self).run()
2019-03-03T10:52:07.745704+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/app/base.py", line 72, in run
2019-03-03T10:52:07.746127+00:00 app[web.1]:     Arbiter(self).run()
2019-03-03T10:52:07.746182+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/arbiter.py", line 232, in run
2019-03-03T10:52:07.746904+00:00 app[web.1]:     self.halt(reason=inst.reason, exit_status=inst.exit_status)
2019-03-03T10:52:07.746942+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/arbiter.py", line 345, in halt
2019-03-03T10:52:07.755148+00:00 app[web.1]:     self.stop()
2019-03-03T10:52:07.755196+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/arbiter.py", line 393, in stop
2019-03-03T10:52:07.755659+00:00 app[web.1]:     time.sleep(0.1)
2019-03-03T10:52:07.755781+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/arbiter.py", line 245, in handle_chld
2019-03-03T10:52:07.756078+00:00 app[web.1]:     self.reap_workers()
2019-03-03T10:52:07.756115+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/arbiter.py", line 525, in reap_workers
2019-03-03T10:52:07.756851+00:00 app[web.1]:     raise HaltServer(reason, self.WORKER_BOOT_ERROR)
2019-03-03T10:52:07.756947+00:00 app[web.1]: gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>
2019-03-03T10:52:11.044653+00:00 heroku[web.1]: State changed from up to crashed
2019-03-03T10:52:11.021671+00:00 heroku[web.1]: Process exited with status 1
2019-03-03T10:52:12.510419+00:00 heroku[router]: at=error code=H10 desc="App crashed" method=GET path="/favicon.ico" host=waterclassify.herokuapp.com request_id=f0a501a0-b15e-44ab-85f8-fef3b31630f4 fwd="106.180.12.228" dyno=web.1 connect=1ms service= status=503 bytes= protocol=https

My learner code

from fastai.vision import *  # fastai v1; also brings in np, Path, models, etc.

# path and base_dir are defined earlier in the notebook (not shown here)
path_anno = path/'annotations'
path_img = path/'images'
fnames = get_image_files(path_img)
fnames[:5]
np.random.seed(2)
pat = r'/([^/]+)_\d+.jpg$'  # regex that pulls the class label out of the filename
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224, bs=64).normalize(imagenet_stats)
learn = create_cnn(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(4)
learn.save(Path(base_dir + 'data/lesson1/stage-34resnet-1'))

Ignore the mismatch in the saved model name; I can see it is being picked up correctly in the run-model code below.
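One thing worth flagging about learn.save: newer fastai v1 releases store the optimizer state alongside the model weights by default, which would explain the "model"/"opt" keys in the error. Assuming the training version supports the with_opt flag (an assumption on my part), saving without the optimizer state yields a plain state_dict that older loaders understand. A minimal sketch:

# with_opt=False (available on newer fastai v1 releases) writes a bare
# state_dict instead of a {'model': ..., 'opt': ...} dict, which older
# fastai versions can load with learn.load().
learn.save(Path(base_dir + 'data/lesson1/stage-34resnet-1'), with_opt=False)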

Run model code

path = Path("path")
bs = 64
classes = ['Abyssinian', 'Bengal', 'Birman', 'Bombay', 'British_Shorthair', 'Egyptian_Mau', 'Maine_Coon', 'Persian', 'Ragdoll', 'Russian_Blue', 'Siamese', 'Sphynx', 'american_bulldog', 'american_pit_bull_terrier', 'basset_hound', 'beagle', 'boxer', 'chihuahua', 'english_cocker_spaniel', 'english_setter', 'german_shorthaired', 'great_pyrenees', 'havanese', 'japanese_chin', 'keeshond', 'leonberger', 'miniature_pinscher', 'newfoundland', 'pomeranian', 'pug', 'saint_bernard', 'samoyed', 'scottish_terrier', 'shiba_inu', 'staffordshire_bull_terrier', 'wheaten_terrier', 'yorkshire_terrier']
data2 = ImageDataBunch.single_from_classes(path, classes, ds_tfms=get_transforms(), size=224, bs=bs).normalize(imagenet_stats)
learn = create_cnn(data2, models.resnet34)
learn.load('stage-34resnet-1_0.063')
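For anyone debugging the same thing, the raw checkpoint can be inspected directly to confirm what is inside it. A minimal sketch, assuming the .pth file sits in the usual models/ subfolder and guessing CPU for the Heroku dyno:

import torch

# Load the raw checkpoint. On newer fastai v1 releases, learn.save() writes
# a dict with 'model' and 'opt' keys instead of a bare state_dict.
state = torch.load('models/stage-34resnet-1_0.063.pth', map_location='cpu')
print(list(state.keys()))

# If the top-level keys are 'model' and 'opt', the weights can be unwrapped
# and loaded into the model directly:
if 'model' in state:
    learn.model.load_state_dict(state['model'])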

I used Colab for training, and my deployed app has the following requirements.txt:

Flask
gunicorn
numpy
fastai
https://download.pytorch.org/whl/nightly/cpu/torch_nightly-1.0.0.dev20190129-cp36-cp36m-linux_x86_64.whl

Any idea how to solve the above issue?

PS: Please move this to the correct category if I have posted in the wrong one; this is my first fastai issue and my first time posting :slight_smile:

I'm using fastai v1 for both saving and loading, as far as I know.


Hi, have you resolved this issue?

I’m getting this issue too.

I trained my image classifier model in Google Colab (using lesson 2), exported the model,
and tried to do the following:

learner.model.load_state_dict(
    torch.load('stage-2.pth', map_location='cpu')
)

where I encountered the same "Unexpected key(s) in state_dict" error as above.

No, not yet; I haven't tried again since posting this here.
I'm assuming it must be one of the following:

  • I made some mistake selecting the fastai/torch version while training on Colab
  • I made some mistake selecting the fastai/torch version while running inference on Heroku
  • I am probably using the inference method on Heroku incorrectly

I’ll post here if I solve it!

This issue somehow reminds me of when TensorFlow v1 came out and people were running into problems loading their models.

Anyway, if anyone else has solved this problem in the past, please do help!

Hi @shrex, I have more updates on this.
Refer to my SO post: https://stackoverflow.com/questions/55047065/unexpected-keys-in-state-dict-model-opt/55047370#55047370

Most likely this is due to different fastai versions, which produce differently structured saved models.

The question now is how to check which fastai version Colab is running.
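For the record, the installed versions can be printed directly in any environment, Colab included:

import fastai
import torch

# Both libraries expose their version strings directly.
print(fastai.__version__)
print(torch.__version__)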


I have solved the issue.
It was due to different fastai versions running in my Docker image and in Colab.

Running the following code snippet before training in my Colab notebook solved it.
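(The snippet itself didn't make it into this post; the usual approach is to pin the same fastai version in Colab before training. A sketch with a purely illustrative version number:)

# Run in a Colab cell before importing fastai; 1.0.46 is illustrative,
# match whatever version your deployment environment installs.
!pip install fastai==1.0.46

import fastai
print(fastai.__version__)  # should now match the deployment environment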


Hi, I’m experiencing the same issue.

I checked with python -m fastai.utils.show_install, and I have the same fastai version in both Google Colab and my local environment. This is the error I'm getting:

Unexpected key(s) in state_dict: "opt_func", "loss_func", "metrics", "true_wd", "bn_wd", "wd", "train_bn", "model_dir", "callback_fns", "cb_state", "model", "data", "cls". 
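Those keys look like a whole serialized Learner rather than a plain state_dict, so my guess is the file came from learn.export() instead of learn.save(). If that is right, load_learner would be the matching loader. A minimal sketch, assuming the exported file is the default export.pkl:

from fastai.vision import *

# Assuming the file was produced by learn.export() in the training
# environment; load_learner rebuilds the entire Learner from export.pkl,
# so no DataBunch or architecture setup is needed first.
learn = load_learner('.')  # directory that contains export.pkl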

Any idea why this may be happening?

Thanks.

Can you share a little more detail on what you’re doing when you get this error? What fast.ai version are you running?