Typical use case: train a model with a DataLoader that has augmentation and a labelled train/val set, then later load the model and run inference with a different DataLoader that has no augmentation and an unlabelled test set.
So in one notebook I trained the model using a learner, and then saved it (using learner.save()).
Running learner.load() in the same notebook works.
However, if I run learner.load() in another notebook, I get an error:
RuntimeError: Error(s) in loading state_dict for Sequential:
size mismatch for 1.8.weight: copying a param with shape torch.Size([2931, 512]) from checkpoint, the shape in current model is torch.Size([0, 512]).
size mismatch for 1.8.bias: copying a param with shape torch.Size([2931]) from checkpoint, the shape in current model is torch.Size([0]).
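The mismatch can be reproduced with plain PyTorch, outside of fastai. This is a minimal sketch, assuming only the sizes from the error message above (a 512-feature head, 2931 classes); the variable names are illustrative, not fastai internals:

```python
import io

import torch
import torch.nn as nn

# Head as it was trained: 2931 output classes (from the error above).
trained_head = nn.Linear(512, 2931)
buf = io.BytesIO()
torch.save(trained_head.state_dict(), buf)
buf.seek(0)

# In a new notebook the unlabelled data yields no class count, so the
# rebuilt head effectively has c = 0 outputs: weight shape [0, 512].
empty_head = nn.Linear(512, 0)
try:
    empty_head.load_state_dict(torch.load(buf))
except RuntimeError as e:
    print(e)  # same kind of "size mismatch for weight/bias" error
```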
While writing this post, I realized that 2931 was train_db.c (number of output classes).
So setting test_db.c = 2931 and then loading the model seems to resolve it.
The value couldn’t be deduced from the test data, since the test set doesn’t have labels. But I also didn’t get any error when creating the labels without specifying c.
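Continuing the plain-PyTorch sketch above (the 512/2931 sizes come from the error message; the rest is illustrative): once the head is rebuilt with the same output size, which is what setting test_db.c = 2931 achieves in fastai, loading succeeds:

```python
import io

import torch
import torch.nn as nn

# Save a head trained with 2931 classes.
trained_head = nn.Linear(512, 2931)
buf = io.BytesIO()
torch.save(trained_head.state_dict(), buf)
buf.seek(0)

# Rebuild the head with the correct class count before loading
# (the analogue of setting test_db.c = 2931 before learner.load()).
new_head = nn.Linear(512, 2931)
new_head.load_state_dict(torch.load(buf))
assert torch.equal(new_head.weight, trained_head.weight)
```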
Can you please add tests for save/load and these cases in learner notebook? Thanks!
Actually, another related issue: the transforms for categorical labels include Categorize, which learns a vocab to translate between the labels and their integer encoding.
That information is also critical to enabling using the model in inference.
Should it be saved with the model?
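One workaround, until export lands in v2, is to bundle the vocab into the same checkpoint as the weights. A minimal sketch with plain PyTorch (the model, vocab, and key names here are hypothetical, not fastai's format):

```python
import io

import torch
import torch.nn as nn

model = nn.Linear(512, 3)
vocab = ["cat", "dog", "horse"]  # hypothetical labels learned by Categorize

# Save weights and vocab together in one checkpoint dict.
buf = io.BytesIO()
torch.save({"model": model.state_dict(), "vocab": vocab}, buf)
buf.seek(0)

# At inference time both come back from the same file.
ckpt = torch.load(buf)
restored = nn.Linear(512, 3)
restored.load_state_dict(ckpt["model"])
print(ckpt["vocab"])  # ['cat', 'dog', 'horse']
```

With the vocab restored, predicted class indices can be mapped back to the original label strings.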
Yes, this is not a bug. You are mixing learn.save/learn.load with the export functionality (which has not been developed in v2 yet). learn.save saves the model and the model only. learn.load loads it, but if in a new notebook, you create a learner from a different DataBunch, it won’t match.
This is the same behavior as in v1, so it shouldn’t come as a surprise.
By “model state”, I assume it means the weights and hyperparameters.
The DataBunch is used to feed the training stage of the model, but it shouldn’t be needed for recreating the model. It’s also not mentioned in the documentation.
In fact, if I understand correctly, the only thing from the DataBunch that’s used here is the c parameter, which is used to dynamically build the last FC layer of the model architecture.
What’s counter-intuitive for me in this case is that when loading without a DataBunch, that parameter is neither loaded from the file, nor is there any warning that no c was provided.
Hey @rmkn85 Romi, thanks for your question. I have learned the difference between save and export.
Here is what I found in the documentation for v1:
Once everything is ready for inference, we just have to call learn.export to save all the information of our Learner object for inference: the stuff we need in the DataBunch (transforms, classes, normalization…), the model with its weights and all the callbacks our Learner was using. Everything will be in a file named export.pkl in the folder learn.path. If you deploy your model on a different machine, this is the file you’ll need to copy.