Saved model doesn't have c (num classes) stored?

rmkn85 · October 16, 2019, 9:51pm

Typical use case: train a model using a DL with augmentation on a labelled train/val set, then later load the model and just run inference using another DL without augmentation on an unlabelled test set.

So in one notebook I trained the model using a learner, and then saved it (using learner.save()).
Running learner.load() in the same notebook works.

However, if I run learner.load() in another notebook, I get an error:

RuntimeError: Error(s) in loading state_dict for Sequential:
size mismatch for 1.8.weight: copying a param with shape torch.Size([2931, 512]) from checkpoint, the shape in current model is torch.Size([0, 512]).
size mismatch for 1.8.bias: copying a param with shape torch.Size([2931]) from checkpoint, the shape in current model is torch.Size([0]).

While writing this post, I realized that 2931 was train_db.c (number of output classes).
So setting test_db.c = 2931 and then loading the model seems to resolve it.

It couldn’t be deduced from the test data, since the test set doesn’t have labels. But I also didn’t get any error on trying to create a label without specifying c.

Can you please add tests for save/load and these cases in learner notebook? Thanks!

Actually, another related issue: the transforms for categorical labels include Categorize, which learns a vocab to translate between the labels and their integer encoding.
That information is also critical for using the model at inference time.
Should it be saved with the model?
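
To make the vocab point concrete, here is a minimal sketch in plain Python. It is not fastai's Categorize implementation: `LabelVocab`, its methods, and the `vocab.json` file name are hypothetical names used only for illustration. It shows why a label vocab learned at training time has to be stored somewhere for inference: the number of classes `c` and the index-to-label mapping both come from it.

```python
import json
import os
import tempfile


class LabelVocab:
    """Hypothetical stand-in for the mapping a Categorize-style transform learns."""

    def __init__(self, labels):
        # Sorted unique labels give a stable label -> index mapping.
        self.items = sorted(set(labels))
        self.o2i = {label: i for i, label in enumerate(self.items)}

    @property
    def c(self):
        # Number of output classes, i.e. the width of the final layer.
        return len(self.items)

    def encode(self, label):
        return self.o2i[label]

    def decode(self, index):
        return self.items[index]

    def save(self, path):
        with open(path, "w") as f:
            json.dump(self.items, f)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            return cls(json.load(f))


# "Training" notebook: the vocab is learned from the labelled data and saved.
train_labels = ["cat", "dog", "bird", "dog"]
vocab = LabelVocab(train_labels)
path = os.path.join(tempfile.mkdtemp(), "vocab.json")
vocab.save(path)

# "Inference" notebook: no labels are available, so the vocab is restored from disk.
restored = LabelVocab.load(path)
predicted_index = 2  # pretend this came from the model's argmax
print(restored.c, restored.decode(predicted_index))
```

Without the saved file, the inference side would have no way to recover either `c` or the meaning of a predicted index from an unlabelled test set.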

sgugger · October 16, 2019, 11:20pm

Yes, this is not a bug. You are mixing learn.save/learn.load with the export functionality (which has not been developed in v2 yet). learn.save saves the model and the model only. learn.load loads it, but if you create a learner from a different DataBunch in a new notebook, it won’t match.

This is the same behavior as in v1, so it shouldn’t come as a surprise.

rmkn85 · October 16, 2019, 11:25pm

I think that’s counter-intuitive, can we at least get a defensive programming warning about any such mismatch?

So there is no way now to load a model for inference?

sgugger · October 16, 2019, 11:29pm

I’m not sure what is counter-intuitive. The documentation is pretty clear that it saves the model state (+ optimizer state with the right option) only, and this matches the behavior of v1.

No, there is nothing for inference ready in fastai v2. Remember that v2 is barely in alpha stage and under development.

rmkn85 · October 16, 2019, 11:43pm

By “model state”, I assume it means the weights and hyperparameters.
The DataBunch is used to feed the training stage of the model, but it shouldn’t be needed for recreating the model. It’s also not mentioned in the documentation.
In fact, if I understand correctly, the only thing from the DataBunch that’s used here is the c parameter, which is used to dynamically build the last FC layer of the model architecture.

What’s counter-intuitive for me in this case, is that trying to load without a DataBunch, that parameter is neither loaded from the file, nor is there a warning that there was no c provided.
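
The kind of defensive check being asked for could look roughly like the sketch below. It is an illustration only, not anything fastai provides: `find_shape_mismatches` and `check_before_load` are hypothetical helpers, and the shapes are plain tuples copied from the error message above. With PyTorch, similar dictionaries could presumably be built with `{k: tuple(v.shape) for k, v in model.state_dict().items()}`.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return one readable message per parameter that cannot be loaded as-is."""
    problems = []
    for name, saved in checkpoint_shapes.items():
        current = model_shapes.get(name)
        if current is None:
            problems.append(f"{name}: present in checkpoint, missing in model")
        elif tuple(saved) != tuple(current):
            problems.append(
                f"{name}: checkpoint {tuple(saved)} vs model {tuple(current)}"
            )
    return problems


def check_before_load(checkpoint_shapes, model_shapes):
    """Fail early with a hint instead of a raw size-mismatch error."""
    problems = find_shape_mismatches(checkpoint_shapes, model_shapes)
    if problems:
        raise ValueError(
            "Model does not match checkpoint (was the number of classes c set?):\n"
            + "\n".join(problems)
        )


# Shapes taken from the RuntimeError quoted in the first post.
checkpoint_shapes = {"1.8.weight": (2931, 512), "1.8.bias": (2931,)}
model_shapes = {"1.8.weight": (0, 512), "1.8.bias": (0,)}

problems = find_shape_mismatches(checkpoint_shapes, model_shapes)
for line in problems:
    print(line)
```

A final layer with zero output features is a strong hint that `c` was never set, so a check like this could name the likely cause up front.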

jeremy · October 17, 2019, 1:05am

Please don’t state opinions as fact in this way on these forums. It does not make for pleasant interactions, in my experience.

rmkn85 · October 17, 2019, 6:15am

Sorry, I changed it to make it clear that’s my opinion. I explained my intuition (about not needing the same DB on model load); can you please tell me what you disagree with?

Also, shouldn’t code verify its inputs and assumptions everywhere and clearly fail when they are not met? (even if it makes it not fit into a screen anymore)

jeremy · October 18, 2019, 5:45am

No, I don’t agree with you. export is for exporting, and you should use that when it’s ready.

kdorichev · October 19, 2019, 3:35pm

Hey, @rmkn85 Romi, thanks for your question. I have learned the difference between save and export.
Here is what I found in the documentation for v1:

Once everything is ready for inference, we just have to call learn.export to save all the information of our Learner object for inference: the stuff we need in the DataBunch (transforms, classes, normalization…), the model with its weights and all the callbacks our Learner was using. Everything will be in a file named export.pkl in the folder learn.path. If you deploy your model on a different machine, this is the file you’ll need to copy.

Source.