Use a pretrained model with fewer classes to start training a model with more classes



I’m currently trying to classify cracks. I used training sets available online (e.g. ones with two classes: crack or no crack). This worked fine, even with transfer learning from one set to the other. I then created my own data set with either one crack, two cracks or three cracks, to be able to roughly count the number of cracks. I trained a model on this as well and it worked fine.

Now I wanted to improve my “counting” model by using the pretrained “simple” crack model as a starting point, and I got this error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-23-8a8b0cc72704> in <module>
----> 1 learn.load('transfer-phase2_resnet34_gr64')

C:\ProgramData\Anaconda3\lib\site-packages\fastai\ in load(self, name, device, strict, with_opt, purge, remove_module)
266             model_state = state['model']
267             if remove_module: model_state = remove_module_load(model_state)
--> 268             get_model(self.model).load_state_dict(model_state, strict=strict)
269             if ifnone(with_opt,True):
270                 if not hasattr(self, 'opt'): self.create_opt(, self.wd)

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\ in load_state_dict(self, state_dict, strict)
767         if len(error_msgs) > 0:
768             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 769                                self.__class__.__name__, "\n\t".join(error_msgs)))
771     def _named_members(self, get_members_fn, prefix='', recurse=True):

RuntimeError: Error(s) in loading state_dict for Sequential:
	size mismatch for 1.8.weight: copying a param with shape torch.Size([2, 512]) from checkpoint, the shape in current model is torch.Size([3, 512]).
	size mismatch for 1.8.bias: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([3]).

As far as I understand, the problem is that the two models have a different number of classes. Is there a way to keep the advantage of the pretrained model although the number of classes doesn’t match? Or would it even be a disadvantage to use it?

thanks for your time!



(David Bressler) #2

Pretraining should always be advantageous. You should load the weights, discard the last layer, and then add a new final layer that predicts the larger number of categories.
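A minimal PyTorch sketch of this idea, in case it helps: copy over every parameter whose shape still matches, skip the mismatched final layer, and let the new 3-class head keep its random initialisation. The toy two-layer “body” here just stands in for the ResNet trunk; the layer sizes and names are illustrative, not the actual fastai model structure.

```python
import torch
import torch.nn as nn

# Toy stand-ins: a "pretrained" 2-class model and a new 3-class model.
# Shapes mirror the error message: final layer maps 512 features to classes.
old_model = nn.Sequential(
    nn.Sequential(nn.Linear(8, 512), nn.ReLU()),  # shared "body" (pretrained)
    nn.Linear(512, 2),                            # old head: crack / no crack
)
new_model = nn.Sequential(
    nn.Sequential(nn.Linear(8, 512), nn.ReLU()),  # same body architecture
    nn.Linear(512, 3),                            # new head: 1 / 2 / 3 cracks
)

checkpoint = old_model.state_dict()

# Keep only the parameters whose shapes match the new model (i.e. everything
# except the old final layer), then load with strict=False so the missing
# head parameters are tolerated instead of raising a RuntimeError.
new_state = new_model.state_dict()
filtered = {k: v for k, v in checkpoint.items()
            if k in new_state and v.shape == new_state[k].shape}
missing, unexpected = new_model.load_state_dict(filtered, strict=False)

# `missing` now lists only the new head's parameters, which stay
# randomly initialised and will be trained from scratch.
print(missing)
```

The same trick works on the saved fastai checkpoint: filter its `state['model']` dict before loading, rather than calling `learn.load` directly with the mismatched head.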

Then, train just the last layer (freeze the rest of the model). The reason for this is that the features in the earlier layers are already finely tuned, so you only need to train the last layer to effectively interpret the output of those earlier layers. Once you’ve trained the last layer, you can set the entire model to trainable (using discriminative learning rates, so that the earlier layers get a smaller LR than the later layers).
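The two training phases above can be sketched in plain PyTorch as follows. The toy body/head split and the specific learning rates are illustrative assumptions, not prescribed values; in fastai v1 the equivalent would be `learn.freeze()` / `learn.fit_one_cycle(...)`, then `learn.unfreeze()` with `max_lr=slice(...)` for the discriminative rates.

```python
import torch
import torch.nn as nn

# Toy model: pretrained "body" plus a freshly initialised 3-class head.
model = nn.Sequential(
    nn.Sequential(nn.Linear(8, 512), nn.ReLU()),  # body (pretrained weights)
    nn.Linear(512, 3),                            # new head (random init)
)

# Phase 1: freeze the body and train only the new head.
for p in model[0].parameters():
    p.requires_grad = False
opt = torch.optim.Adam(model[1].parameters(), lr=1e-3)
# ... fit a few epochs here ...

# Phase 2: unfreeze everything and use discriminative learning rates,
# so the already-tuned early layers move more slowly than the head.
for p in model.parameters():
    p.requires_grad = True
opt = torch.optim.Adam([
    {"params": model[0].parameters(), "lr": 1e-5},  # body: small LR
    {"params": model[1].parameters(), "lr": 1e-3},  # head: larger LR
])
# ... continue fitting ...
```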