Use a pretrained model with fewer classes to start training a model with more classes

Hi,

I’m currently trying to classify cracks. I used training sets available online (e.g. https://digitalcommons.usu.edu/all_datasets/48/) which have two classes: crack or no crack. This worked fine, even with transfer learning from one set to the other. I then created my own data set with three classes (one crack, two cracks or three cracks), so that I can roughly count the number of cracks. I trained a model on this as well and it worked fine.

Now I wanted to improve my “counting” model by using the pretrained “simple” crack model as a starting point, and I got this error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-23-8a8b0cc72704> in <module>
----> 1 learn.load('transfer-phase2_resnet34_gr64')

C:\ProgramData\Anaconda3\lib\site-packages\fastai\basic_train.py in load(self, name, device, strict, with_opt, purge, remove_module)
266             model_state = state['model']
267             if remove_module: model_state = remove_module_load(model_state)
--> 268             get_model(self.model).load_state_dict(model_state, strict=strict)
269             if ifnone(with_opt,True):
270                 if not hasattr(self, 'opt'): self.create_opt(defaults.lr, self.wd)

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py in load_state_dict(self, state_dict, strict)
767         if len(error_msgs) > 0:
768             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 769                                self.__class__.__name__, "\n\t".join(error_msgs)))
770 
771     def _named_members(self, get_members_fn, prefix='', recurse=True):

RuntimeError: Error(s) in loading state_dict for Sequential:
	size mismatch for 1.8.weight: copying a param with shape torch.Size([2, 512]) from checkpoint, the shape in current model is torch.Size([3, 512]).
	size mismatch for 1.8.bias: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([3]).

As far as I understand, the problem is that the two models have a different number of classes. Is there a way to take advantage of the pretrained model although the number of classes doesn’t match? Or would it even be a disadvantage to use it?

thanks for your time!

BR
Claus

Pretraining should always be advantageous. You should load the weights, then discard the last layer, then add a new final layer that predicts the larger number of categories.

Then, train just the last layer (freeze the rest of the model). The reason is that the features in the earlier layers are already finely tuned, so you only need to train the last layer so that it can effectively interpret the output of the earlier layers. Once you’ve trained the last layer, you can make the entire model trainable (using discriminative learning rates, so that the earlier layers get a smaller LR than the later layers).
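
Roughly, in fastai v1 it could look something like this (the checkpoint name and the 1.8.* key names are taken from your traceback above; data3, the epoch counts and the learning rates are just placeholders):

import torch
from fastai.vision import *   # fastai v1

# new learner whose head already has the larger number of classes (data3 is a placeholder DataBunch)
learn = cnn_learner(data3, models.resnet34, metrics=error_rate)

# load the old 2-class checkpoint by hand and drop the mismatched output-layer weights
state = torch.load(learn.path/'models'/'transfer-phase2_resnet34_gr64.pth', map_location='cpu')['model']
for k in ['1.8.weight', '1.8.bias']:
    state.pop(k, None)
learn.model.load_state_dict(state, strict=False)  # strict=False ignores the dropped keys

learn.freeze()                  # train only the new, randomly initialised last layer first
learn.fit_one_cycle(4)
learn.unfreeze()                # then fine-tune everything with discriminative learning rates
learn.fit_one_cycle(4, max_lr=slice(1e-5, 1e-3))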

Hi, could you please point me to a working example of this approach? I am not sure how to load the saved weights from the previous model into the new learner.

Hi,
in general, a model can be saved and loaded with learn.save("trained_model") and learn = learn.load("trained_model").

So the workflow to load a trained learner would e.g. be:

learn = cnn_learner(data=NewData,
                    base_arch=usedArchitecture,
                    metrics=usedMetrics)
learn.load('trained_model')
learn.freeze()
learn.fit_one_cycle(4)   # fit_one_cycle needs the number of epochs

Unfortunately, I haven’t figured out how to discard the last layer. This would mean that you could have more or fewer outputs than you had before. I completely understand the concept, but I don’t know how to properly put it into code.


Hi,

I had a look at how to do this.
Check out this thread:

This should probably help you.
I think the code in that post in particular should be able to do it, and it even has a working example:

Please tell us if it worked out for you!
BR
Claus


Thanks for the reply Claus!
Here is what I have tried in fastai version 1.0.57

=== Software === 
python        : 3.6.7
fastai        : 1.0.57
fastprogress  : 0.1.21
torch         : 1.0.0
nvidia driver : 410.104
torch cuda    : 9.0.176 / is available
torch cudnn   : 7401 / is enabled

3-class data set and training
#Load data with 3 classes
data3 = ImageDataBunch.from_name_re(path_3, fnames_3, pat, ds_tfms=get_transforms(), size=224, bs=64).normalize(imagenet_stats)
#create a 3 class learner with pre-trained weights from models.resnet34
learn3 = cnn_learner(data3, models.resnet34, metrics=error_rate)
#fit the model
learn3.fit_one_cycle(4)
#change the last layer to output 5 features
learn3.model[-1][-1] = nn.Linear(in_features=512, out_features=5, bias=True)
#save the model
learn3.save('stage_1_class3adjto5')

5-class data set and training
#Load data with 5 classes
data5 = ImageDataBunch.from_name_re(path_5, fnames_5, pat, ds_tfms=get_transforms(), size=224, bs=64).normalize(imagenet_stats)
#create a 5 class learner with pre-trained weights from models.resnet34
learn5 = cnn_learner(data5, models.resnet34, metrics=error_rate)
#load the model saved earlier with its output features adjusted to 5
learn5.load('/home/avatar/.fastai/data/oxford_3/models/stage_1_class3adjto5')
learn5.fit_one_cycle(4)

Got the following error
RuntimeError: The size of tensor a (3) must match the size of tensor b (5) at non-singleton dimension 0

Thanks!

Hi,

looks good so far to me.
The only thing I would change is that I would try to change the second-to-last layer rather than the last one, since the last layers of the resnet should look something like:
(16): Linear(in_features=512, out_features=2, bias=True)
(17): LogSoftmax()
if written directly in PyTorch. Correct me if I’m wrong, but from my point of view you are just changing the LogSoftmax() into nn.Linear(in_features=512, out_features=5, bias=True) and not the desired layer. The thing is, you won’t get the same error as I did, since your output already has the desired number of classes.
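To double-check which layer you are actually replacing, you could print the head first (learn5 is just the learner name from your post; the exact indices depend on the head fastai built):

print(learn5.model[-1])   # show the custom head fastai created
# if the last entry is already the Linear(512, n_classes) layer, model[-1][-1] is the
# right index to replace; if a LogSoftmax (or similar) comes after it, use model[-1][-2] instead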
Please tell us if it works when you try it out; unfortunately I can’t check it right now.

BR
Claus

Hi,
I did some research:
Try implementing this piece of code:
import torch.nn as nn
from fastai.vision import *   # fastai v1: provides models, children() and Flatten()
model = models.resnet34(pretrained=True)   # or your already trained model here

# new head: pool and flatten the conv features, then a 5-class Linear + LogSoftmax
custom_head1 = nn.Sequential(nn.AdaptiveAvgPool2d(1), Flatten(),
                             nn.Linear(in_features=512, out_features=5, bias=True),
                             nn.LogSoftmax(dim=1))

# drop the last two children of the resnet (avgpool and fc) and append the new head
model_ch1 = nn.Sequential(*list(children(model))[:-2], custom_head1)
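
To then train on the 5-class data, a rough follow-up could be (data5 is from your earlier post; NLLLoss is just my assumption because the head now ends in LogSoftmax):

# wrap the adjusted model in a plain Learner on the 5-class data
learn5 = Learner(data5, model_ch1, loss_func=nn.NLLLoss(), metrics=error_rate)
learn5.fit_one_cycle(4)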

I think this should do the trick. At least something similar worked in:

BR
Claus
