How to remove the last fully connected layer from a CNN in fastai to stack more than one CNN into a FC-NN?

Hey @davecg, that’s a pretty darn good insight! I thought of this too, but did not know how to do it. I considered a 3D CNN, but that is not a good fit, since the lateral and frontal views are not related by depth, although they are related somehow (up in one view is up in the other). I also thought about stacking them as channels (R: frontal, B: lateral, G: average of both), but this is not ideal either: the RGB channels of most ImageNet images do not differ in structure as much as these two views do. So your idea seems the best way to deal with this spatial thingy :slight_smile:

That’s another great idea. Can you share the Keras code? One day, perhaps, I will port it to fastai/PyTorch after finishing fastai part 2.

@TheShadow29 You keep amazing me with your PyTorch skills.

So what is the difference between your first code, where you combined the two CNNs into one big model with an FC NN on top, and training them separately as in the paper?
Why did they do it separately?

Do you agree that if training one big combined model is possible (GPU memory permitting), it is better than training separate models? Would the results be the same, with the combined model being more efficient?

@davecg How about this scenario

Let’s say we have a big stadium, with microphones distributed evenly over its area. The spectrogram of each microphone’s output is an image of sound (frequency x time). Of course, each microphone is related to all the others, in that loud sounds are picked up by roughly all microphones, while small group chit-chat is only detected by nearby microphones.

So each microphone’s spectrogram for 1 second is a square image that is related somehow to the others, especially those of nearby microphones.

So, unlike the x-ray model, where we only have to add a convolutional layer with an input of (2x7) to capture the rough spatial relation between the two views, here we should add a 3D CNN to capture the 2D spatial layout of the microphones on the stadium floor? If we have 6 x 8 microphones, the 3D CNN input should be something like 6 x 8 x 7, right? What do you think?

This reminded me of an interesting project that uses microphones in a forest:
The fight against illegal deforestation with TensorFlow

I think Jeremy mentioned that Sara Hooker, one of the best fastai students, was involved in this project. Here is her talk about the project

Thank you guys for the amazing insight and discussions!

Bigger models are often difficult to train, without much gain in performance. Also, they had separate datasets, so it makes sense to train them separately.


Will this solution copy existing weights to the model with the new custom head? I suspect not, unfortunately. Is there any way to do that?

I think it would.

You’re right - I was able to print them. But I have another problem that you might be able to answer. I:

  1. train a conversion model with several classes in its dataset labels.
  2. change its model head to two output classes (now inconsistent with its dataset).
  3. save this new model with
  4. create a new learner with only two classes in its dataset.
  5. read the model created in 3 into this new learner (now consistent with its dataset).

I get the error messages when reading into the new learner:
Missing key(s) in state_dict: “8.2.weight”, “8.2.bias”, “8.2.running_mean”, “8.2.running_var”, “8.4.weight”, “8.4.bias”, “8.6.weight”, “8.6.bias”, “8.6.running_mean”, “8.6.running_var”, “8.8.weight”, “8.8.bias”.
Unexpected key(s) in state_dict: “10.weight”, “10.bias”, “10.running_mean”, “10.running_var”, “10.num_batches_tracked”, “12.weight”, “12.bias”, “14.weight”, “14.bias”, “14.running_mean”, “14.running_var”, “14.num_batches_tracked”, “18.weight”, “18.bias”, “18.running_mean”, “18.running_var”, “18.num_batches_tracked”, “20.weight”, “20.bias”, “22.weight”, “22.bias”, “22.running_mean”, “22.running_var”, “22.num_batches_tracked”, “24.weight”, “24.bias”.

Do you know what these errors mean?

It’s basically saying that the model you saved and the new model don’t share the same parameters, so the weights can’t be loaded directly. When you call learn.load, it internally loads a state dictionary. The “num.bias” entries that you are seeing are the keys, and the values are the parameter tensors. You could try learn.model.load_state_dict(torch.load(path_to_file), strict=False). If that doesn’t work, you will have to create a new dictionary, remove the keys that differ, and load only the subset of keys that are the same.
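For the second option, here is a rough sketch in plain PyTorch of keeping only the keys that match. The helper name load_matching_weights and the two small nn.Sequential models are made up for illustration; they are not from your learner. As far as I know, strict=False only ignores missing and unexpected keys and still raises on a shape mismatch, so the sketch also filters by shape.

```python
import torch
import torch.nn as nn


def load_matching_weights(model, state_dict):
    """Load only entries whose key and tensor shape match `model` (hypothetical helper)."""
    own = model.state_dict()
    compatible = {
        k: v for k, v in state_dict.items()
        if k in own and v.shape == own[k].shape
    }
    # strict=False tolerates the keys we filtered out (they stay at their initial values).
    result = model.load_state_dict(compatible, strict=False)
    return sorted(compatible), list(result.missing_keys)


# Toy stand-ins: same first layer, different heads (5 classes vs 2 classes).
old_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 5))
new_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

loaded, missing = load_matching_weights(new_model, old_model.state_dict())
print("loaded:", loaded)
print("left at init:", missing)
```

With a fastai learner you would presumably pass learn.model and the dictionary read with torch.load; the layers reported as left at init are the ones that still need training.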

@TheShadow29 I kept your notebook open in a browser tab, meaning to reproduce it with fastai v1 after finishing the course, but it seems a lot has changed in the framework, and your excellent notebook needs some refactoring for the latest version of fastai.

Here is how I could import custom Pytorch models into fastai v1.39 if you are interested:

Yeah, fastai has changed a lot and has a much cleaner structure now. Many things would probably need to be changed. Unfortunately, I don’t have the required bandwidth to refactor at the moment.

Hey there,
regarding your first concern “how to remove the last fully connected layer from a CNN in fastai to stack more than one CNN into a FC-NN”, have you found any solution?
I am going to create two resnet34 bodies and then concatenate their outputs into a linear layer that produces the classification (the same as your concern). I would like to implement it in fastai and would highly appreciate any advice.
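In case it helps anyone reading, here is a minimal plain-PyTorch sketch of that idea: two convolutional bodies, each pooled to a feature vector, concatenated into one linear layer. TwoViewNet and tiny_body are hypothetical names, and the tiny bodies are stand-ins so the example runs without downloading anything. With torchvision, a resnet34 body could be taken as nn.Sequential(*list(models.resnet34().children())[:-2]), which drops the final pooling and fully connected layer and leaves 512 output channels.

```python
import torch
import torch.nn as nn


class TwoViewNet(nn.Module):
    """Two CNN bodies whose pooled features are concatenated into one linear head."""

    def __init__(self, body_a, body_b, n_features, n_classes):
        super().__init__()
        self.body_a = body_a
        self.body_b = body_b
        self.pool = nn.AdaptiveAvgPool2d(1)  # (N, C, H, W) -> (N, C, 1, 1)
        self.head = nn.Linear(2 * n_features, n_classes)

    def forward(self, x_a, x_b):
        f_a = self.pool(self.body_a(x_a)).flatten(1)  # (N, C)
        f_b = self.pool(self.body_b(x_b)).flatten(1)  # (N, C)
        return self.head(torch.cat([f_a, f_b], dim=1))


def tiny_body(n_features):
    # Stand-in for a headless resnet34 (assumption: any module mapping
    # (N, 3, H, W) to (N, n_features, h, w) can be plugged in here).
    return nn.Sequential(
        nn.Conv2d(3, n_features, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(n_features),
        nn.ReLU(),
    )


model = TwoViewNet(tiny_body(16), tiny_body(16), n_features=16, n_classes=2)
model.eval()
with torch.no_grad():
    logits = model(torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64))
print(logits.shape)
```

Using this inside a fastai Learner would additionally need a data pipeline that yields both views per item, which this sketch does not cover.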