How to remove the last fully connected layer from a CNN in fastai to stack more than one CNN into a FC-NN?

btw do you have the dataset? If you do, let's build one.

You may also want to reconsider doing a simple concat - you’re throwing away spatial information.

Something at the top of the frontal image will roughly correspond to something at the top of the lateral image. It might be better to concatenate along a new dimension and do a 2D convolution rather than a normal concat and a fully connected layer.

You should also consider sharing weights between the frontal and lateral networks, at least for early layers, since they are processing similar data.
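
A minimal sketch of that weight-sharing idea (the torchvision backbone and the choice of which layers to share are assumptions for illustration, not the original Keras code):

import torch
import torch.nn as nn
import torchvision.models as tvm

resnet = tvm.resnet34(pretrained=True)
# Share the early layers: the very same module (and weights) sees both views.
shared_stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu,
                            resnet.maxpool, resnet.layer1, resnet.layer2)

frontal = torch.randn(2, 3, 224, 224)
lateral = torch.randn(2, 3, 224, 224)
f = shared_stem(frontal)  # (2, 128, 28, 28)
l = shared_stem(lateral)  # gradients from both views update the shared stem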

I did something like this for CC and MLO mammograms in Keras; I don't have working PyTorch code for it yet.

What you say makes a lot of sense. By concatenating the two along a new dimension, do you mean we will have an activation size of (2, bs, 7, 7)?

I didn't notice it was done separately. In that case it is best to save the features to disk, so you could do something like

out = model1(inp)              # features from the pretrained body
torch.save(out, feature_path)  # feature_path is illustrative: one file per example/batch

And then use these saved features to train the final fully connected layers. So, in a separate file:
learn.model = nn.Sequential(nn.Linear(*args), nn.ReLU(), nn.Linear(*args2))  # *args, *args2 hold the (in_features, out_features)

If a slightly less efficient way is acceptable, then building on the earlier example:

import torch
import torch.nn as nn

class NewModel(nn.Module):
    def __init__(self, m1, m2):
        super().__init__()  # required when subclassing nn.Module
        self.m1 = m1
        self.m2 = m2
        self.lin1 = nn.Linear(*args)  # args holds the required (in_features, out_features)
    def forward(self, inp):
        out1 = self.m1(inp)
        out2 = self.m2(inp)
        out1 = out1.detach()  # stop gradients from reaching the pretrained bodies
        out2 = out2.detach()
        out3 = torch.cat([out1, out2], dim=-1)  # concatenate along the feature dimension
        out4 = self.lin1(out3)
        return out4

detach prevents the backward flow into the two bodies. Do note, however, that you are doing their forward passes on every batch, so it is best to save the features to disk. That would require you to redefine your Dataset, especially the __getitem__ function.
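
A minimal sketch of such a Dataset, assuming one saved tensor file per example (the paths and label source are illustrative):

import torch
from torch.utils.data import Dataset

class FeatureDataset(Dataset):
    def __init__(self, feature_paths, labels):
        self.feature_paths = feature_paths  # one .pt file per example
        self.labels = labels
    def __len__(self):
        return len(self.feature_paths)
    def __getitem__(self, idx):
        feats = torch.load(self.feature_paths[idx])  # pre-computed activations
        return feats, self.labels[idx]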

There are some efficient ways to do that. I think previous versions of fastai used bcolz, which might be worth exploring for fast reading of numpy arrays. A rough version of doing this is https://github.com/TheShadow29/FAI-notes/blob/master/notebooks/Using-Forward-Hook-To-Save-Features.ipynb (especially the last part, on saving and loading the features).
Do note that with newer pytorch and fastai versions, some of it is outdated, so it might not work out of the box.
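
The gist of the forward-hook trick in that notebook looks roughly like this (the toy model and layer choice here are stand-ins, not the notebook's code):

import torch
import torch.nn as nn

model1 = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))
inp = torch.randn(2, 16)

feats = []
def save_hook(module, module_in, module_out):
    feats.append(module_out.detach().cpu())  # stash the activation

handle = model1[1].register_forward_hook(save_hook)  # hook the layer you care about
_ = model1(inp)                      # features are captured during the forward pass
handle.remove()                      # don't leave hooks attached
torch.save(feats[0], 'features.pt')  # illustrative path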

Let me know if I am answering your question.

You would pool along the horizontal dimension to get an output that is 1 wide and 7 tall (or whatever the spatial output size of your network is).

You would then add a convolutional layer or two on what would now be a 2x7 or 7x2 map, depending on your image orientation. (You would obviously not be able to use a 3x3 kernel.)
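
A minimal sketch of that pooling-and-stacking idea (the batch size, channel count, and 7x7 map are assumptions for illustration):

import torch
import torch.nn as nn

feat_frontal = torch.randn(4, 512, 7, 7)  # (bs, C, H, W) from the frontal body
feat_lateral = torch.randn(4, 512, 7, 7)  # same shape from the lateral body

pool = nn.AdaptiveAvgPool2d((7, 1))       # pool away the horizontal dimension
f = pool(feat_frontal).squeeze(-1)        # (4, 512, 7)
l = pool(feat_lateral).squeeze(-1)        # (4, 512, 7)

x = torch.stack([f, l], dim=-1)           # stack the two views: (4, 512, 7, 2)
conv = nn.Conv2d(512, 256, kernel_size=(3, 2))  # a 3x3 kernel would not fit the width of 2
out = conv(x)                             # (4, 256, 5, 1)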

Just curious: when you did your data augmentation for the lateral images, did you do anything special other than the usual transforms_side_on? For example, in fastai we use transforms_side_on for normal images and transforms_top_down for satellite images:

transforms_basic    = [RandomRotate(10), RandomLighting(0.05, 0.05)]
transforms_side_on  = transforms_basic + [RandomFlip()]
transforms_top_down = transforms_basic + [RandomDihedral()]

If you are trying to find relationships along the shared dimension, you can’t randomly flip it in your data augmentation.

I.e., no vertical flips for PA and lateral chest X-rays.
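
In terms of the transform lists above, that just means sticking to the flip-free list:

aug_tfms = transforms_basic  # RandomRotate(10) + RandomLighting only, no RandomFlip/RandomDihedral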

It seems like the more optimal approach would be to jointly model these images. Flipping would then be OK, so long as when one image is flipped the other is flipped as well; that would preserve the spatial co-dependence. However, it is much easier for me to say that offhand than to actually produce a model reflecting that viewpoint.

I don't have the dataset, but reading that paper made me curious how such models can be built using fastai. And this is not the first time I have seen such models: a lot of Kaggle competition winners used similarly complex models, though I don't remember where. This is something that has intrigued me for a long time…

Hey @davecg, that's a pretty darn good insight!! This is something that I thought of too, but did not know how to do. I thought about a 3D CNN, but that's not a good fit, since the lateral and the frontal views are not exactly related by depth, although they are somehow related (the up here is like the up there). I also thought the views could be stacked as channels, R: frontal, B: lateral, G: average of both, but that is not ideal either: in most ImageNet images, the RGB channels do not differ in structure as much as these two views do. So your idea seems the best way to deal with this spatial thingy :)

That's another great idea. Can you share the Keras code? One day, perhaps, I will port it to fastai/pytorch after learning fastai part 2.

@TheShadow29 You keep amazing me with your pytorch skills.

So what will be the difference between your first code, where you made one big model combining the two CNNs into one FC NN, and doing it separately as in the paper?
Why did they do it separately?

Do you agree that if training one big combined model is possible (assuming GPU memory allows it), then it is better to avoid training separate models? The results would be the same, but the combined model would be more efficient?

@davecg How about this scenario

Let's say we have a big stadium. We put microphones distributed evenly across the area of the stadium. The spectrogram of each microphone's output is an image of sound (frequency x time). Of course, each microphone is related to all the others in such a way that loud sounds are picked up by roughly all of them, while small-group chit-chat is only detected by nearby microphones.

So each microphone's spectrogram for 1 sec will be a square image that is somehow related to the others, especially those of nearby microphones.

So instead of the x-ray model, where we only have to add a convolutional layer with an input of (2x7) to capture the rough spatial correspondence of the two views, here we should add a 3D CNN to capture the 2D spatial layout of the microphones on the stadium floor? If we have 6 x 8 microphones, the 3D CNN input should be something like 6 x 8 x 7, right? What do you think?
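
A hedged sketch of those shapes (the batch size and channel count are assumptions; the 7 is the per-microphone feature length from the question):

import torch
import torch.nn as nn

bs, C = 4, 256
x = torch.randn(bs, C, 6, 8, 7)            # (bs, C, grid_h, grid_w, features)
conv3d = nn.Conv3d(C, 128, kernel_size=3)  # slides over the 6 x 8 x 7 volume
out = conv3d(x)                            # (4, 128, 4, 6, 5)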

This is an interesting project that I remembered, using microphones in a forest:
The fight against illegal deforestation with TensorFlow

I think Jeremy mentioned that Sara Hooker, one of the best fastai students, was involved in this project. Here is her talk about it.

Thank you guys for the amazing insight and discussions!

Bigger models are often difficult to train without much gain in performance. Also, they had separate datasets, so it makes sense to train them separately.

Will this solution copy existing weights to the model with the new custom head? I suspect not, unfortunately. Is there any way to do that?

I think it would.

You're right - I was able to print them. But I have another problem that you might be able to answer. I:

  1. train a convnet model with several classes in its dataset labels.
  2. change its model head to two output classes (now inconsistent with its dataset).
  3. save this new model with learner.save().
  4. create a new learner with only two classes in its dataset.
  5. load the model saved in step 3 into this new learner (now consistent with its dataset).

I get these error messages when loading into the new learner:
Missing key(s) in state_dict: "8.2.weight", "8.2.bias", "8.2.running_mean", "8.2.running_var", "8.4.weight", "8.4.bias", "8.6.weight", "8.6.bias", "8.6.running_mean", "8.6.running_var", "8.8.weight", "8.8.bias".
Unexpected key(s) in state_dict: "10.weight", "10.bias", "10.running_mean", "10.running_var", "10.num_batches_tracked", "12.weight", "12.bias", "14.weight", "14.bias", "14.running_mean", "14.running_var", "14.num_batches_tracked", "18.weight", "18.bias", "18.running_mean", "18.running_var", "18.num_batches_tracked", "20.weight", "20.bias", "22.weight", "22.bias", "22.running_mean", "22.running_var", "22.num_batches_tracked", "24.weight", "24.bias".

Do you know what these errors mean?

It's basically saying that the model you loaded and the new model don't share the same parameters, so the weights can't be loaded directly. When you do learn.load, internally it loads a state dictionary: the "num.weight"/"num.bias" strings you are seeing are the keys, and the values are the parameter tensors. You could do learn.model.load_state_dict(torch.load(path_to_file), strict=False). If that doesn't work, you will have to create a new dictionary, remove the keys that differ, and load only the subset of keys that match.
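
A minimal sketch of that fallback (path_to_file is whatever you saved earlier; note that learner.save may wrap the state dict, so adapt as needed):

import torch

state = torch.load(path_to_file)                     # the saved state_dict
model_keys = set(learn.model.state_dict().keys())
filtered = {k: v for k, v in state.items() if k in model_keys}
learn.model.load_state_dict(filtered, strict=False)  # load only the shared keys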

@TheShadow29 I kept your notebook open in a browser tab, planning to reproduce it with fastai v1 after finishing the course, but it seems a lot of changes have happened to the framework, and your excellent notebook needs some refactoring for the latest version of fastai.

Here is how I could import custom Pytorch models into fastai v1.39 if you are interested:
https://forums.fast.ai/t/lesson-5-advanced-discussion/30865/40?u=hwasiti

Yeah, fastai has changed a lot and has a much cleaner structure now. Many things would probably need to be changed. Unfortunately, I don’t have the required bandwidth to refactor at the moment.

@hwasiti
Hey there,
Regarding your first concern, "how to remove the last fully connected layer from a CNN in fastai to stack more than one CNN into a FC-NN", have you found any solution?
I am going to create two resnet34 bodies and then concatenate them into a linear layer that produces the classification (same as your concern). I was wondering how to implement it in fastai and would highly appreciate your advice.
Best,
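
Not a definitive answer, but here is a minimal sketch of such a two-body model, assuming fastai v1's create_body (the pooling and head sizes are assumptions):

import torch
import torch.nn as nn
from fastai.vision import models
from fastai.vision.learner import create_body

class TwoBodyNet(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.body1 = create_body(models.resnet34)  # e.g. frontal branch
        self.body2 = create_body(models.resnet34)  # e.g. lateral branch
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(512 * 2, n_classes)  # each resnet34 body ends with 512 channels
    def forward(self, x1, x2):
        f1 = self.pool(self.body1(x1)).flatten(1)  # (bs, 512)
        f2 = self.pool(self.body2(x2)).flatten(1)  # (bs, 512)
        return self.head(torch.cat([f1, f2], dim=1))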