How to remove the last fully connected layer from a CNN in fastai to stack more than one CNN into a FC-NN?

btw do you have the dataset? If you do, let's build one.

You may also want to reconsider doing a simple concat - you’re throwing away spatial information.

Something at the top of the frontal image will roughly correspond to something at the top of the lateral image. It might be better to concatenate along a new dimension and do a 2D convolution rather than a normal concat and a fully connected layer.

You should also consider sharing weights between the frontal and lateral networks, at least for early layers, since they are processing similar data.
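
A minimal sketch of that weight-sharing idea (the torchvision backbone and the choice of which layers to share are assumptions for illustration, not the original Keras code):

import torch
import torch.nn as nn
import torchvision.models as tvm

resnet = tvm.resnet34(pretrained=True)
# Share the early layers: the very same module (and weights) sees both views.
shared_stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu,
                            resnet.maxpool, resnet.layer1, resnet.layer2)

frontal = torch.randn(2, 3, 224, 224)
lateral = torch.randn(2, 3, 224, 224)
f = shared_stem(frontal)  # (2, 128, 28, 28)
l = shared_stem(lateral)  # gradients from both views update the shared stem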

I did something like this for CC and MLO mammograms in Keras; I don't have working PyTorch code for it yet.

What you say makes a lot of sense. By concatenating the two along a new dimension, do you mean we will have an activation size of (2, bs, 7, 7)?

I didn't notice it was done separately. In that case it is best to save the features to disk, so you could do something like

out = model1(inp)              # features from the pretrained body
torch.save(out, feature_path)  # feature_path is illustrative: one file per example/batch

And then use these saved features to train the final fully connected layers. So, in a separate file:
learn.model = nn.Sequential(nn.Linear(*args), nn.ReLU(), nn.Linear(*args2))  # *args, *args2 hold the (in_features, out_features)

If a slightly less efficient way is acceptable, then building on the earlier example:

import torch
import torch.nn as nn

class NewModel(nn.Module):
    def __init__(self, m1, m2):
        super().__init__()  # required when subclassing nn.Module
        self.m1 = m1
        self.m2 = m2
        self.lin1 = nn.Linear(*args)  # args holds the required (in_features, out_features)
    def forward(self, inp):
        out1 = self.m1(inp)
        out2 = self.m2(inp)
        out1 = out1.detach()  # stop gradients from reaching the pretrained bodies
        out2 = out2.detach()
        out3 = torch.cat([out1, out2], dim=-1)  # concatenate along the feature dimension
        out4 = self.lin1(out3)
        return out4

detach prevents the backward flow into the two bodies. Do note, however, that you are doing their forward passes on every batch, so it is best to save the features to disk. That would require you to redefine your Dataset, especially the __getitem__ function.
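
A minimal sketch of such a Dataset, assuming one saved tensor file per example (the paths and label source are illustrative):

import torch
from torch.utils.data import Dataset

class FeatureDataset(Dataset):
    def __init__(self, feature_paths, labels):
        self.feature_paths = feature_paths  # one .pt file per example
        self.labels = labels
    def __len__(self):
        return len(self.feature_paths)
    def __getitem__(self, idx):
        feats = torch.load(self.feature_paths[idx])  # pre-computed activations
        return feats, self.labels[idx]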

There are some efficient ways to do that. I think previous versions of fastai used bcolz, which might be worth exploring for fast reading of numpy arrays. A rough version of doing this is https://github.com/TheShadow29/FAI-notes/blob/master/notebooks/Using-Forward-Hook-To-Save-Features.ipynb (especially the last part, on saving and loading the features).
Do note that with newer pytorch and fastai versions, some of it is outdated, so it might not work out of the box.
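
The gist of the forward-hook trick in that notebook looks roughly like this (the toy model and layer choice here are stand-ins, not the notebook's code):

import torch
import torch.nn as nn

model1 = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))
inp = torch.randn(2, 16)

feats = []
def save_hook(module, module_in, module_out):
    feats.append(module_out.detach().cpu())  # stash the activation

handle = model1[1].register_forward_hook(save_hook)  # hook the layer you care about
_ = model1(inp)                      # features are captured during the forward pass
handle.remove()                      # don't leave hooks attached
torch.save(feats[0], 'features.pt')  # illustrative path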

Let me know if I am answering your question.

You would pool along the horizontal dimension to get an output that is 1 wide and 7 tall (or whatever the spatial output size of your network is).

You would then add a convolutional layer or two on what would now be a 2x7 or 7x2 map, depending on your image orientation. (You would obviously not be able to use a 3x3 kernel.)
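
A minimal sketch of that pooling-and-stacking idea (the batch size, channel count, and 7x7 map are assumptions for illustration):

import torch
import torch.nn as nn

feat_frontal = torch.randn(4, 512, 7, 7)  # (bs, C, H, W) from the frontal body
feat_lateral = torch.randn(4, 512, 7, 7)  # same shape from the lateral body

pool = nn.AdaptiveAvgPool2d((7, 1))       # pool away the horizontal dimension
f = pool(feat_frontal).squeeze(-1)        # (4, 512, 7)
l = pool(feat_lateral).squeeze(-1)        # (4, 512, 7)

x = torch.stack([f, l], dim=-1)           # stack the two views: (4, 512, 7, 2)
conv = nn.Conv2d(512, 256, kernel_size=(3, 2))  # a 3x3 kernel would not fit the width of 2
out = conv(x)                             # (4, 256, 5, 1)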

Just curious: when you did your data augmentation for the lateral images, did you do anything special other than the usual transforms_side_on? For example, in fastai we use transforms_side_on for normal images and transforms_top_down for satellite images:

transforms_basic    = [RandomRotate(10), RandomLighting(0.05, 0.05)]
transforms_side_on  = transforms_basic + [RandomFlip()]
transforms_top_down = transforms_basic + [RandomDihedral()]

If you are trying to find relationships along the shared dimension, you can’t randomly flip it in your data augmentation.

I.e., no vertical flips for PA and lateral chest X-rays.
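
In terms of the transform lists above, that just means sticking to the flip-free list:

aug_tfms = transforms_basic  # RandomRotate(10) + RandomLighting only, no RandomFlip/RandomDihedral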

It seems like the more optimal approach would be to jointly model these images. Flipping would then be OK, so long as when one image is flipped the other is flipped as well; that would preserve the spatial co-dependence. However, it is much easier for me to say that offhand than to actually produce a model reflecting that viewpoint.

I don't have the dataset, but reading that paper made me curious how such models can be built using fastai. And this is not the first time I have seen such models: a lot of Kaggle competition winners used similarly complex models, though I don't remember where. This is something that has intrigued me for a long time…

Hey @davecg, that's a pretty darn good insight!! This is something that I thought of too, but did not know how to do. I thought about a 3D CNN, but that's not a good fit, since the lateral and the frontal views are not exactly related by depth, although they are somehow related (the up here is like the up there). I also thought the views could be stacked as channels, R: frontal, B: lateral, G: average of both, but that is not ideal either: in most ImageNet images, the RGB channels do not differ in structure as much as these two views do. So your idea seems the best way to deal with this spatial thingy :)

That's another great idea. Can you share the Keras code? One day, perhaps, I will port it to fastai/pytorch after learning fastai part 2.

@TheShadow29 You keep amazing me with your pytorch skills.

So what will be the difference between your first code, where you made one big model combining the two CNNs into one FC NN, and doing it separately as in the paper?
Why did they do it separately?

Do you agree that if training one big combined model is possible (assuming GPU memory allows it), then it is better to avoid training separate models? The results would be the same, but the combined model would be more efficient?

@davecg How about this scenario

Let's say we have a big stadium. We put microphones distributed evenly across the area of the stadium. The spectrogram of each microphone's output is an image of sound (frequency x time). Of course, each microphone is related to all the others in such a way that loud sounds are picked up by roughly all of them, while small-group chit-chat is only detected by nearby microphones.

So each microphone's spectrogram for 1 sec will be a square image that is somehow related to the others, especially those of nearby microphones.

So instead of the x-ray model, where we only have to add a convolutional layer with an input of (2x7) to capture the rough spatial correspondence of the two views, here we should add a 3D CNN to capture the 2D spatial layout of the microphones on the stadium floor? If we have 6 x 8 microphones, the 3D CNN input should be something like 6 x 8 x 7, right? What do you think?
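
A hedged sketch of those shapes (the batch size and channel count are assumptions; the 7 is the per-microphone feature length from the question):

import torch
import torch.nn as nn

bs, C = 4, 256
x = torch.randn(bs, C, 6, 8, 7)            # (bs, C, grid_h, grid_w, features)
conv3d = nn.Conv3d(C, 128, kernel_size=3)  # slides over the 6 x 8 x 7 volume
out = conv3d(x)                            # (4, 128, 4, 6, 5)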

This is an interesting project that I remembered, using microphones in a forest:
The fight against illegal deforestation with TensorFlow

I think Jeremy mentioned that Sara Hooker, one of the best fastai students, was involved in this project. Here is her talk about it.

Thank you guys for the amazing insight and discussions!

Bigger models are often difficult to train without much gain in performance. Also, they had separate datasets, so it makes sense to train them separately.

Will this solution copy existing weights to the model with the new custom head? I suspect not, unfortunately. Is there any way to do that?

I think it would.

You're right - I was able to print them. But I have another problem that you might be able to answer. I:

  1. train a convnet model with several classes in its dataset labels.
  2. change its model head to two output classes (now inconsistent with its dataset).
  3. save this new model with learner.save().
  4. create a new learner with only two classes in its dataset.
  5. load the model saved in step 3 into this new learner (now consistent with its dataset).

I get these error messages when loading into the new learner:
Missing key(s) in state_dict: "8.2.weight", "8.2.bias", "8.2.running_mean", "8.2.running_var", "8.4.weight", "8.4.bias", "8.6.weight", "8.6.bias", "8.6.running_mean", "8.6.running_var", "8.8.weight", "8.8.bias".
Unexpected key(s) in state_dict: "10.weight", "10.bias", "10.running_mean", "10.running_var", "10.num_batches_tracked", "12.weight", "12.bias", "14.weight", "14.bias", "14.running_mean", "14.running_var", "14.num_batches_tracked", "18.weight", "18.bias", "18.running_mean", "18.running_var", "18.num_batches_tracked", "20.weight", "20.bias", "22.weight", "22.bias", "22.running_mean", "22.running_var", "22.num_batches_tracked", "24.weight", "24.bias".

Do you know what these errors mean?

It's basically saying that the model you loaded and the new model don't share the same parameters, so the weights can't be loaded directly. When you do learn.load, internally it loads a state dictionary: the "num.weight"/"num.bias" strings you are seeing are the keys, and the values are the parameter tensors. You could do learn.model.load_state_dict(torch.load(path_to_file), strict=False). If that doesn't work, you will have to create a new dictionary, remove the keys that differ, and load only the subset of keys that match.
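
A minimal sketch of that fallback (path_to_file is whatever you saved earlier; note that learner.save may wrap the state dict, so adapt as needed):

import torch

state = torch.load(path_to_file)                     # the saved state_dict
model_keys = set(learn.model.state_dict().keys())
filtered = {k: v for k, v in state.items() if k in model_keys}
learn.model.load_state_dict(filtered, strict=False)  # load only the shared keys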

@TheShadow29 I kept your notebook open in a browser tab, planning to reproduce it with fastai v1 after finishing the course, but it seems a lot of changes have happened to the framework, and your excellent notebook needs some refactoring for the latest version of fastai.

Here is how I could import custom Pytorch models into fastai v1.39 if you are interested:
https://forums.fast.ai/t/lesson-5-advanced-discussion/30865/40?u=hwasiti

Yeah, fastai has changed a lot and has a much cleaner structure now. Many things would probably need to be changed. Unfortunately, I don’t have the required bandwidth to refactor at the moment.

@hwasiti
Hey there,
Regarding your first concern, "how to remove the last fully connected layer from a CNN in fastai to stack more than one CNN into a FC-NN", have you found any solution?
I am going to create two resnet34 bodies and then concatenate them into a linear layer that produces the classification (same as your concern). I was wondering how to implement it in fastai and would highly appreciate your advice.
Best,
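
Not a definitive answer, but here is a minimal sketch of such a two-body model, assuming fastai v1's create_body (the pooling and head sizes are assumptions):

import torch
import torch.nn as nn
from fastai.vision import models
from fastai.vision.learner import create_body

class TwoBodyNet(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.body1 = create_body(models.resnet34)  # e.g. frontal branch
        self.body2 = create_body(models.resnet34)  # e.g. lateral branch
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(512 * 2, n_classes)  # each resnet34 body ends with 512 channels
    def forward(self, x1, x2):
        f1 = self.pool(self.body1(x1)).flatten(1)  # (bs, 512)
        f2 = self.pool(self.body2(x2)).flatten(1)  # (bs, 512)
        return self.head(torch.cat([f1, f2], dim=1))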