Hey @davecg That’s a pretty darn good insight!! This is something I thought of too, but didn’t know how to do. I considered a 3D CNN, but that’s not a good fit, since the lateral and frontal views are not exactly related by depth, although they are related in some way (the up here is like the up there). I also thought the two views could be stacked as channels, R: frontal, B: lateral, G: average of both. But that’s not ideal either: in most ImageNet images the RGB channels don’t differ in structure nearly as much as these two views do. So your idea seems the best way to deal with this spatial thingy.
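Just to illustrate what I meant by the channel stacking (a toy sketch, with random tensors standing in for the two views):

```python
import torch

# Toy sketch: random tensors standing in for the two grayscale views.
frontal = torch.rand(224, 224)
lateral = torch.rand(224, 224)

# R = frontal, G = average of both, B = lateral -> a pseudo-RGB image that a
# standard 3-channel ImageNet model could consume.
x = torch.stack([frontal, (frontal + lateral) / 2, lateral], dim=0)  # (3, 224, 224)
```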
That’s another great idea. Can you share the Keras code? Perhaps one day I’ll port it to fastai/PyTorch after learning fastai part 2.
@TheShadow29 You keep amazing me with your PyTorch skills.
So what would be the difference between your first code, where you made one big model combining the two CNNs into one FC NN, and doing it separately like in the paper?
Why did they do it separately?
Do you agree that if training is possible in one big combined model (assuming GPU memory allows it), then it is better to avoid training separate models? The results would be the same, but the combined model would be more efficient?
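To make the question concrete, here is a minimal PyTorch sketch of what I mean by one big combined model (the resnet34 backbones and layer sizes are just my assumptions, not from the paper):

```python
import torch
import torch.nn as nn
import torchvision.models as tvm

class TwoViewNet(nn.Module):
    """Two CNN branches (frontal / lateral) feeding one shared FC head."""
    def __init__(self, n_classes=2):
        super().__init__()
        # One backbone per view, with the final FC layer chopped off (avgpool kept).
        self.frontal = nn.Sequential(*list(tvm.resnet34(pretrained=True).children())[:-1])
        self.lateral = nn.Sequential(*list(tvm.resnet34(pretrained=True).children())[:-1])
        self.head = nn.Sequential(
            nn.Linear(512 * 2, 256), nn.ReLU(), nn.Linear(256, n_classes))

    def forward(self, xf, xl):
        f = self.frontal(xf).flatten(1)            # (B, 512)
        l = self.lateral(xl).flatten(1)            # (B, 512)
        return self.head(torch.cat([f, l], dim=1))
```

One optimizer step then updates both branches and the head together, which is the "one big model" case, as opposed to optimizing each branch on its own first.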
Let’s say we have a big stadium. We put microphones distributed evenly across the stadium. The spectrogram of each microphone’s output is an image of sound (frequency x time). Of course, each microphone is related to all the others in such a way that loud sounds are picked up by roughly all microphones, while small group chit-chats are only detected by nearby microphones.
So each microphone’s spectrogram for 1 second will be a square image that is somehow related to the others, especially those from nearby microphones.
So instead of the x-ray model, where we only have to add a convolutional layer with an input of (2x7) to capture the rough spatial relation of the two views, here we should add a 3D CNN to capture the 2D spatial layout of the microphones on the stadium floor? If we have 6 x 8 microphones, the 3D CNN input should be something like 6 x 8 x 7, right? What do you think?
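To make the 6 x 8 x 7 idea concrete, here is a rough sketch of what I imagine (all sizes are assumptions; I’m supposing a shared 2D CNN has already reduced each microphone’s spectrogram to a C-channel feature map pooled to length 7 along one axis):

```python
import torch
import torch.nn as nn

# Assumed setup: a shared 2D CNN has turned each microphone's 1-second
# spectrogram into a C-channel feature map, pooled to length 7 along one axis.
B, C = 4, 64                         # batch size and feature channels (made up)
rows, cols, d = 6, 8, 7              # 6 x 8 microphone grid, 7 feature positions
x = torch.rand(B, C, rows, cols, d)  # the 6 x 8 x 7 volume, per channel

# A 3D conv whose kernel spans neighbouring microphones in both grid directions,
# so nearby-microphone correlations (the chit-chats) can be picked up.
conv = nn.Conv3d(C, 128, kernel_size=3, padding=1)
out = conv(x)                        # -> (B, 128, 6, 8, 7)
```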
It’s basically saying that the model you saved and the new model don’t share the same parameters, so the weights can’t be loaded directly. When you do learn.load, it internally loads a state dictionary. The “num.bias” you are seeing is a key, and the value is the corresponding parameter tensor. You could try learn.model.load_state_dict(torch.load(path_to_file), strict=False). If that doesn’t work, you will have to create a new dictionary, remove the keys that differ, and load only the subset of keys that match.
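Something like this (a rough sketch; `learn` and `path_to_file` are the same objects as above, and I’m filtering by both key name and tensor shape):

```python
import torch

state = torch.load(path_to_file)        # saved state dict, as in the snippet above
model_state = learn.model.state_dict()

# Keep only the keys that exist in the new model with the same shape.
filtered = {k: v for k, v in state.items()
            if k in model_state and v.shape == model_state[k].shape}

# strict=False leaves the unmatched parameters (e.g. a new head) untouched.
learn.model.load_state_dict(filtered, strict=False)
```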
@TheShadow29 I kept your notebook in my browser tab, planning to reproduce it with fastai v1 after finishing the course, but it seems a lot of changes have happened to the framework, and your excellent notebook needs some refactoring for the latest version of fastai.
Yeah, fastai has changed a lot and has a much cleaner structure now. Many things would probably need to be changed. Unfortunately, I don’t have the bandwidth to refactor it at the moment.
@hwasiti
Hey there,
Regarding your first concern, “how to remove the last fully connected layer from a CNN in fastai to stack more than one CNN into an FC-NN”, have you found any solution?
I am going to create two resnet34 bodies and then concatenate their outputs into a linear layer that produces the classification (the same as your concern). I was wondering how to implement this in fastai and would highly appreciate your advice; a rough sketch of what I have in mind follows below.
Best,
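For reference, here is the sketch, assuming fastai v1’s create_body (the pooling and layer sizes are my guesses, not a tested implementation):

```python
import torch
import torch.nn as nn
from fastai.vision import create_body, models

class TwoBodyNet(nn.Module):
    """Two resnet34 bodies whose pooled features feed one linear classifier."""
    def __init__(self, n_classes):
        super().__init__()
        self.body1 = create_body(models.resnet34)  # backbone without the FC head
        self.body2 = create_body(models.resnet34)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(512 * 2, n_classes)    # resnet34 body -> 512 channels

    def forward(self, x1, x2):
        f1 = self.pool(self.body1(x1)).flatten(1)
        f2 = self.pool(self.body2(x2)).flatten(1)
        return self.fc(torch.cat([f1, f2], dim=1))
```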