Concatenating outputs of two networks for transfer learning

I’ve come across a baseline model for a satellite image damage classification challenge in which the network is a pre-trained ResNet50 combined with a small convolutional network, like so:

ResNet50        small 3-layer convnet
        \              /
      concatenated outputs
                |
       a few dense layers
                |
           final output
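
In code, I understand the idea to be roughly like this (a simplified PyTorch sketch I wrote to illustrate the structure, not the actual baseline code; layer sizes and the number of classes are placeholders):

```python
import torch
import torch.nn as nn
from torchvision import models

class TwoBranchNet(nn.Module):
    """Pre-trained ResNet50 branch + small convnet branch, outputs concatenated."""

    def __init__(self, num_classes=4):
        super().__init__()
        # Branch 1: pre-trained ResNet50 with the final FC layer removed
        # (global-average-pooled features, 2048-d).
        resnet = models.resnet50(pretrained=True)
        self.resnet_branch = nn.Sequential(*list(resnet.children())[:-1])

        # Branch 2: small 3-layer convnet that sees the raw image directly.
        self.small_branch = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # 64-d features after flattening
        )

        # Head: a few dense layers on the concatenated features.
        self.head = nn.Sequential(
            nn.Linear(2048 + 64, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        a = self.resnet_branch(x).flatten(1)  # (N, 2048)
        b = self.small_branch(x).flatten(1)   # (N, 64)
        return self.head(torch.cat([a, b], dim=1))
```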

Can someone explain the intuition behind this and if this is a common approach to use while doing transfer learning? Thanks!

I think a possible reason they did it this way, instead of just stacking some extra layers on top of the ResNet50, is that the newly trained (non-pre-trained) layers get direct access to the image.

Loads of details might be relevant for “satellite image damage classification” that the pre-trained model doesn’t really recognise (I presume the pre-trained model was trained on some other kind of dataset). So if we only used the output or later layers of the ResNet50 as input to the new layers, some of that potentially useful information would not really be accessible.
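
To make that concrete (just an illustrative sketch building on the code in your post, not the baseline’s actual training setup): you could freeze the pre-trained branch, so only the small convnet, which looks at the raw pixels, and the dense head learn the satellite-specific details.

```python
# Illustrative only: freeze the pre-trained ResNet branch so that the small
# convnet (which sees the raw image) and the dense head do the new learning.
model = TwoBranchNet(num_classes=4)
for p in model.resnet_branch.parameters():
    p.requires_grad = False

# Optimise only the parameters that are still trainable.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```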

That makes sense. Thanks for the explanation!
