What is the role of skip connections in U-Net?

I was able to find that the skip connections used in U-Net help to recover fine grained details in the prediction, however I do not understand what is meant by this. Besides, I was wondering what would happen if the U-Net does not include skip connections. Would you need to change the architecture?


I would say that skip connections in U-Net play a similar role to the one they play in other architectures with skip connections, like ResNets. They help propagate information from the earlier layers to the deeper ones, and they alleviate the vanishing gradient problem by giving gradients a direct path back through the network.
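To make the ResNet analogy concrete, here is a minimal sketch (PyTorch; the class and variable names are illustrative, not from any particular codebase) of an additive skip connection, where the block's input is added back to its output so the identity path carries information and gradients straight through:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Toy residual block: two convs plus an identity skip connection."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.act(self.conv1(x)))
        # The skip connection: the input is added back, so even if the
        # convolutions learn little, information flows through unchanged.
        return self.act(out + x)

x = torch.randn(1, 16, 32, 32)
block = ResidualBlock(16)
y = block(x)  # output has the same shape as the input
```

Note that ResNet skips *add* activations, whereas U-Net skips *concatenate* encoder activations onto the decoder's input; the information-propagation intuition is the same in both cases.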


At least, that is my intuitive understanding. Here is what the authors of the U-Net paper say.

Please correct me if I’m wrong, or if there is a better explanation.



To add to Ilia’s answer, say the task at hand is image colourization, with the input being the photo below, obtained from here.

Here is the coloured version.

If you train a U-Net without skip connections and ask it to colourize this grayscale picture, its overall prediction would not be too inaccurate; that is, it would have a general idea that the ground is grass and thus should be green, that the sky is supposed to be blue, that there is a horse in the centre, etc. However, it would struggle to generate some of the smaller details. For instance, it would most likely miss the brown patches of dirt in the grass and erroneously output green for them. Another example is the bottom-left corner. Notice how crisp it is; in fact, we can perceive individual blades of grass. A U-Net without skip connections, though, would output a blurry mess there, and it would be impossible to discern distinct blades of grass.

Bear in mind that this is not surprising. There is only so much information the bottleneck in a U-Net can store, and it naturally prioritizes salient parts of the input, such as the horse, over minute details, like each individual blade of grass. Skip connections alleviate this problem by bypassing the bottleneck and providing the model with the encoder’s intermediate activations, which contain such fine-grained, high-resolution details. That is what is meant by “recover fine grained details in the prediction.”

No major redesign is needed. Other than removing the skip connections themselves, the only adjustment is to the decoder's convolution layers: in a standard U-Net they receive the upsampled features concatenated with the encoder's activations, so without the skips their input channel counts must be reduced accordingly.
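Here is a toy one-level U-Net sketch (PyTorch; names and layer sizes are illustrative assumptions, not the paper's exact architecture) showing both points at once: with skips, the decoder concatenates the encoder's high-resolution features and therefore sees twice the channels; without skips, halving that input channel count is the only change required:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal encoder-bottleneck-decoder with an optional skip connection."""

    def __init__(self, use_skips: bool = True):
        super().__init__()
        self.use_skips = use_skips
        self.enc = nn.Conv2d(1, 16, 3, padding=1)          # encoder conv
        self.down = nn.MaxPool2d(2)                        # downsample 2x
        self.bottleneck = nn.Conv2d(16, 16, 3, padding=1)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        # Concatenating the encoder features doubles the decoder's input
        # channels; without skips, 16 channels arrive instead of 32.
        dec_in = 32 if use_skips else 16
        self.dec = nn.Conv2d(dec_in, 1, 3, padding=1)

    def forward(self, x):
        e = torch.relu(self.enc(x))                  # high-res encoder features
        b = torch.relu(self.bottleneck(self.down(e)))
        u = self.up(b)
        if self.use_skips:
            # Skip connection: reuse the encoder's fine-grained detail,
            # bypassing the low-resolution bottleneck.
            u = torch.cat([u, e], dim=1)
        return self.dec(u)

x = torch.randn(1, 1, 64, 64)
y_with = TinyUNet(use_skips=True)(x)    # same spatial size as input
y_without = TinyUNet(use_skips=False)(x)
```

Both variants produce an output of the input's resolution; the version with skips simply has direct access to the encoder's fine-grained activations when doing so.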

Please don’t hesitate to reach out if you have other questions.


Btw, it could be interesting to conduct such an experiment yourself if you would like, i.e., disable the skip connections and see what happens. It would be similar to what Jeremy does in his and Sylvain's book: they build a convolution-based vision model step by step, adding more and more features, and turning it from an unstable, inaccurate model into a state-of-the-art one. It really helps to see in practice why we add all these features and tricks.