Hello,
In Identity Mappings in Deep Residual Networks, He et al. investigate how information propagates through ResNet's shortcut connections and show, both analytically and empirically, that keeping the identity branch "clean" is essential for easy optimization. They therefore propose pre-activation residual units that follow a batch normalization → ReLU → convolution ordering and, regarding your question, apply no operation after the residual summation (unlike the original ResNet, which applies a ReLU after the addition). The resulting models outperform the original post-activation ResNets and can be trained at much greater depths (over 1000 layers). More recent architectures such as EfficientNet and ConvNeXt likewise apply no activation after the residual summation, further demonstrating the value of letting the identity signal pass through each block unmodified.
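For concreteness, here is a minimal sketch of such a pre-activation block in PyTorch. It assumes equal input/output channels and stride 1 (so no projection shortcut is needed) and is only meant to illustrate the ordering, not reproduce the paper's exact architecture:

```python
import torch
import torch.nn as nn


class PreActBlock(nn.Module):
    """Sketch of a pre-activation residual block (BN -> ReLU -> conv)."""

    def __init__(self, channels: int):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-activation: BN -> ReLU -> conv, applied twice on the residual branch.
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        # Identity shortcut: nothing is applied after the summation,
        # so the skip path carries x through the block unchanged.
        return x + out


x = torch.randn(1, 64, 32, 32)
block = PreActBlock(64)
print(block(x).shape)  # torch.Size([1, 64, 32, 32])
```

The key point is in `forward`: the addition is the last operation, whereas an original ResNet block would wrap that sum in another ReLU.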