There are a lot of posts of people doing complex network surgery to make cool networks like DeepLabv3 or PSPNet. I was frustrated with how difficult it was for me to understand what was going on. I realized that most people just copy the ResNet code from torchvision and modify it from there. But even the raw torchvision code can be really daunting, because a lot of its functions stay mysterious until you trace through their behavior for each ResNet variant. (You even need to consider hypothetical ResNet architectures for the code to really make sense.)
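To give a feel for the kind of tracing I mean, here is a minimal sketch in plain Python (no torch needed). The block types, per-stage block counts, and expansion factors are the ones torchvision's `resnet.py` uses as far as I know; the names `RESNET_CONFIGS`, `CONVS_PER_BLOCK`, `STAGE_PLANES`, and `describe` are my own and hypothetical, not part of torchvision.

```python
# Block type and per-stage block counts for each standard ResNet.
# (Values as used by torchvision's resnet constructors; the dict itself is hypothetical.)
RESNET_CONFIGS = {
    "resnet18": ("BasicBlock", [2, 2, 2, 2]),
    "resnet34": ("BasicBlock", [3, 4, 6, 3]),
    "resnet50": ("Bottleneck", [3, 4, 6, 3]),
    "resnet101": ("Bottleneck", [3, 4, 23, 3]),
    "resnet152": ("Bottleneck", [3, 8, 36, 3]),
}

# Output channels of a block = planes * expansion.
EXPANSION = {"BasicBlock": 1, "Bottleneck": 4}

# Convolutions on the main path of each block (downsample convs not counted).
CONVS_PER_BLOCK = {"BasicBlock": 2, "Bottleneck": 3}

# "planes" argument for the four stages (layer1..layer4).
STAGE_PLANES = [64, 128, 256, 512]


def describe(name):
    """Return (nominal depth, output channels of each stage) for a ResNet name."""
    block, layers = RESNET_CONFIGS[name]
    # Nominal depth: main-path convs in every block, plus the stem conv and the final fc.
    depth = CONVS_PER_BLOCK[block] * sum(layers) + 2
    stage_channels = [planes * EXPANSION[block] for planes in STAGE_PLANES]
    return depth, stage_channels


for name in RESNET_CONFIGS:
    depth, channels = describe(name)
    print(name, RESNET_CONFIGS[name][0], depth, channels)
```

Note how resnet34 and resnet50 share the same block counts and differ only in block type, which is why the same "planes" value means 512 output channels in one and 2048 in the other.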
Anyway, I took the time to understand what was going on and found a lot of things that were really confusing to me. So I wrote a little Medium article that tries to clear things up and hopefully speeds things up for anyone else wanting to explore the code: https://medium.com/@erikgaas/resnet-torchvision-bottlenecks-and-layers-not-as-they-seem-145620f93096