Looking at torchvision resnet code

There are a lot of posts of people doing complex network surgery to make cool networks like deeplabv3 or pspnet. I was frustrated with how difficult it was for me to understand what was going on. I realized that most people just copy the resnet code from torchvision and modify it from there. But even looking at the raw torchvision code can be really daunting, because there are a lot of functions that are quite mysterious unless you trace through their behavior for each different resnet type. (You even need to consider hypothetical resnet architectures for the code to really make sense.)
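One of those mysterious functions is `_make_layer`, which decides when a residual shortcut needs a 1x1 "downsample" projection instead of a plain identity. Here's a simplified pure-Python sketch of that condition (the helper name `needs_projection` is mine, not torchvision's), assuming the standard `expansion` attribute on each block type:

```python
def needs_projection(inplanes, planes, stride, expansion):
    """Sketch of the check torchvision's _make_layer performs before
    building a block: a 1x1 conv projection is needed whenever the
    residual branch changes spatial size (stride != 1) or channel
    count (inplanes != planes * expansion)."""
    return stride != 1 or inplanes != planes * expansion

# Bottleneck blocks (resnet50+) have expansion = 4, so even the very
# first stage (stride 1, 64 -> 64 planes) needs a projection, because
# the block's output is really 64 * 4 = 256 channels:
print(needs_projection(inplanes=64, planes=64, stride=1, expansion=4))  # True

# BasicBlock (resnet18/34) has expansion = 1, so the same stage keeps
# an identity shortcut:
print(needs_projection(inplanes=64, planes=64, stride=1, expansion=1))  # False
```

This is why the code only makes sense once you consider every resnet variant: the same function has to cover both block types and every stage.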

Anyway, I took the time to understand what was going on and found a lot of things that were really confusing to me. So I wrote a little Medium article that tries to clear things up and hopefully speed up anyone else wanting to explore the code: https://medium.com/@erikgaas/resnet-torchvision-bottlenecks-and-layers-not-as-they-seem-145620f93096


Yeah, the best way to understand the ResNet structure is to refer to the original paper. It shows all the blocks and describes the architecture without any extra levels of abstraction.


Totally agree. The original paper is really good. It's interesting, though, how the implementations vary; it was more arbitrary than I initially thought. For instance, PyTorch's resnet50 uses Bottleneck blocks in its implementation, while Keras' resnet50 contains the definition needed for a bottleneck but ultimately doesn't use it.
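For anyone wondering where the "50" comes from: torchvision builds resnet50 as `ResNet(Bottleneck, [3, 4, 6, 3])`, and the name counts the weighted layers. A quick back-of-the-envelope check (a sketch of the arithmetic, not torchvision code):

```python
# resnet50's stage configuration as passed to torchvision's ResNet:
blocks_per_stage = [3, 4, 6, 3]

# Each Bottleneck block holds three convolutions (1x1 -> 3x3 -> 1x1),
# plus the network has an initial 7x7 conv stem and a final fc layer.
convs_per_bottleneck = 3
stem_and_fc = 2

total_weighted_layers = sum(blocks_per_stage) * convs_per_bottleneck + stem_and_fc
print(total_weighted_layers)  # 50
```

The same arithmetic with BasicBlock (2 convs per block) and `[2, 2, 2, 2]` gives resnet18, which is part of why swapping the block type changes so much of the network.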