I’ve been working on some segmentation projects. In trying to maximize performance, I’ve looked into some different U-Net architectures. Here are two papers I found interesting.
https://openreview.net/pdf?id=SyQtAooiz
This paper replaces pooling and upsampling with pixel-shuffle-like operations and uses a CycleGAN-like framework for training the model. I tried training a model with a pretrained ResNet backbone followed by pixel-shuffle upsampling, and it performed about the same as the transpose-conv model used in lesson 14. At some point I want to try training a full model from scratch that downsamples with a pixel-shuffle-style operation rather than stride-2 convolutions, but I haven’t gotten around to that yet. A rough sketch of the upsampling block I used is below.
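For anyone curious, here’s a minimal sketch of the idea (PyTorch, which is what I’m assuming throughout; the channel sizes and layer names are just illustrative, not the paper’s or lesson 14’s exact code): a conv expands the channels by 4, then `nn.PixelShuffle(2)` rearranges them into a feature map with twice the spatial resolution.

```python
import torch
import torch.nn as nn

class PixelShuffleUpsample(nn.Module):
    """2x upsample: a conv expands channels by 4, then PixelShuffle folds
    those channels into a feature map with twice the height and width."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # conv produces out_ch * 4 channels so PixelShuffle(2) can rearrange
        # them into (out_ch, 2*H, 2*W)
        self.conv = nn.Conv2d(in_ch, out_ch * 4, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(upscale_factor=2)
        self.act = nn.ReLU(inplace=True)
        # (for the downsampling direction I mentioned, nn.PixelUnshuffle(2)
        # does the reverse channel/space rearrangement)

    def forward(self, x):
        return self.act(self.shuffle(self.conv(x)))

# quick shape check: (1, 512, 8, 8) -> (1, 256, 16, 16)
x = torch.randn(1, 512, 8, 8)
print(PixelShuffleUpsample(512, 256)(x).shape)
```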
The second paper replaces pooling and upsampling with wavelet wizardry that I’m still trying to wrap my head around.
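My (possibly wrong) mental model so far: a single-level 2D Haar transform splits each feature map into four half-resolution subbands, so the low-frequency band acts like a smoothed downsample while the detail bands keep enough information to invert the step exactly on the way back up. A toy sketch of that idea, not the paper’s actual method or code:

```python
import torch

def haar_pool(x):
    """x: (N, C, H, W) with even H and W. Returns (ll, lh, hl, hh),
    each of shape (N, C, H//2, W//2)."""
    a = x[..., 0::2, 0::2]  # top-left of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2  # low-frequency band (looks like a 2x downsample)
    lh = (a + b - c - d) / 2  # detail bands
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, lh, hl, hh

def haar_unpool(ll, lh, hl, hh):
    """Exact inverse of haar_pool: reassembles the full-resolution tensor."""
    a = (ll + lh + hl + hh) / 2
    b = (ll + lh - hl - hh) / 2
    c = (ll - lh + hl - hh) / 2
    d = (ll - lh - hl + hh) / 2
    n, ch, h, w = ll.shape
    out = torch.zeros(n, ch, h * 2, w * 2, dtype=ll.dtype, device=ll.device)
    out[..., 0::2, 0::2] = a
    out[..., 0::2, 1::2] = b
    out[..., 1::2, 0::2] = c
    out[..., 1::2, 1::2] = d
    return out

x = torch.randn(1, 3, 8, 8)
print(torch.allclose(haar_unpool(*haar_pool(x)), x))  # True: lossless round trip
```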