ResNet-18 vs -20, or ResNet-50 vs -56

I’ve always been curious what the difference is between the 18- and 20-layer ResNet (or between the 50- and 56-layer ResNet). The original ResNet paper references both variants, but as far as I can tell it doesn’t explain how they differ.

EDIT: I also noticed that ResNet-20 is used for CIFAR images, which are only 32x32, so the network can’t downsample as many times as an ImageNet model can. Perhaps that affects the architecture choice. My guess is that the difference is something like:

xresnet18 = XResNet(ResBlock, expansion=1, layers=[2, 2, 2, 2])
xresnet20 = XResNet(ResBlock, expansion=1, layers=[4, 5])

(minus the anachronistic “bag of tricks” enhancements that XResNet adds.)
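A quick way to sanity-check a guess like this is to count the weighted layers. The sketch below assumes the usual counting convention: one stem convolution, the convolutions inside each residual block, and one final fully connected layer. The helper `resnet_depth` is a hypothetical name of my own, not part of fastai or torchvision, and the three-stage CIFAR layouts are an assumption based on the 6n + 2 depth formula usually quoted for CIFAR ResNets.

```python
def resnet_depth(blocks_per_stage, convs_per_block):
    """Count weighted layers: 1 stem conv + all block convs + 1 final linear layer.

    Hypothetical helper for illustration only; it ignores shortcut/projection
    convolutions, which are conventionally left out of the depth count.
    """
    return 1 + convs_per_block * sum(blocks_per_stage) + 1


# ImageNet-style layouts: four stages.
print(resnet_depth([2, 2, 2, 2], convs_per_block=2))  # basic blocks      -> 18
print(resnet_depth([3, 4, 6, 3], convs_per_block=3))  # bottleneck blocks -> 50

# Assumed CIFAR-style layouts: three stages of n basic blocks (6n + 2 layers).
print(resnet_depth([3, 3, 3], convs_per_block=2))     # n = 3 -> 20
print(resnet_depth([9, 9, 9], convs_per_block=2))     # n = 9 -> 56

# The two-stage guess above also has nine basic blocks in total.
print(resnet_depth([4, 5], convs_per_block=2))        # -> 20
```

Both `[4, 5]` and `[3, 3, 3]` come out at 20 layers, so counting alone can't tell a two-stage layout from a three-stage one.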

Does anybody know for sure?