Support dilated convolutions in xresnet

I just gave it a go and it works pretty well! 79.074 top-1 w/ the original ResNet50-D, no dilation. With my hack and output_stride=8, 79.264. With output_stride=8 and just weight compatibility via Identity blocks, it’s down to 73 but still works.

The logic I have for building the avg pool part of the shortcut is below… I quickly hacked together an AvgPool2dSame layer based on a Conv2dSame impl I had for TensorFlow weight compat in other networks. That fixes the issue of the avg pool causing dimension mismatches when you set stride=1 while the kernel is 2.

if avg_down:
    # when dilating, the shortcut must not downsample, so the pooling runs with stride 1
    avg_stride = stride if dilation == 1 else 1
    conv_stride = 1
    if stride == 1 and dilation == 1:
        # no downsampling and no dilation: the shortcut needs no pooling at all
        downsample_layers += [nn.Identity()]
    else:
        # AvgPool2dSame pads dynamically so a 2x2 pool with stride 1 keeps the spatial size
        downsample_layers += [AvgPool2dSame(2, avg_stride, ceil_mode=True, count_include_pad=False)]
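
A minimal sketch of what such an AvgPool2dSame could look like (the real layer Ross mentions lives in his own code base; the version below is just an illustration of the idea): compute TF-style “SAME” padding from the input size, so a 2x2 pool with stride 1 keeps the spatial dimensions.

import math
import torch.nn.functional as F
from torch import nn

def _same_pad(size, k, s, d=1):
    # total padding needed so that output size = ceil(size / s)
    return max((math.ceil(size / s) - 1) * s + (k - 1) * d + 1 - size, 0)

class AvgPool2dSame(nn.Module):
    "Average pooling with dynamically computed, possibly asymmetric 'same' padding."
    def __init__(self, kernel_size, stride, ceil_mode=False, count_include_pad=True):
        super().__init__()
        self.k, self.s = kernel_size, stride
        self.ceil_mode, self.count_include_pad = ceil_mode, count_include_pad

    def forward(self, x):
        h, w = x.shape[-2:]
        ph, pw = _same_pad(h, self.k, self.s), _same_pad(w, self.k, self.s)
        # asymmetric: the extra pixel of padding goes on the bottom/right, as in TF
        x = F.pad(x, [pw // 2, pw - pw // 2, ph // 2, ph - ph // 2])
        return F.avg_pool2d(x, self.k, self.s, 0, self.ceil_mode, self.count_include_pad)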

Unfortunately, all this is getting a bit messy; I don’t think it’s going to meet the ‘keep it simple’ aesthetic of xresnet.


Maybe there are pieces that can be refactored into supporting functions?

I agree with Ross: the changes in ConvLayer and ResBlock are minimal enough that they can be merged, but this changes XResNet too much. I think we can have something simpler by:

  1. having a _make_blocks function in XResNet that returns the list of blocks
  2. subclassing XResNet for this use case and writing custom _make_blocks and _make_layer (rough sketch below).
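
Not actual fastai code, just a toy sketch of the shape of that suggestion (all class and argument names below are illustrative): the base class builds its body through an overridable _make_blocks, so a dilated variant only needs to override that method instead of copying the whole constructor.

from torch import nn

class TinyXResNet(nn.Sequential):
    "Toy stand-in for XResNet: the body is built by an overridable _make_blocks."
    def __init__(self, layers, c_in=3, **kwargs):
        stem = [nn.Conv2d(c_in, 64, 3, stride=2, padding=1), nn.ReLU()]
        super().__init__(*stem, *self._make_blocks(layers, **kwargs))

    def _make_blocks(self, layers, **kwargs):
        return [self._make_layer(n, stride=2, dilation=1) for n in layers]

    def _make_layer(self, n_blocks, stride, dilation):
        # padding=dilation keeps the spatial size of a 3x3 conv when stride=1
        return nn.Sequential(*[nn.Conv2d(64, 64, 3, stride=stride if i == 0 else 1,
                                         padding=dilation, dilation=dilation)
                               for i in range(n_blocks)])

class TinyDilatedXResNet(TinyXResNet):
    "Only _make_blocks changes: once output_stride is reached, trade stride for dilation."
    def _make_blocks(self, layers, output_stride=8, **kwargs):
        blocks, dilation, current_stride = [], 1, 2   # the stem already has stride 2
        for n in layers:
            if current_stride >= output_stride:
                stride, dilation = 1, dilation * 2
            else:
                stride = 2
            blocks.append(self._make_layer(n, stride=stride, dilation=dilation))
            current_stride *= stride
        return blocks

model = TinyDilatedXResNet([2, 2, 2, 2], output_stride=8)   # final feature-map stride is 8, not 32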

Ah okay I get it, so we still want to do pooling but set the stride to 1 instead of 2 in the idpath. I will implement that today and update the examples.

I agree, xresnet shouldn’t have unnecessary complexity for this special case. Is supporting different pooling strides in the idpath of ResBlock okay?
Otherwise ResBlock would also have to be subclassed.

Yup, the asymmetric padding part is a bit trickier though. The avg pool requires [0, 1, 0, 1] padding in the case where stride==1. I used a custom AvgPool that calculates the padding dynamically from the params + input size, but it turns out it doesn’t actually depend on the input: it’s just a fixed asymmetric padding for the stride=1 case and symmetric 0 padding for the stride=2 case.
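
Roughly what that fixed-padding variant boils down to (a sketch, not necessarily the notebook’s exact code): pad one pixel on the right/bottom in the stride-1 case so the 2x2 pool preserves the spatial size, and use plain pooling in the stride-2 case.

import torch
from torch import nn

pool_s1 = nn.Sequential(
    nn.ZeroPad2d((0, 1, 0, 1)),                               # (left, right, top, bottom)
    nn.AvgPool2d(2, stride=1, count_include_pad=False),
)
pool_s2 = nn.AvgPool2d(2, stride=2, ceil_mode=True, count_include_pad=False)

x = torch.randn(1, 64, 28, 28)
print(pool_s1(x).shape, pool_s2(x).shape)   # (1, 64, 28, 28) and (1, 64, 14, 14)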

@sgugger @jeremy @muellerzr @rwightman Sorry for the delay, I was really busy the last couple of days.

I took Sylvain’s feedback and refactored xresnet to have a _make_blocks function, and also implemented correct stride-1 pooling with same padding, as Ross suggested. I now get the expected results, i.e. a classification model trained without dilation gets similar accuracy when inference is done with dilation (at least when the output stride doesn’t differ too much from the one used during training).

Weights are now also compatible between the different output_strides without any Identity layers or hacks.
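
A hypothetical check of that claim (the xresnet50_dilated name and output_stride argument are assumptions about the notebook’s API, not fastai’s): because dilation only changes strides/dilations and not parameter shapes, a state dict from one output stride should load cleanly into another.

model_os32 = xresnet50_dilated(output_stride=32)   # hypothetical constructor, behaves like plain xresnet50
model_os8  = xresnet50_dilated(output_stride=8)
model_os8.load_state_dict(model_os32.state_dict(), strict=True)   # strict=True: no missing/unexpected keys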

The implementation and examples are in this notebook on Colab.
I also did initial experiments on CamVid, but dilation seems to hurt rather than help performance with the unet (I used the same number of epochs and learning rate for all runs).

Let me know if this or parts of it should be submitted as a PR.

Cheers, Johannes


The handling of the strides/avgpool looks good.

One thing I noticed: in the padding calc ((ks-1) + (ks-1)*(dilation-1)) // 2, shouldn’t the first ks be stride?

Ross

Thanks for catching that, I fixed it in the notebook.

Edit: after changing it I get shape errors, so I think the original one was correct. I’ll double-check tomorrow with reference implementations.
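
For reference, a quick numeric comparison of the two formulas (nothing library-specific, just the arithmetic):

def pad_orig(ks, stride, dilation):      # ((ks-1) + (ks-1)*(dilation-1)) // 2
    return ((ks - 1) + (ks - 1) * (dilation - 1)) // 2

def pad_alt(ks, stride, dilation):       # ((stride-1) + (ks-1)*(dilation-1)) // 2
    return ((stride - 1) + (ks - 1) * (dilation - 1)) // 2

for ks, stride, dilation in [(3, 1, 1), (3, 2, 1), (3, 1, 2), (3, 1, 4)]:
    print((ks, stride, dilation), pad_orig(ks, stride, dilation), pad_alt(ks, stride, dilation))
# pad_orig -> 1, 1, 2, 4 : 'same' padding for a 3x3 conv at any dilation
# pad_alt  -> 0, 0, 1, 3 : drops to 0 for an undilated 3x3 conv, hence the shape errors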

After our book deadline on Feb 10, feel free to ping me about this project - I won’t be able to look at it until then.

Hi @jeremy, hope the book deadline went well and wasn’t too stressful!

I am currently waiting for @muellerzr’s next lesson on RetinaNet to experiment with dilations in the backbone there; the notebook with the current implementation & experiments is here

Can you think of anything else I should try?

@j.laute the notebook we’ll be working out of is here:

IIRC someone had figured out a bit of the inference and mAP scoring, but otherwise the model will train :slight_smile: (and the notebook is complete)

@sgugger experiments with object detection are taking longer than expected; can I open a PR for the XResNet refactor you suggested (add a _make_blocks function to XResNet) in the meantime?

The code change is minimal, but then I don’t have to include the refactored version everywhere just to subclass it for dilation support.
Also this might come in handy for other variations of XResNet.

You can certainly do that.

Is this still underway? I am interested in helping.

You can check out the notebook linked above, try the dilated xresnet on some problems, and report here if you encounter any issues or have any questions.
The code that calculates the strides and dilations is currently pretty ugly, so that’s one thing that needs to be refactored.

I am currently preparing for exams, so I don’t have much time for fastai :confused:
I hope to finally finish this when my exams are over (in 3 weeks).

Hope you’re well and stay safe!

I’ve been playing with dilated convs lately, and for me they are super slow.
Any idea why?
I enabled cudnn.benchmark, btw.

It appears to be linked to torch.backends.cudnn.deterministic=True
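
For anyone hitting the same slowdown: with deterministic=True, cuDNN is restricted to deterministic algorithms, which can be much slower for dilated convolutions, so make sure it isn’t left on (e.g. by a reproducibility helper) if speed matters.

import torch

torch.backends.cudnn.deterministic = False   # allow cuDNN to pick the fastest (non-deterministic) kernels
torch.backends.cudnn.benchmark = True        # autotune the best algorithm for the current input sizes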

I’m finally done with my exams. In the next few days I will try to prepare PRs to bring this functionality into fastai.
One thing that would be nice is an example where dilated convolutions in a pretrained xresnet actually outperform the standard version; I thought object detection might be a good candidate for that (but I’ll have to catch up first with what has happened with the library in the last ~3 months).

Regarding why dilated convolutions were slower for you, I’m not sure; cudnn might choose a different algorithm depending on the size of your tensors, the dilation rate, etc.

I don’t think too much has changed. I have an example Kaggle kernel (for object detection) here: https://www.kaggle.com/muellerzr/fastai2-starter-kernel