I recently read the ICLR paper on network deconvolutions (Ye et al. 2020) (not to be confused with “deconvolutional” neural networks, which actually use transposed convolutions; e.g., Zeiler et al. 2010). The upshot of these deconvolutions is that they remove correlations between pixels and channels, which allows for a sparser, more efficient representation of features via convolutional layers. There’s also a bit of biological intuition for why this might be useful. The authors demonstrate empirically that deconvolutions speed up training and give better results overall.
Anyway, the authors discuss approximate computation of the deconvolution matrix, and have extended some PyTorch models using this implementation in their GitHub repo. If I have time later this week, I may try some experiments. But I figured others may be interested in having a look, and perhaps try using them for the Imagenette, FastGarden, and/or Animal vocalization challenges!
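For anyone curious what “approximate computation of the deconvolution matrix” means in practice: the deconvolution matrix is essentially the inverse square root of the covariance of the unfolded inputs, and one standard way to approximate it cheaply is a coupled Newton–Schulz iteration. Here’s a minimal NumPy sketch of that idea (my own illustration, not the authors’ implementation; the function name and defaults are made up):

```python
import numpy as np

def inv_sqrt_newton(cov, n_iter=20, eps=1e-5):
    """Approximate cov**(-1/2) with coupled Newton-Schulz iterations.
    Network deconvolution multiplies (unfolded) inputs by such a matrix
    to remove pixel/channel correlations before the convolution."""
    c = cov.shape[0]
    A = cov + eps * np.eye(c)        # regularize for numerical stability
    norm = np.linalg.norm(A)         # Frobenius norm: scales eigenvalues into (0, 1]
    Y, Z = A / norm, np.eye(c)
    I = np.eye(c)
    for _ in range(n_iter):
        T = 0.5 * (3.0 * I - Z @ Y)  # Newton-Schulz update factor
        Y, Z = Y @ T, T @ Z          # Y -> (A/norm)**(1/2), Z -> (A/norm)**(-1/2)
    return Z / np.sqrt(norm)
```

Sanity check: multiplying the covariance by this matrix on both sides should give (approximately) the identity, i.e., the decorrelated features are white.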
I’ve extended the fastai2 xresnet implementation using the official deconvolution repo linked above. Here, I’m only substituting FastDeconv layers in the stem of the model. This seems to create a really flexible (and quick to train) model!
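To give a feel for what “substituting FastDeconv layers in the stem” looks like, here is a toy sketch. I can’t reproduce the official FastDeconv layer here, so `ChannelDeconv2d` is a simplified stand-in (it only whitens across channels with the batch covariance; the real layer also decorrelates pixels within each patch and keeps running statistics), and `hybrid_stem` is a hypothetical xresnet-style three-conv stem:

```python
import torch
import torch.nn as nn

class ChannelDeconv2d(nn.Module):
    """Toy stand-in for FastDeconv: whiten channels (remove cross-channel
    correlations) using the batch covariance, then convolve. Illustration only."""
    def __init__(self, in_ch, out_ch, ks=3, stride=1, padding=1, eps=1e-4):
        super().__init__()
        self.eps = eps
        self.conv = nn.Conv2d(in_ch, out_ch, ks, stride=stride, padding=padding)

    def forward(self, x):
        b, c, h, w = x.shape
        flat = x.permute(1, 0, 2, 3).reshape(c, -1)   # c x (b*h*w)
        flat = flat - flat.mean(dim=1, keepdim=True)  # center each channel
        cov = flat @ flat.t() / flat.shape[1] + self.eps * torch.eye(c)
        vals, vecs = torch.linalg.eigh(cov)
        whiten = vecs @ torch.diag(vals.clamp_min(self.eps).rsqrt()) @ vecs.t()
        x = (whiten @ flat).reshape(c, b, h, w).permute(1, 0, 2, 3)
        return self.conv(x)

def hybrid_stem(c_in=3, sizes=(32, 64, 64)):
    """xresnet-style stem where each Conv2d + BatchNorm2d pair is replaced
    by a single deconvolution layer (it does its own normalization)."""
    layers, ch = [], c_in
    for i, c_out in enumerate(sizes):
        stride = 2 if i == 0 else 1
        layers += [ChannelDeconv2d(ch, c_out, ks=3, stride=stride, padding=1),
                   nn.ReLU(inplace=True)]
        ch = c_out
    return nn.Sequential(*layers)
```

Note that the deconv layer replaces both the conv and the batchnorm, since the whitening plays the normalization role; the rest of the body can stay standard Conv2d + BN.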
Currently I’m running a few tests on my own hyperspectral astronomical data, and so far it looks like the loss drops extremely quickly. I’ve upped the weight decay (mentioned in the paper too) to about 1e-2 and might even try higher. In any event, early results are promising. I’d love to see it applied to other tasks!
EDIT: another interesting result is that the learning rate finder seems to not work as well…
Maybe this is consistent with results from the paper suggesting that you can train at very high learning rates (lr = 1.0 with SGD at least).
Oh, it’s definitely a non-standard CNN task! I’m optimizing an RMSE loss function for multivariate regression using 5-channel images of galaxies. The model is an adapted version of the xresnet18 architecture with deconvolution layers substituted for Conv2d + batchnorm.
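For anyone wanting to try something similar, the RMSE loss is straightforward to write in PyTorch; a minimal version (the `eps` guard is my own addition, to avoid an infinite gradient at exactly zero error):

```python
import torch

def rmse_loss(pred, target, eps=1e-8):
    # root-mean-square error over all outputs;
    # eps keeps the sqrt differentiable when pred == target
    return torch.sqrt(torch.mean((pred - target) ** 2) + eps)
```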
This sounds very interesting! I haven’t had the chance to play with your code yet, but a quick read seems to suggest that pretrained would be ignored if set to True. Looking at the code from the paper authors, however, it appears that they are able to successfully load pretrained weights for the base architectures. Did you experiment with transfer learning at all, or are you training your models from scratch?
I actually implemented a custom version where deconvolution layers are substituted only in the stem of a CNN. In my tests (see the FastGarden thread), these hybrid deconvolution nets work better than either regular CNNs or fully deconvolutional networks. No such pretrained networks are available for the hybrid versions.