Uber releases new Conv architecture


I saw Uber released a blog post on some research they have been doing with convolution layers, and I thought people here would find it super interesting.

Here is the post.

Basically, they found that convolutions are surprisingly bad at pinpointing locations in an image, so they added extra feature maps containing the i,j coordinates of each pixel and concatenated them to the input channels before the convolution. This allows the convolution layer to learn translation dependence when the task calls for it.
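The idea is simple enough to sketch in a few lines of PyTorch. This is just my rough reading of the post, not Uber's code: a wrapper around `nn.Conv2d` that builds two coordinate maps (scaled to [-1, 1]) and concatenates them to the input before convolving.

```python
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    """Conv2d with two extra input channels holding normalized i,j coordinates."""
    def __init__(self, in_channels, out_channels, **kwargs):
        super().__init__()
        # +2 input channels for the i and j coordinate maps
        self.conv = nn.Conv2d(in_channels + 2, out_channels, **kwargs)

    def forward(self, x):
        b, _, h, w = x.shape
        # row (i) and column (j) coordinate maps, scaled to [-1, 1]
        i = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        j = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        # concatenate coordinates onto the input channels, then convolve
        return self.conv(torch.cat([x, i, j], dim=1))

layer = CoordConv2d(3, 16, kernel_size=3, padding=1)
out = layer(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 16, 32, 32])
```

Since it keeps the same interface as `nn.Conv2d`, it should be easy to drop into an existing model and compare.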

Image of architecture from post.


In particular, they achieved a >20% IoU gain on their object detection example problem (Faster R-CNN on shifted MNIST digits). Would love to see this implemented in fast.ai so we can check whether a similar gain shows up when applied to DL2 lesson 1.