Spatial Transformer Networks

Spatial transformer networks are a generalization of differentiable attention to any spatial transformation. Spatial transformer networks (STN for short) allow a neural network to learn how to perform spatial transformations on the input image in order to enhance the geometric invariance of the model. For example, it can crop a region of interest, scale and correct the orientation of an image. It can be a useful mechanism because CNNs are not invariant to rotation and scale and more general affine transformations.



What do affine_grid and grid_sample do? I took the code from the URL. I understood the rest of the implementation except the two lines highlighted in yellow. I went through the paper, but I am still unable to fully convince myself that I understand them.

I’ll move this to #theory

Here is what I have gathered so far about affine transformations.

The figure below shows an example of what we mean. On the left, a cylinder has been built
in a convenient place, and to a convenient size. Because of the requirements of a scene,
it is first scaled to be longer and thinner than its original design, rotated to a desired
orientation in space, and then moved to a desired position (i.e. translated). The set of
operations providing for all such transformations are known as the affine transforms. The
affines include translations and all linear transformations, like scale, rotate, and shear.


Src :
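To make the quoted description concrete, here is a minimal sketch in plain Python of the scale, rotate, translate sequence written as 3x3 homogeneous matrices. The helper names (`matmul`, `scale`, `rotate`, `translate`, `apply`) and the numbers are made up for illustration; they are not from the tutorial or the quoted text.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def scale(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]

def rotate(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def translate(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def apply(matrix, point):
    """Apply a 3x3 homogeneous transform to a 2D point (x, y)."""
    x, y = point
    out = matmul(matrix, [[x], [y], [1.0]])
    return out[0][0], out[1][0]

# Scale first, then rotate, then translate. Matrices compose right to left.
transform = matmul(translate(3.0, 1.0),
                   matmul(rotate(math.pi / 2), scale(2.0, 0.5)))

# (1, 0) is scaled to (2, 0), rotated to (0, 2), then moved to (3, 3).
x, y = apply(transform, (1.0, 0.0))
print(round(x, 6), round(y, 6))
```

The last row of such a matrix is always `[0, 0, 1]`, so only the top two rows carry information. As far as I can tell, that 2x3 block is the shape of the `theta` that the spatial transformer predicts.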

I still need to understand the code that I highlighted. Is it a simple matrix multiplication or something else?
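On the matrix multiplication question, a sketch that may help: as I understand it, `F.affine_grid` is essentially a batched matrix multiplication of `theta` (N x 2 x 3) with the homogeneous coordinates of every output pixel centre, expressed in normalized [-1, 1] coordinates. `F.grid_sample` is not a matrix multiplication: it looks up the input image at each of those coordinates and interpolates (bilinearly by default). The function `manual_affine_grid` below is my own hypothetical reimplementation for comparison, assuming the `align_corners=False` convention; it is not the library's source code.

```python
import torch
import torch.nn.functional as F

def manual_affine_grid(theta, size):
    """Hypothetical re-derivation of F.affine_grid (align_corners=False)."""
    N, C, H, W = size
    # Pixel centres in normalized [-1, 1] coordinates.
    xs = (torch.arange(W, dtype=theta.dtype) * 2 + 1) / W - 1
    ys = (torch.arange(H, dtype=theta.dtype) * 2 + 1) / H - 1
    grid_x = xs.view(1, W).expand(H, W)
    grid_y = ys.view(H, 1).expand(H, W)
    ones = torch.ones(H, W, dtype=theta.dtype)
    # Homogeneous coordinates (x, y, 1) for every output pixel: H x W x 3.
    base = torch.stack([grid_x, grid_y, ones], dim=-1)
    base = base.view(1, H * W, 3).expand(N, -1, -1)
    # The actual "transformation": one batched matrix multiplication.
    grid = torch.bmm(base, theta.transpose(1, 2))  # N x (H*W) x 2
    return grid.view(N, H, W, 2)

torch.manual_seed(0)
theta = torch.randn(2, 2, 3)
size = (2, 1, 4, 5)
same = torch.allclose(manual_affine_grid(theta, size),
                      F.affine_grid(theta, size, align_corners=False),
                      atol=1e-5)
print("matches F.affine_grid:", same)

# With the identity transform the grid lands on the pixel centres,
# so sampling should give the image back unchanged.
image = torch.arange(12, dtype=torch.float32).view(1, 1, 3, 4)
identity = torch.tensor([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
grid = manual_affine_grid(identity, image.shape)
resampled = F.grid_sample(image, grid, mode="bilinear", align_corners=False)
print("identity round trip:", torch.allclose(resampled, image, atol=1e-4))
```

Note that the grid holds, for each output pixel, the location in the input to read from. So `theta` maps output coordinates to input coordinates, which is the reverse of how one might first picture "transforming the image".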

Pix2Code seems like a good starting point for me. They implemented it in Keras; I am about to start in PyTorch. My basic goal is to convert basic hand-drawn shapes into abstract shapes.