Over the last few weeks I’ve been playing around with STNs: Spatial Transformer Networks (https://arxiv.org/abs/1506.02025) and ICSTN (https://arxiv.org/abs/1612.03897). I was interested in the idea of applying multiple transformations to the image to do some kind of unsupervised localization, as in Table 3 of the STN paper (see below). I think the STN is a nice idea, but from all the experiments I’ve been doing, I am getting the feeling that it just doesn’t work very well.
So I would like to know whether anyone else has been playing around with STNs, and what your opinions are.
Thanks for sharing your experience. I have just started working on STN, but I find the performance is no better. How did your experimentation go? Did the performance improve for you over time?
I am still working on this and will keep you posted if I find any improvements.
Performance did not improve. I feel it requires a lot of dataset-specific tuning to work at all, for example separate learning rates for the STN and the classification network (this is even worse in the case of the ICSTN). If you are interested in similar work, I recommend checking https://arxiv.org/abs/1901.09891 and https://ieeexplore.ieee.org/document/8099959.
In the end, if what you care about most is classification accuracy, then I don’t think these methods are the way to go (you can easily get better results with resnext-wsl). I am still looking into the problem; if I find something, I will share it here.
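For anyone who wants to try the separate learning rates mentioned above, here is a minimal PyTorch sketch using optimizer parameter groups. Everything in it is an assumption for illustration, not what was used in the experiments: the `TinySTNClassifier` model is a hypothetical toy, the input is assumed to be single-channel 28x28, and the learning-rate values are placeholders that would need tuning per dataset.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinySTNClassifier(nn.Module):
    """Hypothetical toy model: an affine STN followed by a linear classifier."""

    def __init__(self, num_classes=10):
        super().__init__()
        # Localization network: predicts the 6 entries of a 2x3 affine matrix.
        self.localization = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 32),
            nn.ReLU(),
            nn.Linear(32, 6),
        )
        # Initialize the last layer so the predicted transform starts at identity.
        nn.init.zeros_(self.localization[-1].weight)
        with torch.no_grad():
            self.localization[-1].bias.copy_(
                torch.tensor([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])
            )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, num_classes),
        )

    def forward(self, x):
        theta = self.localization(x).view(-1, 2, 3)
        # Build the sampling grid from theta and resample the input with it.
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        warped = F.grid_sample(x, grid, align_corners=False)
        return self.classifier(warped)


model = TinySTNClassifier()

# One parameter group per sub-network. The STN group overrides the default lr;
# the classifier group falls back to the default passed to the optimizer.
# Both values are placeholders, not recommendations.
optimizer = torch.optim.SGD(
    [
        {"params": model.localization.parameters(), "lr": 1e-4},
        {"params": model.classifier.parameters()},
    ],
    lr=1e-2,
    momentum=0.9,
)
```

Keeping the two groups separate at least makes it easy to sweep the STN learning rate independently of the classifier's.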
Hello! I am also experimenting with the STN. Did you guys try adding the STN module to a segmentation model like the Unet?