Interesting new paper, "Implicit Neural Representations with Periodic Activation Functions" (SIREN). It uses periodic (sine) activation functions to learn representations of complex signals that can capture fine detail while also preserving spatial and temporal derivatives.
Project page: https://vsitzmann.github.io/siren/
Unofficial implementation: https://github.com/scart97/Siren-fastai2
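For readers new to the idea: a SIREN is an ordinary MLP whose hidden activations are sin(omega_0 * (xW + b)), combined with a specific uniform weight initialization. Below is a minimal NumPy sketch of the forward pass. It is not the code from the linked repository, which uses PyTorch. It assumes omega_0 = 30 and the weight bounds as I understand them from the paper. The bias initialization, the layer sizes, and the function names `init_siren` and `siren_forward` are my own assumptions.

```python
import numpy as np

def init_siren(layer_sizes, omega_0=30.0, seed=0):
    """Sample SIREN weights with uniform bounds (as I understand the paper's scheme).

    First layer:  W ~ U(-1/fan_in, 1/fan_in)
    Later layers: W ~ U(-sqrt(6/fan_in)/omega_0, sqrt(6/fan_in)/omega_0)
    Biases: U(-1/sqrt(fan_in), 1/sqrt(fan_in))  (assumption: PyTorch nn.Linear default)
    """
    rng = np.random.default_rng(seed)
    params = []
    for i, (fan_in, fan_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        w_bound = 1.0 / fan_in if i == 0 else np.sqrt(6.0 / fan_in) / omega_0
        b_bound = 1.0 / np.sqrt(fan_in)
        W = rng.uniform(-w_bound, w_bound, size=(fan_in, fan_out))
        b = rng.uniform(-b_bound, b_bound, size=fan_out)
        params.append((W, b))
    return params

def siren_forward(params, coords, omega_0=30.0):
    """Map coordinates of shape (N, in_features) to outputs of shape (N, out_features)."""
    h = coords
    for W, b in params[:-1]:
        # Sine activation; omega_0 scales the whole affine output, bias included
        h = np.sin(omega_0 * (h @ W + b))
    W, b = params[-1]
    return h @ W + b  # final layer is linear (no sine)

params = init_siren([2, 256, 256, 256, 3])  # (x, y) -> RGB, hypothetical sizes
coords = np.random.default_rng(1).uniform(-1, 1, size=(10, 2))
rgb = siren_forward(params, coords)
print(rgb.shape)  # (10, 3)
```

Keeping the last layer linear lets the network output values outside [-1, 1], which is convenient when the targets are normalized pixel values or amplitudes.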
As the official code has not been released yet, I implemented it myself and trained two of the paper's baseline experiments. The first is a network trained for image fitting: it takes the coordinates of a pixel and outputs the RGB values corresponding to that pixel. Here is a sample:
Original image from the Oxford-IIIT Pet dataset:
Output of a SIREN model:
Notice that to create this image I only gave the model individual coordinates spanning the full image, and it generated the pixels one by one.
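The image-fitting setup above boils down to two steps: turn an image into (coordinate, color) training pairs, and after training, query the network at every pixel coordinate and reshape the outputs back into an image. A NumPy sketch of both steps, assuming coordinates are scaled to [-1, 1] (my choice; any fixed range works if training and evaluation agree). The function names are hypothetical, and `toy_field` is a stand-in for a trained SIREN so that the example runs on its own.

```python
import numpy as np

def coordinate_grid(height, width):
    """(height*width, 2) array of (x, y) coordinates in [-1, 1], in row-major pixel order."""
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    # indexing="ij" gives (height, width) arrays, so ravel() matches img.reshape order
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([grid_x.ravel(), grid_y.ravel()], axis=-1)

def image_to_training_pairs(img):
    """Turn an (H, W, C) image into coordinate inputs and per-pixel color targets."""
    h, w, c = img.shape
    return coordinate_grid(h, w), img.reshape(-1, c)

def render_image(field, height, width, batch_size=4096):
    """Query a coordinate -> color function at every pixel and reshape into an image."""
    coords = coordinate_grid(height, width)
    # Each coordinate is evaluated independently; batching is only for efficiency
    chunks = [field(coords[i:i + batch_size]) for i in range(0, len(coords), batch_size)]
    return np.concatenate(chunks, axis=0).reshape(height, width, -1)

def toy_field(coords):
    """Hypothetical stand-in for a trained SIREN: R = x, G = y, B = 0."""
    return np.stack([coords[:, 0], coords[:, 1], np.zeros(len(coords))], axis=-1)

img = np.random.default_rng(0).random((4, 6, 3))  # stand-in for a real photo
coords, pixels = image_to_training_pairs(img)
print(coords.shape, pixels.shape)  # (24, 2) (24, 3)

rendered = render_image(toy_field, 3, 5, batch_size=4)
print(rendered.shape)  # (3, 5, 3)
```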
The other baseline I implemented is audio fitting. It's very similar to image fitting, but here the input is the time step and the output is the amplitude at that point. Top is the model output, bottom is the original.
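For audio, the same recipe works in one dimension: each sample index becomes a time coordinate, and the target is the amplitude. Below is a minimal sketch, assuming time is scaled to [-1, 1] and amplitudes are peak-normalized (both my assumptions). The 8 kHz sample rate and the synthetic tone are hypothetical stand-ins for a real clip. Because an audio clip packs far more oscillations into the coordinate range than an image row does, the paper uses a larger first-layer omega_0 for audio than for images. I have not checked which value the linked implementation uses.

```python
import numpy as np

def audio_to_training_pairs(waveform):
    """Map a 1-D waveform to (N, 1) time coordinates in [-1, 1] and (N, 1) amplitudes."""
    n = len(waveform)
    t = np.linspace(-1.0, 1.0, n).reshape(-1, 1)
    amp = np.asarray(waveform, dtype=np.float64).reshape(-1, 1)
    # Peak-normalize the amplitudes to [-1, 1] (assumption)
    peak = np.max(np.abs(amp))
    if peak > 0:
        amp = amp / peak
    return t, amp

sr = 8000                                     # hypothetical sample rate
time = np.arange(sr) / sr                     # one second of samples
wave = 0.3 * np.sin(2 * np.pi * 440 * time)   # synthetic 440 Hz tone as a stand-in
t, amp = audio_to_training_pairs(wave)
print(t.shape, amp.shape)  # (8000, 1) (8000, 1)
```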
Fastai is only used for data loading and the training loop; the model itself is pure PyTorch and is in the file
Cool! Could you please show the results you obtained with ReLU as well?
Also, do you think this may be useful for classification or other tasks as well? How can the continuous representations created with such SIREN networks be used for downstream tasks?
I will update the first experiments to compare with a model using ReLU as well.
Regarding applications of this method: since the learned representation is continuous and correctly estimates the gradients, it has direct uses in engineering and science for solving problems (see this talk for some examples).
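The reason the gradients come out well-behaved is that the derivative of a sine is a phase-shifted sine, so the derivative of a SIREN with respect to its input coordinates is itself a smooth, SIREN-like function. With ReLU, by contrast, the second derivative is zero almost everywhere. The sketch below checks the analytic input-gradient of a tiny one-hidden-layer SIREN against central finite differences. The network size, the weights, and the names `f` and `df_dx` are hypothetical, and the initialization bounds are my reading of the paper.

```python
import numpy as np

omega_0 = 30.0
hidden = 64
rng = np.random.default_rng(0)
# Hypothetical one-hidden-layer SIREN with a scalar input and a scalar output
W1 = rng.uniform(-1.0, 1.0, size=hidden)   # first-layer bound 1/fan_in, fan_in = 1
b1 = rng.uniform(-1.0, 1.0, size=hidden)
bound2 = np.sqrt(6 / hidden) / omega_0
W2 = rng.uniform(-bound2, bound2, size=hidden)
b2 = 0.0

def f(x):
    """SIREN output for an array of scalar inputs x, shape (N,)."""
    return np.sin(omega_0 * (np.outer(x, W1) + b1)) @ W2 + b2

def df_dx(x):
    """Analytic derivative: d/dx sin(omega_0 (x w + b)) = omega_0 w cos(omega_0 (x w + b))."""
    return (omega_0 * W1 * np.cos(omega_0 * (np.outer(x, W1) + b1))) @ W2

x = np.linspace(-1.0, 1.0, 11)
eps = 1e-5
numeric = (f(x + eps) - f(x - eps)) / (2 * eps)   # central finite difference
print(np.allclose(df_dx(x), numeric, rtol=1e-4, atol=1e-5))
```

In a PyTorch implementation the same derivative would normally come from autograd rather than being written out by hand; the explicit formula is only here to show why it stays smooth.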
For the usual downstream tasks I'm not sure how to apply this model. I tried changing an xResNet architecture to use this activation directly, but the results were not better than with ReLU (more info). Don't take this result as conclusive; take it rather as an indication that the way I approached the problem was wrong.
This is the result using the ReLU activation, everything else the same.
Very nice! Great that you were able to replicate the main results of the paper!
So am I understanding correctly that the model acts as a continuous function mapping image coordinates to the color of the pixel at those coordinates? If so, does it work for upsampling images to larger sizes and higher resolutions?
And/or could that then improve a model like U-Net? (I want to test this myself.)
Yes, it works for upsampling. The idea is very similar to the image inpainting experiment, where you fit a subset of the points and then evaluate the network on the missing ones.
Test it out and let me know what you discover. If you need any help just message me.
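The inpainting idea of fitting some points and evaluating on the rest amounts to splitting the pixel coordinates into a "known" set used for training and a "missing" set that is only queried afterwards. A minimal NumPy sketch of that split, assuming random pixel selection. The 25% keep fraction, the 8 x 8 stand-in image, and the name `inpainting_split` are hypothetical; the SIREN training itself is not shown.

```python
import numpy as np

def inpainting_split(n_pixels, keep_fraction, seed=0):
    """Randomly choose which pixel indices are known (fitted) and which are missing."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_pixels)
    n_keep = int(round(keep_fraction * n_pixels))
    return perm[:n_keep], perm[n_keep:]

h, w = 8, 8
ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
coords = np.stack([xs.ravel(), ys.ravel()], axis=-1)   # (64, 2) pixel coordinates
pixels = np.random.default_rng(1).random((h * w, 3))   # stand-in image colors

known, missing = inpainting_split(h * w, keep_fraction=0.25)
fit_coords, fit_pixels = coords[known], pixels[known]   # train the SIREN only on these
query_coords = coords[missing]                          # then evaluate the trained net here
print(fit_coords.shape, query_coords.shape)  # (16, 2) (48, 2)
```

Upsampling is the same pattern, except that the "missing" points are the coordinates of a denser grid rather than a random subset of the original pixels.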
I trained an upsampling example, from 256 x 256 to 512 x 512. Results:
If you zoom in, the upsampled example is smoother; this is easy to notice in the area near the nose.
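Since the fitted SIREN is a continuous function of the coordinates, upsampling from 256 x 256 to 512 x 512 only requires querying it on a denser grid that covers the same coordinate range. A sketch of the two grids, assuming coordinates in [-1, 1] built with `np.linspace`. Under this convention the 512-point grid is not a superset of the 256-point one, so every output pixel is an interpolated query. `trained_siren` in the comment is hypothetical.

```python
import numpy as np

def coordinate_grid(height, width):
    """(height*width, 2) array of (x, y) coordinates in [-1, 1], row-major order."""
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([grid_x.ravel(), grid_y.ravel()], axis=-1)

low_res = coordinate_grid(256, 256)    # original pixel coordinates, used for fitting
high_res = coordinate_grid(512, 512)   # 4x as many query points over the same square
# upsampled = trained_siren(high_res).reshape(512, 512, 3)   # hypothetical trained model
print(low_res.shape, high_res.shape)  # (65536, 2) (262144, 2)

# Horizontal spacing between neighbouring pixels: 2/255 vs 2/511
step_low = low_res[1, 0] - low_res[0, 0]
step_high = high_res[1, 0] - high_res[0, 0]
```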
Very interesting, so this could potentially be used for super-resolution applications?
Probably. The only problem right now is that you need to train a separate SIREN for each image.
Really? I thought that’s what the hypernetwork idea is there for?
It seems the SIREN network is more sensitive to the choice of initial parameters. Maybe tweaking the SIREN parameters would improve its performance in your xResNet classification tasks?
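To illustrate why initialization matters so much for sine networks, the sketch below compares the spread of the sine inputs omega_0 * (hW) at each hidden layer under two weight scales. The first uses the hidden-layer bound sqrt(6/fan_in)/omega_0, as I understand the paper's scheme. The second uses the same bound without dividing by omega_0. With the first, the pre-activations stay around unit standard deviation at every depth. With the second, they spread over dozens of periods, so each layer behaves like a noisy, very high-frequency function of its input. Biases are omitted to isolate the weight scale, and the depth, width, sample count, and the name `preactivation_stds` are my own choices. This only illustrates the paper's argument; it says nothing definitive about what happens inside an xResNet.

```python
import numpy as np

def preactivation_stds(hidden_bound, n_layers=5, width=256, omega_0=30.0, seed=0):
    """Std of omega_0 * (h @ W) at each hidden layer, with hidden W ~ U(-bound, bound)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(2000, 2))      # random 2-D input coordinates
    W = rng.uniform(-0.5, 0.5, size=(2, width))      # first layer: U(-1/fan_in, 1/fan_in)
    h = np.sin(omega_0 * (x @ W))
    stds = []
    for _ in range(n_layers):
        W = rng.uniform(-hidden_bound, hidden_bound, size=(width, width))
        z = omega_0 * (h @ W)                        # input to the next sine
        stds.append(float(z.std()))
        h = np.sin(z)
    return stds

width, omega_0 = 256, 30.0
paper = preactivation_stds(np.sqrt(6 / width) / omega_0)   # bound divided by omega_0
naive = preactivation_stds(np.sqrt(6 / width))             # same bound, no 1/omega_0
print([round(s, 2) for s in paper])
print([round(s, 2) for s in naive])
```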