Interesting new paper that uses periodic activation functions (sine) to learn representations of complex signals that both capture fine detail and preserve spatial and temporal derivatives.
As the official code has not yet been released, I implemented it myself and trained two of the baselines. The first is a network trained for image fitting: it takes the coordinates of a pixel and outputs the RGB values corresponding to that pixel. Here is a sample:
Original image from oxford pets:
Output of a SIREN model:
Notice that to create this image I only gave the model individual coordinates spanning the full image, and it generated the pixels one by one.
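For anyone curious how that looks in code, here is a minimal NumPy sketch of a SIREN-style MLP evaluated on a pixel grid. The layer sizes, the `w0 = 30` frequency factor, and the initialization bounds follow the paper's description, but the specific architecture here is my own choice, not the official implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
W0 = 30.0  # frequency factor from the paper

def siren_layer(fan_in, fan_out, first=False):
    # Paper's init: first layer U(-1/n, 1/n); later layers U(-sqrt(6/n)/w0, sqrt(6/n)/w0)
    bound = 1.0 / fan_in if first else np.sqrt(6.0 / fan_in) / W0
    return rng.uniform(-bound, bound, (fan_in, fan_out)), np.zeros(fan_out)

layers = [siren_layer(2, 256, first=True),   # (x, y) in
          siren_layer(256, 256),
          siren_layer(256, 3)]               # (R, G, B) out

def siren_forward(coords):
    h = coords
    for i, (W, b) in enumerate(layers):
        h = h @ W + b
        if i < len(layers) - 1:              # final layer stays linear
            h = np.sin(W0 * h)
    return h

# Pixel-by-pixel evaluation: a grid of coordinates normalized to [-1, 1]
side = 64
xs = np.linspace(-1, 1, side)
coords = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)
rgb = siren_forward(coords)                  # one RGB value per coordinate
print(rgb.shape)                             # (4096, 3)
```

Training would then just regress `rgb` against the ground-truth pixel values with an MSE loss.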
The other baseline I implemented was audio fitting. It is very similar to image fitting, but here the input is the time step and the output is the amplitude at that point. Top is from the model, bottom is the original.
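The setup is the same recipe with 1-D coordinates: the network maps a normalized time step t to the amplitude a(t). A sketch of the input/target pairs, using a synthetic 440 Hz tone at 16 kHz as stand-in data (both my own choices for illustration):

```python
import numpy as np

sr = 16000                                            # assumed sample rate
wave = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)   # 1 s stand-in signal

t = np.linspace(-1, 1, len(wave)).reshape(-1, 1)      # network input: time coords
a = wave.reshape(-1, 1)                               # network target: amplitude
```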
Cool! Could you please show the results you obtained with ReLU as well?
Also, do you think this may be useful for classification or other tasks as well? How can the continuous representations created with such SIREN networks be used for downstream tasks?
I will update the first experiments to compare with a model using ReLU as well.
Regarding applications of this method: since the learned representation is both continuous and correctly estimates gradients, it has direct uses in engineering and science for solving problems (see this talk for some examples).
For the usual downstream tasks I’m not sure how to apply this model. I tried changing an xResnet architecture to use this activation directly, but the results were no better than with ReLU (more info). Don’t take this result as conclusive, but rather as an indication that the way I approached the problem was wrong.
So am I understanding correctly that the model acts as a continuous function mapping image coordinates to the color of that pixel? If so, does it work for upsampling images to larger sizes and higher resolutions?
Yes, it works for upsampling. The idea is very similar to the image inpainting experiment: you fit the network on a set of points, then evaluate it at the missing points.
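Concretely, since the fitted network is a continuous function of coordinates, upsampling is just querying it on a denser grid than the one it was trained on. A sketch, where `siren_forward` stands in for a trained model (the placeholder below returns zeros; in practice it would be the fitted network):

```python
import numpy as np

def siren_forward(coords):
    # Placeholder for a trained SIREN mapping (x, y) in [-1, 1] to RGB
    return np.zeros((coords.shape[0], 3))

def make_grid(side):
    xs = np.linspace(-1, 1, side)
    return np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)

low  = siren_forward(make_grid(64)).reshape(64, 64, 3)     # training resolution
high = siren_forward(make_grid(256)).reshape(256, 256, 3)  # 4x denser query grid
```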
It seems the SIREN network is more sensitive to the choice of initial parameters; maybe tweaking the SIREN initialization will improve its performance in your xResnet classification task?
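A quick illustration of that sensitivity: with the paper's hidden-layer init, U(-sqrt(6/n)/w0, sqrt(6/n)/w0), pre-activations keep an O(1) scale through the network, while dropping the 1/w0 factor (a plausible mistake when porting the init to another architecture) blows them up by roughly a factor of w0. This is a toy depth-5 experiment of my own, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, w0, depth = 256, 30.0, 5

def final_preact_std(bound):
    # Push random inputs through `depth` sine layers and return the
    # standard deviation of the last pre-activation w0 * (h @ W).
    h = rng.uniform(-1, 1, (1024, n))
    for _ in range(depth):
        W = rng.uniform(-bound, bound, (n, n))
        z = w0 * (h @ W)
        h = np.sin(z)
    return z.std()

paper = final_preact_std(np.sqrt(6.0 / n) / w0)  # SIREN scheme: stays ~O(1)
naive = final_preact_std(np.sqrt(6.0 / n))       # no 1/w0 factor: ~w0x larger
print(paper, naive)
```

With pre-activations that large, the sine oscillates over many periods per unit input, which makes the loss surface very rough; that could explain why dropping the activation into xResnet without matching the init underperforms.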