Is the trained VGG model necessary for style transfer?

I just finished this super interesting paper which suggests (at least on my reading) that the structure of the network matters much more than the training, which has really interesting implications for transfer learning.

In it, they take the VGG19 architecture, randomly initialize the weights, and are still able to reconstruct original images and do style transfer at a level comparable to what we've been doing.

Intuitively it makes some sense after reading it. The random mappings of the network create an incredibly rich, high-dimensional feature space, and when you minimize the distance in that space by generating an image that produces similar activations, you're going to end up with similar images.
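To make that concrete, here's a minimal sketch of the idea in PyTorch. I'm using a tiny random, frozen conv stack as a stand-in for the full VGG19 the paper uses; the point is just that optimizing an image to match the activations of an *untrained* network still drives it toward the target:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for VGG19: a stack of randomly initialized, frozen conv layers.
net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
)
for p in net.parameters():
    p.requires_grad_(False)  # random weights, never trained

target = torch.rand(1, 3, 32, 32)      # the "content" image
with torch.no_grad():
    target_act = net(target)           # activations we want to match

x = torch.rand(1, 3, 32, 32, requires_grad=True)  # image being optimized
opt = torch.optim.Adam([x], lr=0.05)

losses = []
for _ in range(200):
    opt.zero_grad()
    loss = (net(x) - target_act).pow(2).mean()  # distance in activation space
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

After the loop, the activation-space distance has dropped substantially from where it started, even though nothing in the network was ever trained.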

Still, it does run counter to a lot of what we've been talking about, and I'd be curious to hear others' takes on it.


Cool article, thanks for sharing. If you're not using a pretrained VGG model, do you still need to use the ImageNet mean in the preprocessing step? If not, how do you implement image preprocessing?
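For reference, the step I'm asking about is the standard ImageNet normalization used with pretrained torchvision models (these mean/std values are from the torchvision docs; with a random network it's not obvious they're meaningful):

```python
import torch

# Standard ImageNet channel statistics used for pretrained torchvision models.
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406])
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225])

def preprocess(img):
    # img: (3, H, W) float tensor with values in [0, 1]
    return (img - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]

img = torch.rand(3, 8, 8)
x = preprocess(img)
```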

Good find! They don't seem to be optimizing the styled images long enough, even for the trained-VGG baseline: those results aren't nearly as good as other examples of style transfer we've seen, which makes their random-VGG comparisons look better than they are.

It's still interesting that it works at all. I'm assuming it's because the per-channel Gram matrices capture relationships between filters whilst discarding all the spatial geometry.
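That geometry-discarding property is easy to verify directly. A Gram matrix is just the channel-by-channel inner product over flattened spatial positions, so shuffling the spatial locations of a feature map leaves it unchanged (a small sketch, not the paper's code):

```python
import torch

def gram_matrix(feats):
    # feats: (C, H, W) feature map from one layer
    c, h, w = feats.shape
    f = feats.reshape(c, h * w)     # flatten the spatial dimensions
    return f @ f.t() / (h * w)      # (C, C) filter-filter correlations

torch.manual_seed(0)
feats = torch.rand(8, 4, 4)
g = gram_matrix(feats)

# Permute the 16 spatial positions: the Gram matrix is identical,
# because it only sees which filters co-activate, not where.
perm = torch.randperm(16)
shuffled = feats.reshape(8, 16)[:, perm].reshape(8, 4, 4)
g_shuffled = gram_matrix(shuffled)
```

So the style loss is invariant to where in the image the textures appear, which is exactly why it captures "style" rather than layout.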