Why sum the loss from multiple layers for style transfer?

I am working my way through the notebook and wondering why we have to sum the loss from several layers for style transfer, when we use only a single layer for content.

What does summing the loss from several layers give us that the loss from a single layer can't?

We do this so that we don't rely only on the feature maps from one particular layer of the network run on the style image. This way we average them out, and we are more likely to capture just the correlations between feature maps, and therefore more of the texture/style and less of the content of the style image.

From the actual paper:

“By including the feature correlations of multiple layers, we obtain a stationary, multi-scale representation of the input image, which captures its texture information but not the global arrangement.”
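To make the asymmetry concrete, here is a minimal NumPy sketch of the two losses. It assumes each layer's activations are an (H, W, C) array; the function names, the normalization by H*W, and the random stand-in features are illustrative assumptions, not the notebook's actual code. Content compares raw activations at one layer (so spatial layout matters), while style compares Gram matrices and sums them over several layers.

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlations of an (H, W, C) feature map, summed over positions."""
    h, w, c = features.shape
    flat = features.reshape(h * w, c)   # one row per spatial position
    return flat.T @ flat / (h * w)      # (C, C); dividing by H*W is one common normalization

def content_loss(gen_feats, content_feats):
    # Single layer, raw activations: *where* each feature fires matters.
    return np.mean((gen_feats - content_feats) ** 2)

def style_loss(gen_layers, style_layers, weights):
    # Several layers, Gram matrices only: a weighted sum of per-layer losses.
    return sum(w * np.mean((gram_matrix(g) - gram_matrix(s)) ** 2)
               for g, s, w in zip(gen_layers, style_layers, weights))

rng = np.random.default_rng(0)
shapes = [(32, 32, 64), (16, 16, 128), (8, 8, 256)]  # hypothetical layer sizes, coarser as we go deeper
style_feats = [rng.standard_normal(s) for s in shapes]
gen_feats = [rng.standard_normal(s) for s in shapes]
weights = [1 / len(shapes)] * len(shapes)

print(style_loss(gen_feats, style_feats, weights))
print(content_loss(gen_feats[1], style_feats[1]))
```

Because the style layers have different spatial resolutions, each one contributes correlations at a different scale, which is the "multi-scale" part of the quote above.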


@samwit, you wrote: "This way we average them out."

I thought we were summing the loss. Is there averaging happening as well?

You can see the effect of each layer on the style directly here (left: that single layer only; right: every layer except that one, with the weights rescaled accordingly):


What I mean is that we don't give prominence to any one layer and its feature maps/filters, so the layers balance each other out. The main idea is that this, together with the Gram matrix (gram_matrix), removes the content elements of the style image and leaves behind the fingerprint of its style.
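One way to see why the Gram matrix strips out content: it only records which features co-occur, not where. In the sketch below (random stand-in activations and a hypothetical `gram_matrix` helper, assuming an (H, W, C) layout), shuffling the spatial positions scrambles the layout completely, yet the Gram matrix is unchanged.

```python
import numpy as np

def gram_matrix(features):
    """Feature correlations of an (H, W, C) map, summed over all spatial positions."""
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    return flat.T @ flat / (h * w)

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 8, 16))  # stand-in activations for one layer

# Randomly reorder the 64 spatial positions: same feature vectors, different arrangement.
perm = rng.permutation(8 * 8)
shuffled = feats.reshape(64, 16)[perm].reshape(8, 8, 16)

print(np.allclose(gram_matrix(feats), gram_matrix(shuffled)))  # True
print(np.mean((feats - shuffled) ** 2))  # a content-style (raw activation) loss does notice the change
```

This is the "stationary" property from the paper quote: the representation keeps texture statistics but discards the global arrangement.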


When you optimize, averaging and summing are basically the same thing: one is just a constant multiple of the other, so the minimum is in the same place.
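A toy check of that point, using hypothetical per-layer losses that are simple quadratics in one parameter `x`: the summed and averaged versions are minimized at the same `x`, and the averaged gradient is just the summed gradient divided by the number of layers.

```python
import numpy as np

targets = np.array([1.0, 2.0, 6.0])  # hypothetical per-layer optima

def summed(x):
    return np.sum((x - targets) ** 2)

def averaged(x):
    return summed(x) / len(targets)  # a constant multiple of the sum

xs = np.linspace(-5, 10, 1501)
best_sum = xs[np.argmin([summed(x) for x in xs])]
best_avg = xs[np.argmin([averaged(x) for x in xs])]
print(best_sum, best_avg)  # same grid point, at the mean of the targets

# Gradients differ only by the factor 1 / len(targets).
x0 = 0.5
grad_sum = 2 * np.sum(x0 - targets)
grad_avg = grad_sum / len(targets)
print(grad_sum, grad_avg)
```

One caveat: once the style loss is added to a content loss, the constant does change the balance between the two terms, so switching between sum and mean effectively rescales the style weight.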


@davecg Cool experiments! I really like how you benchmark the impact of each layer. Just to make sure I follow correctly: are these experiments on content loss variations (a single layer vs. everything but that specific layer)?

@bckenstler did some experiments with block4_conv2, block3_conv2 and block3_conv4 and had some interesting results. I'd be curious to see your experiments with other convolutional layers from different blocks.

Those are actually style loss variations, using either {block_n_conv1} alone with a weight of 1, or {block_i_conv1 for i != n}, each with a weight of 1/(# layers - 1).
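The two experiment variants can be written down as per-layer weight dictionaries. This sketch assumes Keras-style VGG layer names (`block{i}_conv1`) and five blocks, as in VGG16/VGG19; `style_layer_weights` is a hypothetical helper, not code from the thread or the notebook. Each weight would multiply that layer's Gram-matrix style loss.

```python
def style_layer_weights(n, num_blocks=5, leave_out=False):
    """Hypothetical helper: {layer_name: weight} for the single-layer or leave-one-out variant."""
    names = [f"block{i}_conv1" for i in range(1, num_blocks + 1)]
    target = f"block{n}_conv1"
    if not leave_out:
        # Variant 1: only block_n_conv1, with weight 1.
        return {target: 1.0}
    # Variant 2: every other conv1 layer, each weighted 1 / (# layers - 1).
    rest = [name for name in names if name != target]
    return {name: 1.0 / (num_blocks - 1) for name in rest}

print(style_layer_weights(3))
# {'block3_conv1': 1.0}
print(style_layer_weights(3, leave_out=True))
# {'block1_conv1': 0.25, 'block2_conv1': 0.25, 'block4_conv1': 0.25, 'block5_conv1': 0.25}
```

In both variants the weights sum to 1, so the total style loss stays on a comparable scale whether one layer or four are used.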

I can run the same thing for content and it will obscure parts of the bridge.

Ah, got it. Thanks for clearing that up. Which content output layer are you using?

BTW, in your other test that preserves the content image's colors, did you change the loss function setup at all?

block4 seems to work best.

Same loss.

Experiment changing content layers: