Hi - I’ve just finished lesson 8 (loving the course!), but I’m confused about why the style loss function isn’t considered that explainable.
I may easily be missing something, but doesn’t the diagonal of the Gram matrix contain a lot of specific information about which styles are and aren’t in the target image? Say the first filter is one that picks up on wavy lines. If there are lots of wavy lines throughout the target image, the dot product of that filter’s flattened activations with themselves (the top-left entry of the Gram matrix) will be very high, and consequently the noisy generated image will move towards having more waves. Similarly, if the next filter picks up on orange, but our target style isn’t orange at all, the second element on the diagonal will be near zero, and the generated image will learn not to use orange. Since we’re looking at the output of a ReLU activation, every value is non-negative, so we’ll get the highest values if that feature is common throughout the target image and low values if it is usually absent. (We’ll never have two large negative numbers multiplying to give a large positive contribution to the sum.)
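To make the diagonal argument concrete, here is a toy NumPy sketch of what I mean. The two "filters", their values, and the `gram_matrix` helper are made up for illustration (this is not the course notebook's code), and I'm skipping any normalisation by the number of elements.

```python
import numpy as np

def gram_matrix(activations):
    """Unnormalised Gram matrix of (channels, height, width) feature maps."""
    c, h, w = activations.shape
    flat = activations.reshape(c, h * w)  # one row per filter
    return flat @ flat.T                  # entry (i, j) = dot(filter i, filter j)

# Hypothetical post-ReLU activations for a layer with two filters on a 4x4 map.
wavy = np.full((4, 4), 2.0)    # "wavy lines" filter fires strongly everywhere
orange = np.zeros((4, 4))      # "orange" filter is almost always silent
orange[0, 0] = 0.1

gram = gram_matrix(np.stack([wavy, orange]))
print(np.diag(gram))           # large entry for wavy, near-zero entry for orange
```

In this toy case the first diagonal entry is the sum of squared "wavy" activations and dwarfs the second one, which is the pattern I'd expect the generated image to be pushed towards matching.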
The off-diagonal values are trickier to explain, but not crazy: we might find, say, that wavy lines are usually blue rather than white, or some other combination like that. (And it seems plausible that these values will be a fair amount lower than the diagonal entries, so the diagonal might be the most important factor in driving the generated style.)
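Here is a similar toy sketch for the off-diagonal entries, again with invented filters and values rather than anything from the course code. It also checks one thing I'm fairly sure of: by the Cauchy-Schwarz inequality, each off-diagonal entry is at most the geometric mean of the two matching diagonal entries, so it can never exceed the larger of them.

```python
import numpy as np

def gram_matrix(activations):
    """Unnormalised Gram matrix of (channels, height, width) feature maps."""
    c, h, w = activations.shape
    flat = activations.reshape(c, h * w)
    return flat @ flat.T

# Hypothetical post-ReLU maps: wavy lines and blue both fire on the left half,
# white fires only on the right half.
wavy = np.zeros((4, 4))
wavy[:, :2] = 1.0
blue = np.zeros((4, 4))
blue[:, :2] = 0.5
white = np.zeros((4, 4))
white[:, 2:] = 0.5

gram = gram_matrix(np.stack([wavy, blue, white]))
print(gram[0, 1])  # wavy with blue: positive, they co-occur
print(gram[0, 2])  # wavy with white: zero, they never overlap

# Cauchy-Schwarz bound: gram[i, j] <= sqrt(gram[i, i] * gram[j, j])
diag = np.diag(gram)
bound = np.sqrt(np.outer(diag, diag))
print(np.all(gram <= bound + 1e-9))
```

So the wavy/blue entry records that the two features appear in the same places, while the wavy/white entry stays at zero. The bound only says the off-diagonal values can't beat the diagonal ones; how much lower they are in a real network is still a guess on my part.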