Yes, it is clear now!
There is a lot I can explore in the layer combination for the style part: the number of VGG layers, the number of blocks to include per layer, and potentially the weight of each layer. Starting the optimization directly from the target content image also seems likely to speed up the training process (though I'm not sure whether it would hurt generalization). I'd like to try it out and see how much faster it is and how much the content improves.
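A minimal NumPy sketch of what these knobs could look like, assuming the VGG feature maps have already been extracted as `(channels, height, width)` arrays. The layer names (e.g. `conv1_1`) and the helpers `gram_matrix`, `style_loss` and `init_generated` are hypothetical, not taken from any particular codebase. The style loss is a weighted sum of Gram-matrix differences over whichever layers appear in the weight dict, and the image being optimized can start either from the content image or from noise.

```python
import numpy as np

def gram_matrix(feat):
    """Gram matrix of a (channels, height, width) feature map, normalized by its size."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def style_loss(gen_feats, style_feats, layer_weights):
    """Weighted sum of squared Gram differences.

    layer_weights maps a (hypothetical) layer name to its weight; only the layers
    listed contribute, so editing this dict changes the layer combination.
    """
    total = 0.0
    for name, weight in layer_weights.items():
        diff = gram_matrix(gen_feats[name]) - gram_matrix(style_feats[name])
        total += weight * float(np.sum(diff ** 2))
    return total

def init_generated(content_img, from_content=True, seed=0):
    """Starting point for optimization: a copy of the content image, or Gaussian noise of the same shape."""
    if from_content:
        return content_img.copy()
    rng = np.random.default_rng(seed)
    return rng.standard_normal(content_img.shape)
```

For example, `style_loss(gen, style, {"conv1_1": 1.0, "conv2_1": 0.5})` uses two layers and down-weights the deeper one; adding, dropping or reweighting entries is the experiment described above.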
Thanks for your feedback!