Curating Lesson 8 experiments

I learned so much from the discussion/experiments on the forum this past week, and I decided to summarize what I’ve learned. Maybe this will help other people as well.

I also think that, since Part 2 involves so many experiments, we are in a way creating new knowledge together. Perhaps it would be nice to organize our collective experiments as we move through the course.

Finding a better optimizer

Unlike the models we built in Part 1, here we don’t need a stochastic gradient descent optimizer because we don’t have a stochastic problem to solve. Instead, we use a deterministic optimizer, specifically L-BFGS (Gatys 2016 claims it works best for image synthesis).
@alex_izvorski made one key observation: the L-BFGS-B optimization algorithm we used always gets stuck in local minima, and those local minima always seem to have a characteristic pattern of high-spatial-frequency artifacts in the generated image. He developed a better optimizer to reduce these artifact patterns in image generation.

@jeremy and @kelvin suggested total variation regularization as an alternative approach. In this approach, one would add a total variation term to the loss function to encourage smoothness. Here is the Keras implementation. Did anyone test this idea?
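To make the idea concrete for myself, here is a minimal NumPy sketch of a total variation term. This is not the Keras implementation mentioned above: the name `total_variation_loss` and the plain squared-difference form are my own assumptions, chosen only to show that the term penalizes differences between neighbouring pixels.

```python
import numpy as np

def total_variation_loss(img):
    """Sum of squared differences between neighbouring pixels.

    img: array of shape (height, width, channels).
    Hypothetical helper for illustration only.
    """
    img = np.asarray(img, dtype=np.float64)
    dh = img[1:, :, :] - img[:-1, :, :]   # differences between vertical neighbours
    dw = img[:, 1:, :] - img[:, :-1, :]   # differences between horizontal neighbours
    return float(np.sum(dh ** 2) + np.sum(dw ** 2))

# A flat image has no variation; a checkerboard has the maximum for 0/1 pixels.
flat = np.full((8, 8, 3), 0.5)
checker = np.indices((8, 8)).sum(axis=0) % 2
checker = np.repeat(checker[:, :, None], 3, axis=2).astype(np.float64)
```

A high-frequency pattern like the checkerboard gets a large penalty while the flat image gets none, which is why adding this term (times a small weight) to the loss should discourage the kind of artifacts described above.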

Finding an even better optimizer for style transfer

@slavivanov experimented with the speed of various optimizers for style transfer. The results show that Adam is comparable to L-BFGS in terms of both speed and error. This finding challenges the traditional thinking that “for deterministic functions, line search / hessian approaches should destroy SGD approaches!” Moreover, larger images seem to favor Adam in convergence speed. (Could this be related to L-BFGS getting stuck in local minima? I wonder.) @jeremy suggests optimizing the learning rate strategy in SGD for better results. Super excited about this research, looking forward to more results.
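For anyone who has not looked inside Adam, here is a small sketch of its update rule applied to a deterministic function. This is not @slavivanov's benchmark code: `adam_minimize` is a hypothetical helper, the toy quadratic only stands in for the style transfer loss, and the hyperparameters are the usual defaults with a learning rate I picked for this toy problem.

```python
import numpy as np

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a function with Adam, given a function returning its gradient."""
    x = np.array(x0, dtype=np.float64)
    m = np.zeros_like(x)   # running mean of gradients
    v = np.zeros_like(x)   # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy deterministic objective (an assumption, not the real style loss).
target = np.array([1.0, -2.0, 3.0])

def loss(x):
    return float(np.sum((x - target) ** 2))

def grad(x):
    return 2.0 * (x - target)

x_min = adam_minimize(grad, np.zeros(3))
```

Note that nothing in the update needs the gradient to be noisy, so Adam can be used on a deterministic loss just like L-BFGS; the open question from the experiments is only which one gets to a good image faster.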

Tuning Optimizer parameters (starting point, loss, gradient)

starting point:

 1. Start with white noise
 2. Start with original content image
 3. Start with original style image
 4. Start with the last iteration (so that you don’t have to always start from the beginning)
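The four starting points above could be wired up roughly like this. It is only a sketch: `initial_image` and its `mode` names are hypothetical, the noise range of [0, 1) is an arbitrary choice, and I assume the style image has already been resized to the content image's shape.

```python
import numpy as np

def initial_image(mode, content, style, previous=None, seed=0):
    """Return the array the optimizer starts from (hypothetical helper)."""
    if mode == "noise":
        rng = np.random.default_rng(seed)
        return rng.uniform(0.0, 1.0, size=content.shape)   # white noise
    if mode == "content":
        return np.array(content, dtype=np.float64)          # copy of the content image
    if mode == "style":
        if style.shape != content.shape:
            raise ValueError("style must be resized to the content shape first")
        return np.array(style, dtype=np.float64)            # copy of the style image
    if mode == "previous":
        if previous is None:
            raise ValueError("no previous iteration to resume from")
        return np.array(previous, dtype=np.float64)         # resume an earlier run
    raise ValueError("unknown mode: " + repr(mode))
```

Returning a copy in every case means the optimizer can update the starting image in place without touching the original content or style arrays.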

loss function

  1. Content loss
    1.1. Single layer content loss (experiment on the output layer, e.g. block4_conv2 vs block3_conv2 vs block3_conv1, etc.)
    1.2. Multiple layer content loss (any experiments on the forum?)

  2. Style loss
    (@davecg did cool experiments on single layer versus multiple layer style loss)

  3. Total variation loss (this is completely optional, and was suggested by Jeremy and Kelvin for smoothing the generated image. Alternatively, think of our current model as having a zero scaling factor on the total variation loss)

  4. Experiment on scaling parameters.
    I didn’t systematically collect these experiments, but there are some discussions here
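Here is a rough sketch of how the loss terms in the list above fit together. All function names are hypothetical, the Gram matrix normalization is just one common convention, and the feature arrays stand for activations you would pull from the chosen conv layers of the network.

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlations of an (h, w, c) activation map."""
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    return flat.T @ flat / (h * w * c)   # normalization is an assumed convention

def content_loss(gen_feat, content_feat):
    """Mean squared error between activations of a single layer."""
    return float(np.mean((gen_feat - content_feat) ** 2))

def style_loss(gen_feats, style_feats):
    """Sum of Gram matrix MSEs over one or more layers (lists of arrays)."""
    return float(sum(np.mean((gram_matrix(g) - gram_matrix(s)) ** 2)
                     for g, s in zip(gen_feats, style_feats)))

def combine_losses(content, style, tv, content_weight=1.0, style_weight=1.0,
                   tv_weight=0.0):
    """Weighted sum; tv_weight=0.0 recovers the model without TV loss."""
    return content_weight * content + style_weight * style + tv_weight * tv
```

Passing a one-element list to `style_loss` gives the single layer variant and a longer list gives the multiple layer variant, so both experiments use the same code path. The three weights are the scaling parameters from item 4.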

preserving content colors in style transfer

I’m not sure if this has anything to do with the optimizer or the loss function, but it’s pretty cool. There are some experiments on luminance-only style transfer.
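As I understand it, the basic trick is to take brightness from the stylized result and color from the content image. The sketch below is a simplified post-processing version of that idea, not the code from the forum experiments: the YIQ color space, the function names, and the assumption that images are float RGB arrays in [0, 1] with the same shape are all mine.

```python
import numpy as np

# Standard NTSC RGB -> YIQ matrix; Y is luminance, I and Q carry color.
RGB_TO_YIQ = np.array([[0.299, 0.587, 0.114],
                       [0.596, -0.274, -0.322],
                       [0.211, -0.523, 0.312]])
YIQ_TO_RGB = np.linalg.inv(RGB_TO_YIQ)

def rgb_to_yiq(rgb):
    return rgb @ RGB_TO_YIQ.T

def yiq_to_rgb(yiq):
    return yiq @ YIQ_TO_RGB.T

def keep_content_colors(stylized_rgb, content_rgb):
    """Luminance from the stylized image, color channels from the content."""
    stylized = rgb_to_yiq(stylized_rgb)
    out = rgb_to_yiq(content_rgb)
    out[..., 0] = stylized[..., 0]            # swap in the stylized luminance
    return np.clip(yiq_to_rgb(out), 0.0, 1.0)  # clip back to valid RGB
```

A fuller version would run the style transfer itself on the luminance channel only and recombine afterwards; the recombination step would look the same.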

style treatment for interesting effects

In addition to the optimizer experiments, @bckenstler experimented with style treatment, specifically tiling style images to give stronger texture. By repeating a weaker style image, he created a much stronger style image with a more distinct texture. "In short, I think tiling images like these help you transfer things like texture or design by “spreading out” said design over an image. I.e., two red dots in the image may not be enough to transfer that style, but if you tile and have 20 red dots then the style signal for that comes out much stronger". The big takeaways for me are the cool results and the intuitive thinking in experimental design.
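Tiling itself is nearly a one-liner in NumPy. The sketch below is my own guess at the mechanics, not @bckenstler's code: `tile_style` is a hypothetical name, and the optional `shrink` step (plain subsampling back to the original size) is an assumption about how one might keep the input size unchanged.

```python
import numpy as np

def tile_style(style, reps_h, reps_w, shrink=False):
    """Repeat an (h, w, c) style image reps_h x reps_w times.

    With shrink=True the tiled image is subsampled back to (h, w, c),
    so each copy of the pattern becomes smaller (assumed, crude resize).
    """
    tiled = np.tile(style, (reps_h, reps_w, 1))
    if shrink:
        tiled = tiled[::reps_h, ::reps_w, :]
    return tiled
```

For example, `tile_style(style, 3, 3)` turns one copy of a motif into nine, which is the “two red dots become 20 red dots” effect described in the quote.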

Please correct any mistakes in this review. I probably didn’t include every single experiment on the forum, so please feel free to reply and add yours.

I did a bit of experimentation on the speed of various optimizers for style transfer: