Whenever the BFGS optimizer falls into a local minimum, blur the image and continue.

This one took 994 iterations and ~150 blurs to reach loss=0.53, and it is probably not fully converged yet (I don’t know how long it would take to converge).

For comparison, the original BFGS-only method finds a local minimum after 15 iterations, with loss=3.19, and does not improve at all after that.

Why does this work?

I suspect it is because the gradient always points in the direction of making neighboring pixels very different, so following it produces random strong edges all over the place (these are the patterned artifacts you see with the original code).

To counter this, we can just keep blurring the edges away, which is the brute force approach.
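For anyone who wants to try the blur-and-continue idea, here is a minimal sketch of how the loop could look. It is not the exact code used for the result above. It assumes SciPy's L-BFGS-B and a Gaussian blur, and the function name `optimize_with_blur`, the stall test, and the defaults (`rounds`, `iters_per_round`, `sigma`, `tol`) are all hypothetical choices. `loss_and_grad` stands in for whatever function returns the style-transfer loss and its gradient for a flattened image.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.optimize import minimize


def optimize_with_blur(loss_and_grad, x0, shape, rounds=10, iters_per_round=20,
                       sigma=1.0, tol=1e-4):
    """Run L-BFGS-B in short bursts and blur the image whenever a burst stalls.

    loss_and_grad: callable taking a flat float64 array and returning
                   (loss, flat gradient). Hypothetical stand-in for the real loss.
    shape:         image shape, needed so the blur acts on pixels, not a flat vector.
    """
    x = np.asarray(x0, dtype=np.float64).ravel()
    best_x, best_loss = x.copy(), float(loss_and_grad(x)[0])
    history = []
    for _ in range(rounds):
        res = minimize(loss_and_grad, x, jac=True, method="L-BFGS-B",
                       options={"maxiter": iters_per_round})
        x = res.x
        loss = float(res.fun)
        history.append(loss)
        # Assumed stall test: this burst barely improved on the best loss so far.
        stalled = best_loss - loss < tol * max(1.0, abs(best_loss))
        if loss < best_loss:
            best_x, best_loss = x.copy(), loss
        if stalled:
            # Blur the edges away and let the optimizer continue from there.
            x = gaussian_filter(x.reshape(shape), sigma=sigma).ravel()
    return best_x.reshape(shape), best_loss, history
```

The sketch keeps the best image seen so far, because blurring usually raises the loss for a while before the next burst brings it back down.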

A more elegant approach would be to add a loss term that penalizes edges, or more generally to add a loss term for pixel-pixel correlation statistics that look too different from those of a real image. Then the local minima would probably disappear.

You should try Total Variation Regularization. The relevant section of the paper is below:

Section 3.3:

Total Variation Regularization. To encourage spatial smoothness in the output image ŷ, we follow prior work on feature inversion [6,20] and super-resolution [48,49] and make use of total variation regularizer ℓ_TV(ŷ).
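The quoted passage does not spell out a formula, so here is a rough sketch of one common variant: the sum of squared differences between neighboring pixels, together with its gradient so it can be added to the loss passed to BFGS. The function name `tv_loss_and_grad` and the squared-difference form are my assumptions; other implementations use absolute differences instead.

```python
import numpy as np


def tv_loss_and_grad(img):
    """Squared-difference total variation of a 2-D image, plus its gradient.

    This is one common variant, assumed here for illustration.
    """
    img = np.asarray(img, dtype=np.float64)
    dh = img[:, 1:] - img[:, :-1]   # horizontal neighbor differences
    dv = img[1:, :] - img[:-1, :]   # vertical neighbor differences
    loss = float(np.sum(dh ** 2) + np.sum(dv ** 2))

    # Each difference d = a - b contributes +2d to a's gradient and -2d to b's.
    grad = np.zeros_like(img)
    grad[:, 1:] += 2.0 * dh
    grad[:, :-1] -= 2.0 * dh
    grad[1:, :] += 2.0 * dv
    grad[:-1, :] -= 2.0 * dv
    return loss, grad
```

To use it, multiply both the loss and the gradient by a small weight and add them to the content and style terms. A flat image gives zero, and every strong edge adds to the penalty.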

Very cool experiment! Why did you choose to run 1000 iterations, and how long did that take? It seems pretty extreme, especially since you had already observed a local minimum at 15 iterations. Also, did you try adding an edge penalty function? How did that go?

@xinxin.li.seattle Thanks! I didn’t have a specific number of iterations in mind; I just wanted to run it for a long time.

I did try multiplying the whole image by a factor so that the total strength of edges was the same after every iteration, instead of blurring. That did not work very well: it converged to a local minimum only slightly better than the original. I think it doesn’t suppress the artifact edges relative to the real edges; it just reduces all edges uniformly. That may not be quite the same as adding an edge loss, though.
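A small sketch of why uniform rescaling cannot favor real edges over artifact edges. The edge-strength measure (sum of absolute neighbor differences) and both function names are assumptions made for illustration, not the code from the experiment.

```python
import numpy as np


def edge_strength(img):
    """Total edge strength, assumed here to be the sum of absolute neighbor differences."""
    img = np.asarray(img, dtype=np.float64)
    return float(np.abs(np.diff(img, axis=0)).sum() + np.abs(np.diff(img, axis=1)).sum())


def rescale_to_edge_strength(img, target):
    """Multiply the whole image by one factor so its edge strength equals `target`."""
    img = np.asarray(img, dtype=np.float64)
    current = edge_strength(img)
    if current == 0.0:
        return img  # a flat image has no edges to rescale
    return img * (target / current)


# One weak edge (step of 1) and one strong edge (step of 3) in the same image.
img = np.array([[0.0, 1.0, 1.0],
                [0.0, 3.0, 3.0]])
scaled = rescale_to_edge_strength(img, target=edge_strength(img) / 2)

# Every difference is multiplied by the same factor, so the strong edge is
# still three times the weak one after rescaling.
ratio_before = (img[1, 1] - img[1, 0]) / (img[0, 1] - img[0, 0])
ratio_after = (scaled[1, 1] - scaled[1, 0]) / (scaled[0, 1] - scaled[0, 0])
```

Blurring, by contrast, damps sharp pixel-scale edges more than smooth large-scale structure, which may be why it behaves differently.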

Hi all, I added total variation loss as well (with weight 8.5e-7) for style transfer. I tried optimizing with BFGS and it still gets stuck at a saddle point. Is that expected, or should adding TV loss always move the optimizer away from local minima?
It does produce somewhat smoother images, without the artifacts I get without TV loss.

without tv loss

with tv loss

The number of iterations was 200.
With TV loss, the loss was almost the same (1078.19) from iteration 80 to 200.