New Style Transfer Technique Examples

Stream here:

Or better yet download the non-compressed .avi (libx264 codec) from here and watch fullscreen:

I will be providing a lot of detail on my technique shortly… but it's pretty technical, so it will likely be a longer-form piece.


Vincent this is AMAZING! It’s easily the best style transfer example I’ve seen to date, at least for that style. Does it generalize to other styles or is it best suited for images of that type?

Either way, I'm super excited. My artwork is very busy, just like the style image, and I've been fooling around with style transfer but haven't been satisfied with the results. I can't wait to give your method a try.

Looking forward to seeing more results and the description, and I'm hoping for the source as well if you're willing to share it.

Incredible work man!

Truly great!


This is so great. Let’s work together to share this with the highest impact we can. I think this will be huge. My suggestion is to not share too much, other than teasers (like this video), until we’ve got a really great paper/post/whatever done, and then do a big release. In my experience that’s the best way to get great coverage (see DeepMind for an excellent role model on how to do this effectively).

I’ve got some good media relationships who I suspect will want to cover this kind of news. It’s exciting, inspiring, and visual.


Awesome !!!

These are fantastic Vincent!

Vincent: I am eagerly waiting for a blog post or paper describing the technique you used for this.

Working on it… Still water runs deep.

Sorry for the delay. I finally put up a repo on GitHub with TensorFlow code and some explanation:

This is still a work in progress. I'm having difficulty explaining the math and intuition behind the loss function (Wasserstein distance in feature space). Hopefully I will be able to improve the explanation with time. Let me know if you have any suggestions.
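For reference: if a layer's activations are summarized by a mean vector μ and covariance Σ (each spatial position treated as one sample) and modeled as a Gaussian, the squared 2-Wasserstein distance between style (s) and generated (g) features has a known closed form. This is standard background rather than a summary of the repo, but it matches the four terms computed in the PyTorch port later in this thread:

$$
W_2^2\big(\mathcal{N}(\mu_s,\Sigma_s),\,\mathcal{N}(\mu_g,\Sigma_g)\big) = \lVert \mu_s - \mu_g \rVert_2^2 + \operatorname{tr}\Sigma_s + \operatorname{tr}\Sigma_g - 2\,\operatorname{tr}\Big(\big(\Sigma_s^{1/2}\,\Sigma_g\,\Sigma_s^{1/2}\big)^{1/2}\Big)
$$

The square roots here are matrix square roots, and the last trace equals the sum of the square roots of the eigenvalues of $\Sigma_s^{1/2}\Sigma_g\Sigma_s^{1/2}$.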

This is 404ing:

This should be fixed now.

I think this should be included in the next cycle of Part 2. It is an improvement over the most advanced technique for this covered in the course (the Gram matrix): he uses the Wasserstein metric instead of the Gram matrix, and the results are significantly better.
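For comparison, a Gram-matrix style loss of the kind covered in the course can be sketched as below. This is a minimal NumPy sketch assuming (channels, height, width) feature maps; the normalization (divide by the number of positions, then take the mean squared difference) is one common variant, and `gram_matrix` / `gram_style_loss` are hypothetical names, not code from the course or the repo.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (channels, height, width) feature map.

    Each spatial position is treated as one sample of a c-dimensional vector,
    and the uncentered second moment is averaged over positions.
    """
    c = features.shape[0]
    flat = features.reshape(c, -1)          # (c, n) with n = height * width
    return flat @ flat.T / flat.shape[1]    # (c, c)

def gram_style_loss(style_feats, synth_feats):
    """Mean squared difference between the two Gram matrices (one common variant)."""
    diff = gram_matrix(style_feats) - gram_matrix(synth_feats)
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
style = rng.normal(size=(4, 8, 8))
print(gram_style_loss(style, style))        # 0.0: identical features give zero loss
```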

This is so groundbreaking. Why isn't this more popular?

Made some animations of the technique:

This is super cool.

I briefly tried to implement this in PyTorch, but found that torch.eig() doesn't have a backward (gradient) implemented. Then I tried to implement the closed-form calculation from the referenced paper:

Unfortunately, M1 tends to have negative entries, so taking its square root gives complex values. I assume this is why an eigenvalue approach was used in the first place. Clamping the entries of M1 to a minimum of 0 results in the final term of the expression becoming overwhelmingly negative relative to the other terms, so the entire expression evaluates to the square root of a negative number. I suppose it makes sense that clamping the matrix entries causes problems, since it fundamentally changes the math.

Also on this subject - I’m assuming the square root operation is element-wise. Am I correct in thinking this? Or does it refer to decomposing M1 into a matrix X such that X @ X = M1?

Can anyone think of a way forward for doing this in PyTorch other than waiting for torch.eig() to get a backward function?
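One option, which is not the approach used in this thread, is to avoid the eigendecomposition entirely and approximate the matrix square root with the coupled Newton-Schulz iteration, which needs only matrix multiplications and additions. The sketch below uses NumPy for clarity and assumes a well-conditioned symmetric positive-definite input; `sqrtm_newton_schulz` and `num_iters` are hypothetical names, and the iteration count would need tuning (convergence slows when the matrix has eigenvalues close to zero).

```python
import numpy as np

def sqrtm_newton_schulz(a, num_iters=20):
    """Approximate square root of a symmetric positive-definite matrix.

    Coupled Newton-Schulz iteration: only matrix products and sums are used,
    so the same code written with torch tensors could be differentiated by
    autograd without needing an eigendecomposition backward.
    """
    dim = a.shape[0]
    eye = np.eye(dim)
    norm = np.linalg.norm(a)          # Frobenius norm >= largest eigenvalue
    y = a / norm                      # eigenvalues now lie in (0, 1]
    z = eye.copy()
    for _ in range(num_iters):
        t = 0.5 * (3.0 * eye - z @ y)
        y = y @ t                     # y converges to sqrt(a / norm)
        z = t @ z                     # z converges to the inverse of sqrt(a / norm)
    return y * np.sqrt(norm)          # undo the scaling

a = np.array([[4.0, 1.0],
              [1.0, 3.0]])
root = sqrtm_newton_schulz(a)
print(np.allclose(root @ root, a))    # True
```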

Hey Karl, the square root operation is not element-wise; it is a matrix square root. This means, as you note, finding X such that XX = M1. The eigendecomposition is a convenient way to do this because, for a symmetric positive semi-definite M1 = VLV^T (L diagonal), sqrt(M1) = V sqrt(L) V^T, where sqrt(L) just takes the square root of each diagonal entry.
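A minimal NumPy sketch of the eigendecomposition route (`sqrtm_psd` is a hypothetical helper name). The test matrix has negative entries but is positive semi-definite (eigenvalues 1 and 3), so its matrix square root is real even though an element-wise square root is not:

```python
import numpy as np

def sqrtm_psd(m):
    """Matrix square root of a symmetric positive semi-definite matrix.

    Uses M = V diag(L) V^T  =>  sqrt(M) = V diag(sqrt(L)) V^T.
    Tiny negative eigenvalues from round-off are clamped to zero.
    """
    eigvals, eigvecs = np.linalg.eigh(m)
    root = np.sqrt(np.clip(eigvals, 0.0, None))
    return (eigvecs * root) @ eigvecs.T   # same as V @ diag(root) @ V.T

# PSD (eigenvalues 1 and 3) despite the negative off-diagonal entries.
# np.sqrt(m) would instead work element-wise and give nan for the -1 entries.
m = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
x = sqrtm_psd(m)
print(x)                          # approximately [[1.366, -0.366], [-0.366, 1.366]]
print(np.allclose(x @ x, m))      # True: X @ X recovers M
```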

Thanks for the reply. I came back to this with PyTorch 1.0 having a functioning symeig backward, and the technique works really well. It's been interesting to play around with different conv layers and other parameters.

Thank you Vincent for sharing! I have always wondered why we used the Gram matrix, I had some intuition that it has something to do with second order terms but could not come up with a better loss function. This is the insight I was looking for!
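One way to see the relationship: with features flattened so that each spatial position is a sample, the position-averaged Gram matrix equals the covariance plus the outer product of the mean, G = Σ + μμᵀ. A Gram loss therefore compares mean and covariance information folded into a single matrix, while the Wasserstein formula keeps a separate mean term and compares covariances through matrix square roots. A small NumPy check, assuming Gram normalization by the number of positions (`moments` is a hypothetical helper):

```python
import numpy as np

def moments(features):
    """Per-channel mean and (biased) covariance of a (c, h, w) feature map."""
    c = features.shape[0]
    flat = features.reshape(c, -1)               # (c, n)
    mu = flat.mean(axis=1, keepdims=True)        # (c, 1)
    centered = flat - mu
    cov = centered @ centered.T / flat.shape[1]  # divide by n, like torch_moments
    return mu, cov

rng = np.random.default_rng(1)
feats = rng.normal(loc=2.0, size=(3, 5, 5))
flat = feats.reshape(3, -1)
gram = flat @ flat.T / flat.shape[1]             # uncentered second moment
mu, cov = moments(feats)

# The Gram matrix is the covariance plus the outer product of the means,
# so a Gram loss mixes first- and second-order statistics in one matrix.
print(np.allclose(gram, cov + mu @ mu.T))        # True
```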

For the curious, here is my PyTorch implementation (ported from TensorFlow):

import torch

def torch_moments(x):
    # Accepts (c, h, w), (1, c, h, w) or (h, w) feature maps
    if x.dim() == 4:
        x = x[0]                    # only the first image in the batch is used
    elif x.dim() == 2:
        x = x.unsqueeze(0)          # treat as a single channel
    c, h, w = x.shape
    n = h * w

    # One row per spatial position, one column per channel: (n, c)
    flat = x.permute(1, 2, 0).reshape(n, c)
    mu = torch.mean(flat, dim=0, keepdim=True)

    # Biased (divide by n) covariance of the channel activations
    centered = flat - mu
    cov = torch.matmul(centered.t(), centered) / n

    return mu, cov

def wdist(m1, m2):
    # Squared 2-Wasserstein distance between Gaussians fit to two feature maps
    mean_stl, cov_stl = torch_moments(m1)
    # torch.symeig in the original post; torch.linalg.eigh in current PyTorch
    eigvals, eigvects = torch.linalg.eigh(cov_stl)
    eigvals = eigvals.clamp(min=0.)  # remove small negative round-off
    # Matrix square root: V diag(sqrt(L)) V^T
    root_cov_stl = eigvects @ torch.diag(torch.sqrt(eigvals)) @ eigvects.t()
    tr_cov_stl = torch.sum(eigvals)

    mean_synth, cov_synth = torch_moments(m2)
    tr_cov_synth = torch.sum(torch.linalg.eigvalsh(cov_synth).clamp(min=0.))
    mean_diff_squared = torch.sum((mean_stl - mean_synth) ** 2)

    cov_prod = root_cov_stl @ cov_synth @ root_cov_stl
    # Eigenvalues floored at 0.1, as in the original post
    var_overlap = torch.sum(torch.sqrt(torch.linalg.eigvalsh(cov_prod).clamp(min=0.1)))
    dist = mean_diff_squared + tr_cov_stl + tr_cov_synth - 2 * var_overlap
    return dist
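For readers who want to check the math without PyTorch, here is a NumPy transcription of the same computation. It is a sketch: `wdist_np` and `moments` are hypothetical names, and unlike the PyTorch version it floors the last set of eigenvalues at 0 rather than 0.1, so identical inputs give (numerically) zero distance.

```python
import numpy as np

def moments(x):
    """Mean (1, c) and biased covariance (c, c) of a (c, h, w) feature map."""
    c = x.shape[0]
    flat = x.reshape(c, -1).T                   # (n, c): one row per spatial position
    mu = flat.mean(axis=0, keepdims=True)
    centered = flat - mu
    return mu, centered.T @ centered / flat.shape[0]

def wdist_np(style, synth):
    """Squared 2-Wasserstein distance between Gaussians fit to two feature maps."""
    mean_stl, cov_stl = moments(style)
    mean_syn, cov_syn = moments(synth)

    vals, vecs = np.linalg.eigh(cov_stl)
    vals = np.clip(vals, 0.0, None)             # drop round-off negatives
    root_cov_stl = (vecs * np.sqrt(vals)) @ vecs.T

    tr_stl = vals.sum()
    tr_syn = np.clip(np.linalg.eigvalsh(cov_syn), 0.0, None).sum()
    mean_diff_sq = np.sum((mean_stl - mean_syn) ** 2)

    cov_prod = root_cov_stl @ cov_syn @ root_cov_stl
    # Floor at 0 here (the PyTorch version floors at 0.1)
    overlap = np.sqrt(np.clip(np.linalg.eigvalsh(cov_prod), 0.0, None)).sum()
    return mean_diff_sq + tr_stl + tr_syn - 2.0 * overlap

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 8, 8))
print(wdist_np(feats, feats))        # approximately 0.0 (same distribution)
print(wdist_np(feats, feats + 2.0))  # approximately 12.0 (3 channels, mean shift of 2)
```

Shifting every channel by 2 leaves the covariance unchanged, so only the mean term survives: 3 channels × 2² = 12.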

If you ever run into this issue with my code above (symeig instability):
RuntimeError: symeig_cuda: the algorithm failed to converge; 289 off-diagonal elements of an intermediate tridiagonal form did not converge to zero.

You can try perturbing the matrix before calling symeig; I have had good success with this method:
def perturbate(x, scale=1.0):
    # Smallest non-zero magnitude in x sets the size of the jitter
    vals = torch.abs(x)
    mn = torch.min(vals[vals.nonzero(as_tuple=True)])
    # Random diagonal jitter in [-scale * mn, scale * mn], created on x's device
    lambdas = scale * (torch.rand(x.shape[-1], device=x.device) - 0.5) * 2.0 * mn
    return x + torch.diag(lambdas)
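A NumPy analogue of the perturbation above, for experimenting outside PyTorch (`perturb_np` and its `rng` argument are hypothetical). One plausible reason the trick helps is that the jitter splits repeated eigenvalues: eigenvector gradients contain terms like 1/(λ_i - λ_j), which blow up when eigenvalues coincide, and near-degenerate spectra are also hard for eigensolvers. That explanation is an assumption, not something verified against the CUDA error above.

```python
import numpy as np

def perturb_np(x, scale=1.0, rng=None):
    """NumPy analogue of perturbate(): add a random diagonal jitter to x.

    The jitter is uniform in [-scale * mn, scale * mn], where mn is the smallest
    non-zero magnitude in x. A symmetric input stays symmetric.
    """
    rng = np.random.default_rng() if rng is None else rng
    vals = np.abs(x)
    mn = vals[vals > 0].min()
    lambdas = scale * (rng.random(x.shape[-1]) - 0.5) * 2.0 * mn
    return x + np.diag(lambdas)

# A matrix with a repeated eigenvalue (2, 2, 2): a fully degenerate spectrum.
m = 2.0 * np.eye(3)
jittered = perturb_np(m, scale=1e-3, rng=np.random.default_rng(0))
eigvals = np.linalg.eigvalsh(jittered)           # ascending order
print(np.allclose(jittered, jittered.T))         # True
print(np.min(np.diff(eigvals)) > 0)              # True: eigenvalues are now distinct
```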