Neural Style Transfer using S4TF

(James Thompson) #1

Hey everyone,

As a way of learning Swift for TensorFlow, I decided to take a stab at implementing Neural Style Transfer. I wasn’t sure it would be doable since, in keeping with Chris’ analogy, we’re building the airplane as we’re flying it. Nevertheless, I was able to get something working that produces pretty decent results, such as this:

The notebook is available on GitHub. I’ve also written up a blog post about it on Medium.

This is only a first pass. I didn’t spend a ton of time trying to write the slickest Swift code, as I wasn’t even sure I’d be able to solve some of the problems I was facing. I’ll detail what some of the hurdles were in case someone else is trying to solve something similar or perhaps has a better workaround.

The first challenge was loading and dealing with a pre-trained network.
I used a pre-trained version of VGG-19 that came from TF-2.0’s tf.keras.applications.VGG19. I was able to write out the parameters to a checkpoint file. I then used Raw.restoreV2(prefix:tensorNames:shapeAndSlices:dtypes:) to load the kernels and biases for each convolution. I ignored the fully-connected layers when generating the checkpoint as they weren’t needed, which was a huge win in terms of file size.

Another thing to point out, related to loading the checkpoint file, is what happened when I tried to use Just to download my checkpoint .tar.gz file. The server was redirecting to a CDN, and Just was writing the HTML from the “redirecting” landing page into my tar file :frowning: I spent a good while wondering what I was doing wrong before I found out that was the case. I ended up just using wget via the nifty extension from the swift_dev notebooks, which worked like a charm. It looks as though Just isn’t being maintained (or at least it is very inactive). There are a ton of issues already piled up there, but I’ll submit this as one of them. I’ll probably fork Just and fix the issue myself, as it’s a nice API otherwise, and I’ll be sure to put up a PR if I do.
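
One cheap guard against this failure mode is to check the first two bytes of the download before untarring it: a gzip stream always starts with 0x1f 0x8b, while an HTML landing page does not. Below is a minimal sketch in plain Python, not taken from the notebook; `looks_like_gzip` is a hypothetical helper name, and the two payloads are made up for illustration.

```python
import gzip

# Every gzip stream begins with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(data: bytes) -> bool:
    """Return True if the payload starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# A genuine gzip payload versus an HTML "redirecting" page that was
# saved under a .tar.gz file name by mistake.
real_archive = gzip.compress(b"checkpoint bytes")
html_page = b"<html><body>Redirecting...</body></html>"

archive_ok = looks_like_gzip(real_archive)
html_ok = looks_like_gzip(html_page)
```

Running the same check on the downloaded file's first two bytes would likely have flagged the HTML-in-a-tarball problem right away.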

The next challenge was retrieving the layer activations in a way that played nicely with autodiff. This is something I’ll revisit, but my solution was to create a differentiable struct that stores the layer activations of interest and to return that from the model layer’s call(_:) method.

After that, the biggest thing was trying to get the optimizer to update the input image itself. The problem was that, as far as I could tell, allDifferentiableVariables / KeyPathIterable only knows about properties of self, not the input to the call(_:) method. This makes total sense. So my workaround was to have another layer that has the input image as a property and just returns that property from call(_:), ignoring the input. I then sequenced this “ImageTensorLayer” with the VGG19 model, which did the trick.
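
The idea of "the image is a parameter of a layer that ignores its input" can be sketched independently of S4TF. The following is a toy illustration in plain Python, not the notebook's code: `ImageLayer`, `FrozenFeatures`, and `loss_and_image_grad` are hypothetical names, the "network" is just a fixed elementwise scale, and the gradient is written out by hand instead of coming from autodiff.

```python
class ImageLayer:
    """Holds the image as its only parameter and ignores its input."""

    def __init__(self, image):
        self.image = list(image)

    def __call__(self, _ignored_input):
        return self.image


class FrozenFeatures:
    """Toy stand-in for the pre-trained network: a fixed elementwise scale."""

    def __init__(self, weights):
        self.weights = tuple(weights)

    def __call__(self, x):
        return [w * v for w, v in zip(self.weights, x)]


def loss_and_image_grad(image_layer, features, target):
    """Squared error in feature space and its gradient w.r.t. the image only."""
    out = features(image_layer(None))
    residual = [o - t for o, t in zip(out, target)]
    loss = sum(r * r for r in residual)
    grad = [2.0 * w * r for w, r in zip(features.weights, residual)]
    return loss, grad


image_layer = ImageLayer([0.0, 0.0, 0.0])
features = FrozenFeatures([1.0, 2.0, 0.5])
target = [1.0, 4.0, -1.0]

initial_loss, _ = loss_and_image_grad(image_layer, features, target)
for _ in range(200):
    _, grad = loss_and_image_grad(image_layer, features, target)
    # Plain gradient descent on the image; the feature weights stay fixed.
    image_layer.image = [p - 0.05 * g for p, g in zip(image_layer.image, grad)]
final_loss, _ = loss_and_image_grad(image_layer, features, target)
```

Because the image lives on the layer rather than arriving as an argument, anything that walks the model's parameters can see it, which is the property the workaround relies on.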

The last thing I’ll mention is how I used Adam (note: I just started working on an L-BFGS optimizer) to update only the image itself and not the other model parameters. I’m convinced there is a super simple way to do this properly, but I couldn’t find it. My workaround was as simple as copy-pasting the Adam optimizer and adding a break statement after the first iteration through allDifferentiableParameters. This only worked because the image tensor was the first parameter in the model. I believe I could do some introspection on each variable to selectively “freeze” parameters, but I’m sure it’s not as simple as that.
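
For comparison, here is what selective freezing looks like when parameters are addressed by name instead of by iteration order. This is a plain-Python sketch of a standard Adam update restricted to a set of trainable names; it is not the modified S4TF optimizer from the notebook, and `adam_step`, the parameter names, and all values are made up for illustration.

```python
import math


def adam_step(params, grads, state, trainable,
              lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update, but only to the parameters named in `trainable`."""
    state["t"] += 1
    t = state["t"]
    for name in trainable:
        g = grads[name]
        m = state["m"].setdefault(name, [0.0] * len(g))
        v = state["v"].setdefault(name, [0.0] * len(g))
        for i, gi in enumerate(g):
            # Exponential moving averages of the gradient and its square.
            m[i] = beta1 * m[i] + (1 - beta1) * gi
            v[i] = beta2 * v[i] + (1 - beta2) * gi * gi
            # Bias-corrected estimates.
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            params[name][i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


params = {"image": [1.0, -2.0], "conv1_kernel": [0.5, 0.5]}
grads = {"image": [0.1, -0.3], "conv1_kernel": [1.0, 1.0]}
state = {"t": 0, "m": {}, "v": {}}

# Only the image is updated; the kernel is frozen even though it has a gradient.
adam_step(params, grads, state, trainable={"image"})
```

Selecting by name removes the dependence on the image tensor happening to be the first parameter, at the cost of keeping a list of trainable names in sync with the model.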

All in all, it was a pretty good way to dig into S4TF’s internals and really start understanding the autodiff system and how things fit together. I’d love to contribute what I can back to the project if it’s something that makes sense. As mentioned above, I’m working on an L-BFGS optimizer, which should produce much better results without having to tweak so many hyperparameters.

Looking forward to everyone’s feedback.

-James

P.S. Here’s the tweet for those who are interested.

I just published Neural Style Transfer with Swift for TensorFlow https://t.co/DACoVVCUNK

— James Thompson (@WellFedWookiee) May 2, 2019

(Chris Lattner) #2

This is super awesome James, I love it!


(Matthijs) #3

Nice work! I’ve recently implemented this style transfer method for a client in Swift using iOS deep learning primitives (so not TF), and it looks like doing it with S4TF is definitely less work and much less code! :smiley:

L-BFGS should be possible too (we use it on iOS). It only uses a lot of memory if you make it keep a large history.
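
As a rough back-of-the-envelope check on the memory point: L-BFGS keeps two vectors (the parameter and gradient differences) per history entry, each the size of the thing being optimized. A small sketch in plain Python, where the 512x512 RGB image size and 4-byte floats are assumptions chosen for illustration and `lbfgs_history_bytes` is a hypothetical helper:

```python
def lbfgs_history_bytes(num_params, history_size, bytes_per_value=4):
    """Memory for the stored (s, y) vector pairs: two vectors per history entry."""
    return 2 * history_size * num_params * bytes_per_value


# Assumed problem size: a 512x512 RGB image optimized as float32 values.
num_params = 512 * 512 * 3

history_mib = {
    m: lbfgs_history_bytes(num_params, m) / 2**20
    for m in (5, 10, 100)
}
```

Under these assumptions a history of 10 costs about 60 MiB, and the cost grows linearly with the history length.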


(James Thompson) #4

Hey, Matthijs. It’s very interesting to hear that you were able to run an optimization algorithm such as L-BFGS on the device. My first thought would probably have been that it wouldn’t be possible for memory and speed reasons. I sat down with Core ML about a year and a half ago, but it seems like I ought to dive back into it. I’ve been meaning to do some benchmarking on my iPhone XS. I’ve heard some people mention a 10x performance improvement over the iPhone X. Love your blog by the way :slight_smile:


(Matthijs) #5

This isn’t the kind of thing you can use Core ML for. :wink:
