As a way of learning Swift for TensorFlow, I decided to take a stab at implementing Neural Style Transfer. I wasn’t sure it would be doable since, in keeping with Chris’ analogy, we’re building the airplane as we’re flying it. Nevertheless, I was able to get something working that produces pretty decent results such as this:
This is only a first pass. I didn’t spend a ton of time trying to write the slickest Swift code, as I wasn’t even sure I’d be able to solve some of the problems I was facing. I’ll detail some of the hurdles in case someone else is trying to solve something similar or has a better workaround.
The first challenge was loading and dealing with a pre-trained network. I used a pre-trained version of VGG-19 that came from TF-2.0’s tf.keras.applications.VGG19 and wrote its parameters out to a checkpoint file. I then used Raw.restoreV2(prefix:tensorNames:shapeAndSlices:dtypes:) to load the kernels and biases for each convolution. I left the fully-connected layers out of the checkpoint since they weren’t needed, which was a huge win in terms of file size.
A related gotcha with the checkpoint file: I first tried to use Just to download my checkpoint .tar.gz file. The server was redirecting to a CDN, and Just wrote the HTML from the “redirecting” landing page into my tar file. I spent a good while wondering what I was doing wrong before I figured that out. I ended up just using wget via the nifty extension from the swift_dev notebooks, which worked like a charm. It looks as though Just isn’t being maintained (or at least it is very inactive). There are a ton of issues already piled up there, but I’ll submit this as one of them. I’ll probably fork Just and fix the issue myself, as it’s a nice API otherwise, and I’ll be sure to put up a PR if I do.
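A quick sanity check on the downloaded bytes would have caught this sooner. The following is only a sketch in plain Swift with Foundation: it tests for the two-byte gzip magic number (0x1f, 0x8b) and for an HTML-looking prefix. The function names and the sample payloads are hypothetical and stand in for a real download.

```swift
import Foundation

/// Returns true when `data` starts with the two-byte gzip magic number (0x1f, 0x8b).
func looksLikeGzip(_ data: Data) -> Bool {
    let magic: [UInt8] = [0x1f, 0x8b]
    return data.count >= magic.count && Array(data.prefix(magic.count)) == magic
}

/// Returns true when the payload looks like an HTML page rather than an archive.
func looksLikeHTML(_ data: Data) -> Bool {
    // Only inspect the first bytes; a redirect landing page is short text.
    let head = String(decoding: data.prefix(256), as: UTF8.self).lowercased()
    return head.contains("<html") || head.contains("<!doctype html")
}

// Hypothetical payloads standing in for what a download might return.
let archiveBytes = Data([0x1f, 0x8b, 0x08, 0x00])
let landingPage = Data("<html><body>Redirecting...</body></html>".utf8)

print(looksLikeGzip(archiveBytes), looksLikeHTML(landingPage))
```

Running a check like this right after the download, before trying to untar, makes a redirect page fail loudly instead of showing up later as a corrupt archive.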
The next challenge was retrieving the layer activations in a way that played nicely with autodiff. This is something I’ll revisit, but my solution was to create a differentiable struct that stores the layer activations of interest and have the model layer return it.
After that, the biggest thing was trying to get the optimizer to update the input image itself. The problem was that, as far as I could tell, KeyPathIterable only knows about properties of self, not the input to the call(_:) method. This makes total sense. So my workaround was to have another layer that has the input image as a property and just returns that property from call(_:), ignoring the input. I then sequenced this “ImageTensorLayer” with the VGG19 model, which did the trick.
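To make the idea concrete, here is a dependency-free toy model of it in plain Swift. It is not the Swift for TensorFlow Layer protocol or KeyPathIterable: the ToyLayer protocol, parameterKeyPaths, and sgdStep are hypothetical stand-ins, and plain [Float] arrays replace tensors. It only illustrates why an optimizer that walks the properties of a layer can update the image once the image is a stored property.

```swift
// Toy stand-in for a layer whose trainable values are reachable only
// through key paths on the layer itself (loosely mimicking KeyPathIterable).
protocol ToyLayer {
    static var parameterKeyPaths: [WritableKeyPath<Self, [Float]>] { get }
    func call(_ input: [Float]) -> [Float]
}

// Holds the image as a stored property and returns it, ignoring the input.
struct ImageTensorLayer: ToyLayer {
    var image: [Float]

    static var parameterKeyPaths: [WritableKeyPath<ImageTensorLayer, [Float]>] {
        return [\ImageTensorLayer.image]
    }

    func call(_ input: [Float]) -> [Float] {
        return image
    }
}

// A plain gradient-descent step that can only touch values it finds on
// `layer` itself; it never sees whatever was passed to `call(_:)`.
func sgdStep<L: ToyLayer>(_ layer: inout L, gradients: [[Float]], learningRate: Float) {
    for (keyPath, gradient) in zip(L.parameterKeyPaths, gradients) {
        layer[keyPath: keyPath] = zip(layer[keyPath: keyPath], gradient).map { pair in
            pair.0 - learningRate * pair.1
        }
    }
}

var imageLayer = ImageTensorLayer(image: [1.0, 2.0, 3.0])
let ignoredInput: [Float] = [0.0]
let before = imageLayer.call(ignoredInput)
sgdStep(&imageLayer, gradients: [[1.0, 1.0, 1.0]], learningRate: 0.5)
let after = imageLayer.call(ignoredInput)
print(before, after)
```

In the real model the gradients would come from autodiff through VGG19 rather than being passed in by hand as they are here.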
The last thing I’ll mention is how I used Adam (note: I just started working on an L-BFGS optimizer) to update only the image itself and not the other model parameters. I’m convinced there is a super simple way to do this properly, but I couldn’t find it. My workaround was as simple as copy-pasting the Adam optimizer and adding a break statement after the first iteration through allDifferentiableParameters. This only worked because the image tensor was the first parameter in the model. I believe I could do some introspection on each variable to selectively “freeze” parameters, but I’m sure it’s not as simple as that.
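As a rough illustration of that workaround, here is a toy Adam over scalar parameters in plain Swift. It is not the Swift for TensorFlow optimizer: FirstParameterAdam and its properties are hypothetical, and a flat [Float] array stands in for the model’s parameters. The standard Adam moment updates and bias correction are applied, and the break mirrors the trick of stopping after the first parameter.

```swift
// Toy Adam over scalar parameters; not the Swift for TensorFlow optimizer.
struct FirstParameterAdam {
    var learningRate: Float = 0.1
    var beta1: Float = 0.9
    var beta2: Float = 0.999
    var epsilon: Float = 1e-8
    // Running products beta1^t and beta2^t used for bias correction.
    var beta1Power: Float = 1
    var beta2Power: Float = 1
    var firstMoments: [Float]
    var secondMoments: [Float]

    init(parameterCount: Int) {
        firstMoments = Array(repeating: 0, count: parameterCount)
        secondMoments = Array(repeating: 0, count: parameterCount)
    }

    mutating func update(_ parameters: inout [Float], along gradients: [Float]) {
        beta1Power *= beta1
        beta2Power *= beta2
        for index in parameters.indices {
            let g = gradients[index]
            firstMoments[index] = beta1 * firstMoments[index] + (1 - beta1) * g
            secondMoments[index] = beta2 * secondMoments[index] + (1 - beta2) * g * g
            let mHat = firstMoments[index] / (1 - beta1Power)
            let vHat = secondMoments[index] / (1 - beta2Power)
            parameters[index] -= learningRate * mHat / (vHat.squareRoot() + epsilon)
            // The workaround: stop after the first parameter. This is only
            // correct when the image is the first parameter visited.
            break
        }
    }
}

// Index 0 plays the role of the image; index 1 is a "frozen" model weight.
var parameters: [Float] = [1.0, 2.0]
var optimizer = FirstParameterAdam(parameterCount: 2)
optimizer.update(&parameters, along: [1.0, 1.0])
print(parameters)
```

The fragility is visible here too: reorder the parameters and the wrong value gets trained, which is why selectively freezing by identity would be the more robust approach.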
All in all, it was a pretty good way to dig into S4TF’s internals and really start understanding the autodiff system and how things fit together. I’d love to contribute what I can back to the project if it’s something that makes sense. As mentioned above, I’m working on an L-BFGS optimizer, which should produce much better results without having to tweak so many hyperparameters.
Looking forward to your guys’ feedback.
P.S. Here’s the tweet for those who are interested.
“I just published Neural Style Transfer with Swift for TensorFlow https://t.co/DACoVVCUNK” (James Thompson, @WellFedWookiee, May 2, 2019)