Still, the content-to-style ratio is super important.
Even in the regular style transfer (not fast), the weight you give to one loss vs the other has a huge impact. At least in my experiments.
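For what it's worth, the ratio in question is just the relative weighting inside the combined objective. A schematic version, with placeholder weights (the 1e5 default is arbitrary, not anyone's tuned value):

```python
def total_loss(content_loss, style_loss, content_weight=1.0, style_weight=1e5):
    """Weighted sum of the two losses.

    A common convention is to fix the content weight at 1 and sweep only
    the style weight, so the content-to-style ratio becomes a single knob.
    """
    return content_weight * content_loss + style_weight * style_loss
```

The instability discussed above is exactly that small changes to `style_weight` can flip which term dominates the gradient.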
Yes, pushing this number out of the sweet spot completely destroys the result. It's not a smooth factor, and that's what I don't like about it. If you are getting good results and increase this number just a bit for "a little bit more style", it can go all in on style and destroy everything.
Totally. But I don't know how to fix that.
On top of that, it is not even something you can easily optimize for mathematically (or maybe you can; I have to think about it).
It is very much tied to personal taste in the end result (up to a point, of course! A colored mess pleases nobody).
Hm… Thinking about that, is it possible to have bounds on the content loss?
Can we somehow say "I don't accept the content loss being greater than X"? (How to find X? Maybe empirically?)
We could also add some kind of "elastic" aspect to the content loss, meaning that the further we deviate, the more weight we give to it.
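A minimal sketch of both ideas together (the empirical bound X and the "elastic" weighting); the names and the quadratic penalty are made up for illustration:

```python
def content_weight(content_loss, bound, base=1.0, elasticity=4.0):
    """Grow the content-loss weight the further we drift past a bound.

    `bound` plays the role of the empirically chosen "X I don't accept
    exceeding": below it the weight stays at `base`, above it the weight
    grows quadratically with the relative overshoot (the elastic pull-back).
    """
    overshoot = max(0.0, content_loss - bound) / bound
    return base + elasticity * overshoot ** 2

# Within bounds: normal weight; past the bound: pulled back hard.
print(content_weight(0.8, bound=1.0))  # 1.0
print(content_weight(2.0, bound=1.0))  # 5.0
```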
We could maybe first train the network with only the content loss and then incrementally add the style loss…
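That last idea could be a simple style-weight schedule; a hypothetical helper (not from anyone's actual training code):

```python
def style_weight(step, warmup=2000, max_weight=1e5):
    """Content-only phase first, then ramp the style weight in linearly.

    Hypothetical schedule: for the first `warmup` steps the style weight
    is 0 (the network trains on content loss alone), then it grows
    linearly over another `warmup` steps until it reaches `max_weight`.
    """
    if step < warmup:
        return 0.0
    return min(1.0, (step - warmup) / warmup) * max_weight
```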
Good questions. It could be an idea.
All in all, the system looked pretty unstable to me, so applying guardrails during training to push things back on track could be reasonable.
Another interesting proof of the model's inherent instability is that I have never trained it for even one full epoch (on full COCO). I actually trained it on a tiny subset, for roughly 12 minutes. Both losses plateaued super quickly, and training for longer would destroy the content-to-style balance. Quite insane.
I am planning on working on deep painterly harmonization soon https://sgugger.github.io/deep-painterly-harmonization.html#deep-painterly-harmonization so I will get back to this one!
Oh yes, that was what got me started!! I do want to work on that as well!!
So after experimenting a lot, I've made some discoveries.
The single most important factor was batch norm in VGG: using batch norm simply destroyed the results. I don't understand why yet, but maybe batch norm obscures the relative importance of each feature?
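One way to see why that could happen (a toy illustration I made up, not anything from the actual training code): Gram-matrix style statistics depend on raw channel magnitudes, and per-channel normalization throws that information away.

```python
def gram_entry(a, b):
    """Inner product of two flattened channels (one Gram-matrix entry)."""
    return sum(x * y for x, y in zip(a, b))

def batch_norm(ch, eps=1e-5):
    """Normalize a channel to zero mean, unit variance (no affine params)."""
    mean = sum(ch) / len(ch)
    var = sum((x - mean) ** 2 for x in ch) / len(ch)
    return [(x - mean) / (var + eps) ** 0.5 for x in ch]

strong = [4.0, 8.0, 6.0, 2.0]   # high-magnitude channel
weak   = [0.4, 0.8, 0.6, 0.2]   # same pattern, 10x smaller

# Raw Gram entries keep the 100x energy gap between the channels...
print(gram_entry(strong, strong))  # 120.0
print(gram_entry(weak, weak))

# ...but after per-channel normalization both channels look identical,
# so the style statistics can no longer tell them apart.
print(gram_entry(batch_norm(strong), batch_norm(strong)))
print(gram_entry(batch_norm(weak), batch_norm(weak)))
```

This is only a guess at the mechanism, but it would fit the "obscures the importance of each feature" intuition.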
The transformer net is still the best (I do have to do more experiments with resnets, though); resnets seem to produce more repetitive patterns.
The Adam optimizer with fit_one_cycle works better than Ranger (which is confusing?).
I followed the tips from here and used shallower VGG layers; it helps create bigger features.
I still have to experiment with combining multiple layers; it's very tricky though… While the original paper uses relu3 for the content loss, the PyTorch examples use relu2.
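For anyone mapping those names onto torchvision's vgg16 (the non-batch-norm variant): these indices are my reading of the layer ordering, so double-check against your own model:

```python
# Indices into torchvision.models.vgg16().features for the ReLU layers
# usually picked for style/content losses (the max pools sit at indices
# 4, 9, 16, 23, 30).  Verify with print(model.features) before relying
# on these.
VGG16_RELU = {
    "relu1_2": 3,
    "relu2_2": 8,   # the "relu2" content layer the PyTorch examples use
    "relu3_3": 15,  # the "relu3" content layer from the original paper
    "relu4_3": 22,
}
```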
I'll post the code here tomorrow after I clean it up. Maybe I'll also write a blog post to describe this in more detail; is that a good idea?
Meanwhile, here's a spoiler of the new implementation:
Congrats man! Your new results look really good! The batch norm thing is a great finding and you should totally write a blog post about all of this.
I will try to apply your findings to my code base and see if I get anything similar!
Adam optimizer with fit_one_cycle works better than Ranger (which is confusing?)
In the Mish thread it is asserted:
I'm pretty much of the opinion now, after a lot of testing, that OneCycle destroys most of the momentum that smarter optimizers and Mish build up. I continue to see much better results avoiding it, unless it's with vanilla Adam and ReLU; then it seems to work well.
I've noticed the same thing. I stick with Adam/AdamW with 1cycle.
I tried training with RAdam once more. I had to use smaller learning rates compared to one_cycle, but the results were actually a little better…
Clearly I still have a lot of room to experiment here.
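For context, the one-cycle policy being debated here is just a warmup-then-anneal learning-rate schedule. A rough pure-Python sketch of its shape (parameter names loosely follow fastai's fit_one_cycle, but this is not fastai code):

```python
import math

def one_cycle_lr(step, total, lr_max=1e-3, pct_start=0.25, div=25.0, div_final=1e4):
    """Cosine warmup from lr_max/div up to lr_max, then anneal down
    to lr_max/div_final over the remaining steps."""
    warm = int(total * pct_start)
    if step < warm:
        t, lo, hi = step / max(1, warm), lr_max / div, lr_max
    else:
        t, lo, hi = (step - warm) / max(1, total - warm), lr_max, lr_max / div_final
    # Cosine interpolation from lo to hi as t goes 0 -> 1.
    return lo + (hi - lo) * (1 - math.cos(math.pi * t)) / 2
```

The large swing from lr_max/div up to lr_max and back down may be exactly what disrupts the slower-moving statistics that optimizers like Ranger maintain, which would be consistent with the quote above.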
Hey @lgvaz, out of curiosity, how big is your training set?
21837 images, as path.ls() returns in your nb?
Correct. That's only a sample of COCO.
I'm currently looking for a dataset with higher resolution images (like 1024x1024); any advice?
Thanks!
No idea for the high resolution dataset, unfortunately. Sorry.
Currently testing it out now: I have a bs of 1 and an image size of 1024, which uses ~9.28 GB on Colab, so we could probably scale it up higher from there. Epoch time isn't very different either (~9 minutes per epoch). Also note, though, that we only have 1,000 images, and I trained on just the training split, so 800!
This is awesome stuff!!
Do you guys have any idea how the Copista app may be handling HD images?
The app shows progress bar while working, so it might have split the TensorFlow model into parts (there is an earlier post by the app author which says it uses TensorFlow Mobile).