Semantic Style Transfer

brendan · March 25, 2017, 8:51pm

Hi all,

Does anyone have experience with masking techniques for neural style transfer? Matthew and I are at a hackathon working on a project to combine style transfer with semantic image segmentation. There are some good papers on the topic (like this one), but we were wondering what the best approach is.

Preliminary Results
https://github.com/bfortuner/deephacks/blob/master/styles/cat.jpg_stylized.png_van_gogh.png?raw=true

Brendan

Matthew · March 25, 2017, 9:06pm

Here’s an example of the MRF blending technique:

We’re wondering if there are other techniques to help make inserted elements look more natural within the original image.

Even · March 26, 2017, 4:08am

The most promising segmentation paper I’ve seen so far is:
https://arxiv.org/abs/1703.06870v1 (Mask R-CNN)

which builds upon:
https://arxiv.org/abs/1504.08083v2 (Fast R-CNN)

Their most promising results come from an interesting residual architecture I haven’t seen before, which they call ResNeXt:

A bit of a rabbit hole, but I really enjoyed diving down it. @jeremy I’m curious if you have encountered ResNeXt blocks before? It seems like an interesting architectural change. I’m just reading the paper now to see if I can glean the foundations behind it.

melissa.fabros · March 26, 2017, 5:36pm

@brendan @Matthew Good luck with the hackathon!!!

brendan · March 26, 2017, 7:02pm

Some more results

Video Transfer

Semantic Cropping

“Crop Cat, please!”

Explicit Content

E.g., nudity. This could be a browser extension for kids that automatically blurs explicit content. The same technique could be applied to trademarks, logos, celebrity faces, etc.

Background Transfer

We identify the person, create a mask, then reverse the mask to stylize the non-person pixels.
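For anyone who wants to try the mask reversal, here is a minimal NumPy sketch of the compositing step only. It is not our hackathon code: the function name `stylize_background` and the toy arrays are made up for illustration, and it assumes you already have a stylized copy of the whole frame plus a binary person mask from a segmentation model.

```python
import numpy as np

def stylize_background(content, stylized, person_mask):
    """Keep person pixels from `content`; take all other pixels from `stylized`.

    content, stylized: float arrays of shape (H, W, 3)
    person_mask: array of shape (H, W), 1 where the person is, 0 elsewhere
    (hypothetical helper, assumed inputs)
    """
    # Add a channel axis so the mask broadcasts over RGB.
    mask = person_mask.astype(np.float32)[..., np.newaxis]
    # Reverse the mask: 1 on non-person pixels, 0 on the person.
    background = 1.0 - mask
    return background * stylized + mask * content

# Toy 2x2 "images": content is all zeros, the stylized frame is all ones.
content = np.zeros((2, 2, 3), dtype=np.float32)
stylized = np.ones((2, 2, 3), dtype=np.float32)
person_mask = np.array([[1, 0], [0, 0]])  # person occupies the top-left pixel

result = stylize_background(content, stylized, person_mask)
```

A soft mask with values between 0 and 1 also works with the same formula and should give smoother edges around the person than a hard 0/1 mask.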

Fire Cat

Multiple Objects

“Stylize the Dog”

“Person”

jeremy · March 26, 2017, 11:28pm

Yup - one of those many things I’d love to get to if we have time!

Matthew · March 27, 2017, 12:42am

Thanks, Melissa! We won the image category. The reward was a Titan X Pascal GPU.

@jeff also won a category.

All three classmates who competed won. That says a lot about this course’s value.

jeff · March 27, 2017, 12:44am

Thanks @Matthew. Congrats to you & @brendan for winning best image project! I recorded a video of your demo: https://twitter.com/jeff_x_l/status/846114382987767808

brendan · March 27, 2017, 12:59am

Here are the presentation slides with our final examples

There’s another AI hackathon at Google this week if anyone is interested in teaming up with us!

jeff · March 27, 2017, 2:20am

Do you have idea(s) in mind to pitch at the event, @brendan?

brendan · March 27, 2017, 2:59am

I’m open to ideas. One option is to improve what we completed this weekend: better and faster style transfer, better segmentation techniques (ResNeXt, Markov random fields), incorporating ideas from recent papers, adding a WGAN on top, etc.

Another idea is to take the semantic style transfer idea and combine it with speech recognition to build a “hands-free photo editing tool” where users can upload a photo and then issue voice commands like “crop cat, blur cat, copy and paste cat, apply fire style to mountains,” etc.

I’m also curious to see how realistic a movie we can create with this. Could we film a person walking and use style transfer to make them look like a character from Avatar?

The organizers of this upcoming event requested slides/demo of our idea. It’s not required, but it’s fair game to work on an existing idea.

jeff · March 27, 2017, 6:08pm

Sounds like the start of a great product for movie studios. I imagine that the same techniques can be applied to help doctors give voice commands to focus on potential trouble spots in medical images.

thunderingtyphoons · March 27, 2017, 6:39pm

@brendan - Did you end up using Mask R-CNN?

Also, here is a paper on photorealistic style transfer that came out last week: http://arxiv.org/abs/1703.07511

jeremy · March 27, 2017, 6:58pm

This is great work! Seems that the hackathon environment is well suited to getting projects done end to end.

Which hackathon was it?

brendan · March 27, 2017, 7:03pm

http://www.deeplearninghackathon.com/

At the event they also open-sourced a framework called Kur, which is like Keras with .yaml files.

brendan · March 27, 2017, 7:09pm

We were just talking about how amazing that paper is! Interested in helping us implement it in Keras?

For segmentation we didn’t have time to implement Mask R-CNN, but we found a helpful GitHub repo with a DilatedNet implementation that seemed to work OK.

If anyone is interested in helping out, we are going to try to implement a few of these new papers! It could be a cool class project.

Here are a few techniques we could try implementing:

Video Style Transfer with Optical Flow

Deep Photo Style Transfer

Fast Patch-based Style Transfer of Arbitrary Style

Targeted Style Transfer Using Instance-aware Semantic Segmentation

*Uses Markov Random Fields

Mask R-CNN

Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization

The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation

*Yoshua Bengio is one of the authors

jeremy · March 27, 2017, 7:12pm

I’d be interested in getting together to write an implementation later this week, if folks are around. Should be fun!

brendan · March 27, 2017, 7:48pm

Yay! @Matthew, @sravya8 and I were planning to meet Friday if that works for you?

thunderingtyphoons · March 27, 2017, 8:26pm

@brendan Unfortunately, I am on the East Coast. Waiting for the day when neural networks can do teleportation.

davecg · March 27, 2017, 8:28pm

Indeed. East coast blues for me too.