StackGANs Video Project?

brendan · April 11, 2017, 6:32am

Video dataset with scene descriptions
https://mila.umontreal.ca/en/publications/public-datasets/m-vad/

Interesting article that divides image generation into separate tasks
Generative Image Modeling using Style and Structure Adversarial Networks
https://arxiv.org/abs/1603.05631

I’m also wondering if generating photorealistic images from random noise is asking a bit much from a neural network. Pixel tweaking makes for nice gradients, but can we accelerate the processing by giving the network a paintbrush? A set of higher level tools to play with - pre-made shapes, semantic segmentations, 3d objects, unity game engine …

amduser · April 12, 2017, 5:26am

Interesting new work on photorealistic face generation with GANs.

xinxin.li.seattle · April 12, 2017, 6:25am

Wow! Incredible find, @amduser! The blog is a good read and a nice interpretation of the original paper.
The paper is very well written at first glance, and it is complemented with failed attempts (which shows they are really confident about their work).

A TensorFlow implementation: xxlatgh/BEGAN-tensorflow on GitHub (Tensorflow implementation of "BEGAN: Boundary Equilibrium Generative Adversarial Networks")

brendan · April 13, 2017, 8:48pm

So how about we create a short film and submit it to this competition?
https://att.submittable.com/submit/82003/emerging-indie-filmmakers

Film submission is due May 26. Finals are at Warner Bros. Studios in Los Angeles, July 14–15, 2017. If we go in with a good tech stack, maybe Warner Bros. will buy us.

Here are some other “experimental” film festivals:
http://expcinema.org/site/en/calls-entries

layla.tadjpour · April 13, 2017, 9:42pm

@jeremy Can you elaborate on why you think Google Cloud ML would be a shortcut as opposed to our own DL server? I looked into it and found the Google Cloud SDK kind of hard to work with, although it seems to be delivering a more economical service than AWS.

brendan · April 13, 2017, 9:54pm

A related question: won’t super resolution be super easy for Google/Facebook/Photoshop to integrate into their existing products? Given their existing user base + training data, how could we possibly compete?

xinxin.li.seattle · April 13, 2017, 9:58pm

Great idea! Do you have any particular theme that you are passionate about?

My immediate thought is a revamp on the old silent film. It’s convenient because it’s visual only, you don’t have to worry about creating dialogue/narrative.

If you’ve seen the movie Hugo, they made reference to Georges Méliès’s films, which has roots in magic tricks. Personally I feel it could blend well with deep learning.

It just so happens that I came across this thing called DeepWarp, which allows you to create an image of anyone rolling their eyes. It’s quite fun, and I can imagine spinning that on the following film.

We can add style transfer, playing around with time and space, really bring this old film to life… of course there is super resolution, and all kinds of other fun stuff.

Just some quick thoughts and hope that stimulates the conversation. I’m absolutely open to all kinds of ideas.

brendan · April 13, 2017, 10:39pm

I think it’s a brilliant idea and definitely feasible. I’m keen to explore it and the idea of enhancing/augmenting existing content just like we did with the cat.

I’m also exploring using video game animations as input and generating photorealistic or painting-stylized output via CycleGAN. So take Grand Theft Auto (which OpenAI Universe has an API for) and apply the “monet2photo” technique outlined in the CycleGAN paper. See their horse to zebra example. Pretty smooth video-based transfer.

Our “Paintbrush” can be pre-designed 3D Unity objects and animation sequences (there are tons in the Unity asset store), which our algorithm can be trained to generate. Instead of generating raw pixels, we train the model to map text descriptions to 3D characters and to generate commands like “left”, “right”, “up”, “jump” etc.

We can pass these semi-realistic 3D generations into a conditional GAN like CycleGAN, to bring them to life.

Definitely far out, but I think with shortcuts + backdoors we can get something cool working that’s mostly AI generated. I’m going to play around with CycleGAN painting --> image tonight and see how it works.

jeremy · April 13, 2017, 11:32pm

Sure - it’s a great question.

Putting a model in production is not at all straightforward. You have to think about:

Not running out of GPU memory
Batching up multiple requests, if they’re coming in fast, to take advantage of the GPU
Ensuring that no one user or request saturates your resources
Queueing up requests when all GPUs are busy
Sharing batches across GPUs so that they’re all being used effectively
…and so forth
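
A couple of the items in the list above (batching up requests, and queueing when the GPUs are busy) can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not how Cloud ML or TensorFlow Serving work internally: `collect_batch` and `run_model` are hypothetical names, the batch size and wait time are made-up values, and `run_model` is a stand-in for a real batched GPU forward pass.

```python
import queue
import time


def collect_batch(requests, max_batch_size=8, max_wait_s=0.01):
    """Pull up to max_batch_size items off the queue, waiting at most max_wait_s in total."""
    batch = []
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # waited long enough; ship whatever we have
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # nothing else arrived before the deadline
    return batch


def run_model(batch):
    """Hypothetical stand-in for one batched forward pass on the GPU."""
    return [x * 2 for x in batch]


# A bounded queue: when it is full, producers block instead of
# piling up unbounded work while the GPU is busy.
pending = queue.Queue(maxsize=64)
for request_id in range(10):
    pending.put(request_id)

while not pending.empty():
    print(run_model(collect_batch(pending, max_batch_size=4)))
```

The trade-off is in `max_wait_s`: a longer wait fills batches better when traffic is light, but adds latency to every request in the batch.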

Take a look at the TensorFlow Dev Summit video on TensorFlow Serving to get a sense of the kinds of problems it solves. Both Cloud ML and TensorFlow Serving are likely to help with this stuff, although the latter is still rather incomplete and buggy, I believe.

jeremy · April 13, 2017, 11:40pm

In my experience that’s never something you want to be asking yourself! Yes, it’s true, absolutely anything you ever think of or do could be done by someone else. But, will they do it with as much care, attention to detail, tenacity, user empathy, pragmatism, and efficiency as you? No, of course they won’t!!! If you believe otherwise, you will never be able to release anything, or if you do, and someone else comes along to compete with you (which always will happen if you’re successful), you’ll give up.

If anything, the opposite is true - building something extremely complex and beyond the current research cutting edge is something that big companies are more likely to be good at. Or if you do it, and they see it can be done, they’ll be more likely to be better at catching up and passing you.

Smaller companies are great when they pick something they care about that is within their capabilities to do a great job of. Dropbox is a great example - pretty much every big company had already tried to do file sync (Microsoft had tried at least 3 times!) but Dropbox did it with more care and pragmatism.

When I started FastMail, pretty much everyone told me that it was pointless to compete with Yahoo and Hotmail. But I went ahead anyway because I felt like what I wanted to create was something that no-one else at that time had yet created, and I wanted it to exist (that is, synchronized email across all your computers).

So, once you become the first person to build a great (for example) super-res product, you are now the leader, and can move on to adding lots of other great features for cleaning up old photos and scans. Everyone else is now playing catchup. And of course they’ve all got their own priorities, which are keeping them busy!

Anyways, as I’m sure you can see, this is something I feel very very strongly about. We have to believe in ourselves, and we have to build things that we want to see exist, and can’t be assuming that someone else is going to do it better than us!

brendan · April 13, 2017, 11:42pm

Some more ideas. In the world of video games/animations/3d renderings, where are the datasets that map animation --> photo? Two things that immediately come to mind:

Call of Duty game --> Band of Brothers Movie. There is a sequence in the video game that copies the movie sequences almost exactly.
Minecraft --> Flickr. We look for worlds built on real places like these and then train a model to generate Minecraft worlds. Minecraft already does non-deep learning level generation so this should be straightforward. There are Python APIs for creating the worlds. Might be an RNN problem.

Unrelated, but on the topic of Generative models… anyone interested in generating 3D printed objects? We train a GAN on open-source 3D printer designs, say a specific category like “cups” and see what it comes up with. We can explore the reverse too, images of 3d printed objects --> designs.

brendan · April 13, 2017, 11:48pm

I wish there was a way to triple-like this post.

It’s really good to hear this and I know others will feel the same way. Sounds like a chapter for your next book

jeremy · April 14, 2017, 2:16am

Thanks for being so open and encouraging @brendan

Matthew · April 15, 2017, 5:20am

iNLyze · April 16, 2017, 9:36pm

Yes, @brendan, super-great. I think there is tons of potential in DL augmented design. Definitely, on my impossibly long, ever-growing list of things I want to try and do. If only I could code faster
Perhaps related to this: If you took a model which extracts 3D information from a photo and combined it with one trained to create 3D shapes, you would almost have a replicator (even without a 3D scanner). Perhaps even better than a 3D scanner. The latter only scans the surface, but with a trained model you can impute a 3D interior (given some boundary conditions determined by a classifier which knows what class of object you are looking at).

brendan · April 17, 2017, 12:03am

Interesting case of super-resolution. This company was bought by Twitter. They downsample video game streaming quality to make it faster and then apply super-resolution on the client side to upsample. Neat!

https://www.technologyreview.com/s/601258/artificial-intelligence-can-now-design-realistic-video-and-game-imagery/

Surya501 · April 17, 2017, 5:33pm

Not exactly DL augmented design, but I tried running BEGAN on the Zappos shoe dataset and I got it to interpolate between a few shoes to generate new “designs”. The output is not perfect, but you largely get an idea of what you get when you mix two shoes.

The code for this came from https://github.com/carpedm20/BEGAN-tensorflow . I just edited a few things to adapt it to Python 3 and the TensorFlow version on my system. It took 50 hours to run on a single 1080.

What do you think of the output?
The interpolation below shows one kind of shoe morphing into another with (reasonably?) believable intermediate shoe designs.

These shoes are the ones that are generated by the trained network.
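
For anyone curious what the interpolation involves, here is a minimal sketch of the general idea: pick two latent vectors, walk in a straight line between them, and decode each point with the trained generator. It is not taken from the carpedm20 repo. The function name `interpolate_latents`, the latent size of 64, and the uniform sampling range are assumptions for illustration, and the final `generator(...)` call is a hypothetical placeholder for the trained network.

```python
import random


def interpolate_latents(z_start, z_end, steps=8):
    """Return `steps` latent vectors evenly spaced on the line from z_start to z_end."""
    if steps < 2:
        raise ValueError("need at least two steps (the two endpoints)")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the first shoe, 1.0 at the second
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


random.seed(0)
z_dim = 64  # assumed latent size, not read from any real config
z_a = [random.uniform(-1.0, 1.0) for _ in range(z_dim)]
z_b = [random.uniform(-1.0, 1.0) for _ in range(z_dim)]

path = interpolate_latents(z_a, z_b, steps=8)
# images = generator(path)  # hypothetical: decode each latent into a shoe image
```

Each row of `path` would become one image in the morphing strip, so more steps give a smoother transition.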

iNLyze · April 17, 2017, 10:24pm

Very interesting idea! The images are a bit smallish, but some are quite convincing. I think a future improvement might be to align the shoes. As they are photographed from different angles, the intermediates sometimes have a not quite believable perspective (e.g. third row from the top).
This “alignment” needs to be some sort of image registration. Ideally, you’d even do this in 3D (which would involve extracting a 3D model of the shoe from the photo, rotating it to match the other shoe, rendering both in the same perspective and then doing the morphing).
Still, pretty exciting.
As for the 50 hours of training - could you share a history of the loss function and/or accuracy? That might be helpful to understand how it trains. I am asking because sometimes I don’t know if I am just giving up on an architecture too early or if it is actually buggy.
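
For BEGAN specifically, one number worth logging alongside the raw losses is the convergence measure the paper defines, M = L(x) + |γ·L(x) − L(G(z))|, together with the balance term k that it updates each step. Below is a rough sketch of both as I understand them from the paper. The function name `began_step_stats` is hypothetical, the default `gamma` and `lambda_k` values are only illustrative, and the two loss inputs are assumed to be the autoencoder reconstruction losses on real and generated images.

```python
def began_step_stats(loss_real, loss_fake, k, gamma=0.5, lambda_k=0.001):
    """Return (convergence_measure, updated_k) for one BEGAN training step.

    loss_real: reconstruction loss L(x) on real images
    loss_fake: reconstruction loss L(G(z)) on generated images
    k:         current balance term, kept in [0, 1]
    """
    balance = gamma * loss_real - loss_fake
    convergence = loss_real + abs(balance)  # lower should mean closer to equilibrium
    new_k = min(max(k + lambda_k * balance, 0.0), 1.0)  # clamp k to [0, 1]
    return convergence, new_k


# Example with made-up loss values, just to show the call:
m, k = began_step_stats(loss_real=0.2, loss_fake=0.05, k=0.0)
```

If this measure is still trending down, the run is probably still learning; if it has been flat for a long time, more hours are unlikely to help much.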

Surya501 · April 17, 2017, 11:43pm

I wrote about the image quality in the issues in the GitHub repo and got a response from one of the paper authors about trying to tweak my learning rate, h & Z. I think it is very nice that the community at large is so helpful.

Also, my images did not drastically improve beyond 200k iterations. The lack of diversity in the output (second half of the set of images) was due to mode collapse. I don’t think I understand what that means, but I hope one of you can explain it to me :-).

The original paper had interpolation between faces at different angles, and the network should be able to learn this provided there is diversity in the input sample. However, the Zappos dataset has no diversity when it comes to shoe angle. I am downloading some car pictures to see if I can train this on car images (and get more images of the Subaru Baja or Chevy El Camino by interpolating between car and truck).

I’d like to do this in 3D space, but I can’t wrap my head around how to represent the data … baby steps, I guess.

Could you please elaborate on this? I don’t understand.

My loss function over time is below.

I can also share my trained network parameters if you care.

Surya501 · April 17, 2017, 11:52pm

@brendan, @xinxin.li.seattle
After playing with the BEGAN paper on Zappos and seeing their results on face interpolation, it would be cool if we could take a movie, replace the primary character with your face and also transfer the actor’s emotions to your face. Now imagine doing that on Charlie Chaplin’s movies.

This blog post talks about how to get the facial landmarks and how to morph the faces from one angle to another.

I have no clue if we can make the May timeline, but we can play with the various bits and pieces and see what comes out of it.