StackGANs Video Project?

Denoising is a good use case. In dance photography, for example, one has to use high shutter speeds, which limits the amount of light that can reach the sensor. Even with a large aperture, good images are hard to get. One solution is to raise the ISO, but this introduces ISO noise. If we can remove ISO noise cleanly, then even a cheap camera can compete with a high-end camera whose sensor supports high ISOs. I am pretty sure there is good demand for an app like this. Combine it with super resolution and you have a winner. Generating training data for denoising is also relatively easy.
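Here's a minimal sketch of what generating those pairs might look like; the noise model (signal-dependent shot noise plus Gaussian read noise) is my own rough approximation of high-ISO noise, not a calibrated camera model:

```python
import numpy as np
from PIL import Image

def make_noisy_pair(path, gauss_sigma=0.05, shot_strength=0.02):
    """Load a clean photo and synthesize a noisy copy of it.

    Assumed noise model: signal-dependent shot noise plus
    signal-independent Gaussian read noise. This only roughly
    approximates real high-ISO sensor noise.
    """
    clean = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
    # Shot noise: variance grows with pixel brightness.
    shot = np.random.normal(size=clean.shape) * np.sqrt(clean * shot_strength)
    # Read noise: uniform across the frame.
    read = np.random.normal(0.0, gauss_sigma, size=clean.shape)
    noisy = np.clip(clean + shot + read, 0.0, 1.0)
    return noisy, clean  # (input, target) pair for the denoiser
```

Run that over any folder of clean photos and you get as many (noisy, clean) pairs as you like.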

4 Likes

That’s a great example of a use case. Indoor or evening photography with a cell phone camera is another one.

1 Like

Just read Learning What and Where To Draw
They demonstrate conditional GANs that accept the location where you want the GAN to generate the image. They have a nice GitHub repo too

Also read Plug and Play

The author lists some “cool applications”:

  • One can generate objects in a specific region by conditioning on the last heatmap of a semantic segmentation network (see the sketch below)
  • Synthesize a Music Video by sampling images conditioned on lyrics
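The first application above is concrete enough to sketch. Here's a toy (entirely hypothetical) PyTorch generator that conditions on a spatial heatmap by tiling the noise vector across space and concatenating the heatmap as extra channels; the layer sizes are illustrative, not taken from either paper:

```python
import torch
import torch.nn as nn

class HeatmapConditionedGenerator(nn.Module):
    """Toy generator conditioned on a spatial heatmap.

    The noise vector is broadcast to every spatial location and
    concatenated with the heatmap channels, so the network can place
    content where the heatmap is active. Sizes are illustrative.
    """
    def __init__(self, z_dim=64, heatmap_ch=1, out_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(z_dim + heatmap_ch, 128, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, out_ch, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, z, heatmap):
        # z: (B, z_dim) noise; heatmap: (B, heatmap_ch, H, W)
        b, _, h, w = heatmap.shape
        z_map = z[:, :, None, None].expand(-1, -1, h, w)
        return self.net(torch.cat([z_map, heatmap], dim=1))
```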

The music video idea is interesting. It could even just be a video/image search engine that auto-generates a slideshow based on text (a fiction book, business document, movie script, etc.)

2 Likes

@thunderingtyphoons @jeremy I’ve been playing around with photorealistic style transfer, and I think it can add to the denoising idea you mentioned above. See images below.

left: input photo
right: photorealistic style transfer

4 Likes

Thank you for sharing @brendan, these resources are fantastic!

I have been wondering if you could take slightly out-of-focus photos and turn them into sharper images. I know from recently looking at old family photos that plenty of them suffer from being taken on cameras with poor-quality focus and are only just out of focus. It seems it should be reasonably easy to make datasets for this by adding some blur, etc.
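A rough sketch of that dataset idea, for what it's worth; Gaussian blur here is an assumption standing in for real defocus blur (which is closer to a disc kernel), and the paths and radii are placeholders:

```python
from pathlib import Path
from PIL import Image, ImageFilter

def make_deblur_pairs(src_dir, out_dir, radii=(1.0, 2.0, 3.0)):
    """Create (blurred, sharp) training pairs from a folder of photos.

    Gaussian blur is a crude stand-in for optical defocus, but it is
    enough to bootstrap a deblurring dataset.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.jpg")):
        sharp = Image.open(path).convert("RGB")
        for r in radii:
            blurred = sharp.filter(ImageFilter.GaussianBlur(radius=r))
            blurred.save(out / f"{path.stem}_blur{r}.jpg")
```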

This (and denoising + SR) could also be really interesting to apply to audio, to get clearer sound out of phones and hearing aids, etc.

Yes exactly! :slight_smile:

Wow @xinxin.li.seattle tell us more about what you did here! This looks fascinating. Especially the bottom pair - you’ve taken a washed-out image and made it much richer.

Apologies for my sloppiness; the washed-out image was actually the output of the photorealistic style transfer. It would have been nice if it were the other way around :wink:

@brendan @Matthew This is all super interesting and something I’ve been thinking a lot about as well. If you guys ever wanted to hack on something along these lines, I’d love to join.

That sounds great. I’ve been playing around with applying vanilla WGAN to generating anime sketches. It isn’t working so far, but maybe this new paper from the authors of WGAN will help.

Improved Training of Wasserstein GANs
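For anyone who wants to try it, the core of the paper is a gradient penalty on the critic. A minimal PyTorch sketch (assuming 4-D image batches; `critic` is whatever critic network you’re training):

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """Gradient penalty from "Improved Training of Wasserstein GANs".

    Penalizes the critic's gradient norm at random interpolations
    between real and fake samples, pushing the norm toward 1.
    """
    b = real.size(0)
    eps = torch.rand(b, 1, 1, 1, device=real.device)  # assumes (B, C, H, W)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]
    grad_norm = grads.view(b, -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()
```

You add this term to the critic loss in place of the weight clipping that vanilla WGAN uses.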


3 Likes

Cool, I’ll find you after the class.

Video dataset with scene descriptions
https://mila.umontreal.ca/en/publications/public-datasets/m-vad/

Interesting article that divides image generation into separate tasks
Generative Image Modeling using Style and Structure Adversarial Networks
https://arxiv.org/abs/1603.05631

I’m also wondering if generating photorealistic images from random noise is asking a bit much of a neural network. Pixel tweaking makes for nice gradients, but can we speed up the process by giving the network a paintbrush? A set of higher-level tools to play with: pre-made shapes, semantic segmentations, 3D objects, the Unity game engine…

Interesting new work on photorealistic face generation with GANs.

2 Likes

Wow, incredible find @amduser! The blog is a good read and a nice interpretation of the original paper.
At first glance the paper is very well written, and it’s complemented with failed attempts (which shows they are really confident in their work).

[a TensorFlow implementation](https://github.com/xxlatgh/BEGAN-tensorflow)

1 Like

So how about we create a short film and submit it to this competition?
https://att.submittable.com/submit/82003/emerging-indie-filmmakers

Film submissions are due May 26. Finals are at Warner Bros. Studios in Los Angeles, July 14–15, 2017. If we go in with a good tech stack, maybe Warner Bros will buy us :wink:

Here are some other “experimental” film festivals:
http://expcinema.org/site/en/calls-entries

1 Like

@jeremy Can you elaborate on why you think Google Cloud ML would be a shortcut as opposed to our own DL server? I looked into it and found the Google Cloud SDK kind of hard to work with, although it does seem to deliver a more economical service than AWS.

A related question: won’t super resolution be super easy for Google/Facebook/Photoshop to integrate into their existing products? Given their existing user base + training data, how could we possibly compete?

Great idea! Do you have any particular theme that you are passionate about?

My immediate thought is a revamp of the old silent film. It’s convenient because it’s visual only, so you don’t have to worry about creating dialogue/narrative.

If you’ve seen the movie Hugo, they made reference to Georges Méliès’s films, which have roots in magic tricks. Personally I feel they could blend well with deep learning.

It just so happens that I came across this thing called DeepWarp, which allows you to create images of anyone rolling their eyes. It’s quite fun, and I can imagine spinning that into the film.

We can add style transfer, play around with time and space, and really bring the old silent film to life… and of course there is super resolution, plus all kinds of other fun stuff.

Just some quick thoughts; I hope they stimulate the conversation. I’m absolutely open to all kinds of ideas.

3 Likes

I think it’s a brilliant idea and definitely feasible. I’m keen to explore it and the idea of enhancing/augmenting existing content just like we did with the cat.

I’m also exploring using video game animations as input and generating photorealistic or painting-stylized output via CycleGAN. So take Grand Theft Auto (which OpenAI Universe has an API for) and apply the “monet2photo” technique outlined in the CycleGAN paper. See their horse-to-zebra example: pretty smooth video-based transfer.
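For reference, the heart of CycleGAN is the cycle-consistency term, which is easy to sketch in PyTorch; `G` and `F` here are placeholder generators for the two translation directions:

```python
import torch.nn as nn

def cycle_consistency_loss(G, F, real_x, real_y, lambda_cyc=10.0):
    """Cycle-consistency term from the CycleGAN paper.

    G maps domain X -> Y (e.g. game frames -> photos) and F maps
    Y -> X; translating and then translating back should recover
    the original input in both directions.
    """
    l1 = nn.L1Loss()
    loss_x = l1(F(G(real_x)), real_x)  # X -> Y -> X
    loss_y = l1(G(F(real_y)), real_y)  # Y -> X -> Y
    return lambda_cyc * (loss_x + loss_y)
```

The full objective adds the usual adversarial losses for both generators; this term is what keeps the translation faithful to the input.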

Our “Paintbrush” can be pre-designed 3D Unity objects and animation sequences (there are tons in the Unity Asset Store) which our algorithm can be trained to generate. Instead of generating raw pixels, we train the model to map text descriptions to 3D characters and to generate commands like “left”, “right”, “up”, “jump”, etc.

We can pass these semi-realistic 3D generations into an image-to-image translation GAN like CycleGAN to bring them to life.
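To make the text-to-commands piece concrete, here’s a purely hypothetical toy: encode the description with a small GRU and predict a fixed-length sequence of commands from a tiny vocabulary. Everything here (vocabulary, sizes, fixed sequence length) is an illustrative assumption:

```python
import torch
import torch.nn as nn

COMMANDS = ["left", "right", "up", "jump"]  # toy command vocabulary

class TextToCommands(nn.Module):
    """Toy model mapping a tokenized text description to a fixed-length
    sequence of scene commands. Purely illustrative."""
    def __init__(self, vocab_size=1000, emb=64, hidden=128, steps=8):
        super().__init__()
        self.steps = steps
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.head = nn.Linear(hidden, steps * len(COMMANDS))

    def forward(self, tokens):
        # tokens: (B, T) integer ids of the text description
        _, h = self.encoder(self.embed(tokens))   # h: (1, B, hidden)
        logits = self.head(h[-1])                 # (B, steps * n_cmds)
        return logits.view(-1, self.steps, len(COMMANDS))

# Usage: command ids per step, e.g. fed to a Unity animation script:
# model = TextToCommands(); ids = model(tokens).argmax(dim=-1)
```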

Definitely far out, but I think with shortcuts + backdoors we can get something cool working that’s mostly AI-generated. I’m going to play around with CycleGAN painting --> image tonight and see how it works.

2 Likes