Part 2 Lesson 12 wiki

After 200 epochs and 36 hours of training:

4 Likes

I have 6 GPUs in my PC.
This is what is happening now (1 working and 5 doing nothing).
Is there any way to include them all in the process?
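
For context, the usual PyTorch way to split each batch across several cards is nn.DataParallel. A minimal sketch (the model here is just a stand-in, and the batch size is arbitrary):

```python
import torch
import torch.nn as nn

# Stand-in model; replace with whatever network you're actually training.
model = nn.Sequential(
    nn.Linear(100, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

if torch.cuda.device_count() > 1:
    # Replicates the model on every visible GPU and splits each batch across them.
    model = nn.DataParallel(model)
model = model.cuda()

# With 6 GPUs, DataParallel sends roughly 1/6 of this batch to each device.
x = torch.randn(96, 100).cuda()
out = model(x)
```

You generally also want to scale the DataLoader batch size up so that each GPU still gets a reasonably sized chunk.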

6 Likes

We’ll cover this tonight :slight_smile:

7 Likes

@jeremy @radek

Am I right about the points I mentioned here? I’m testing the iteration relationship myself, but for the rest of it I’d appreciate your advice.

Sorry for the late reply.

Can you discuss your “super resolution of sentences” idea a little bit more?

I think Jeremy will talk about this, as he mentioned in the comment.

Can you explain what effect using such a relationship between generator and discriminator training iterations would have over the course of training?

I’m testing this myself to get concrete results, but I strongly feel that the generator is trained in a decreasingly strict manner first and then an increasingly strict manner later, since the training ratio decreases, bottoms out at 1, and then increases.

This peaked schedule is inspired by lecture 8; the idea behind it is to ease up generator training initially but make it harder eventually so that it learns better.
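
To make the schedule concrete, here is a purely illustrative sketch (the shape and numbers are placeholders, not the actual experiment) of a discriminator-to-generator iteration ratio that starts high, bottoms out at 1 mid-training, and rises again:

```python
def d_iters_for_epoch(epoch, n_epochs, max_ratio=5):
    """Hypothetical schedule: discriminator updates per generator update.

    Starts at max_ratio, decreases linearly to a minimum of 1 at the middle
    of training, then increases back to max_ratio - easing generator training
    early and making it harder later.
    """
    mid = n_epochs / 2
    dist = abs(epoch - mid) / mid        # 1 at the ends, 0 in the middle
    return max(1, round(1 + (max_ratio - 1) * dist))

# Skeleton of how it would be used (data loading and losses omitted):
n_epochs = 100
for epoch in range(n_epochs):
    d_iters = d_iters_for_epoch(epoch, n_epochs)
    # for each generator step this epoch:
    #     take d_iters discriminator steps, then 1 generator step
```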

In case this is of interest, here are the results of some CycleGAN and art-generation experiments I ran: here


Thanks!
Alena

5 Likes

I wish I could like this 100 times! :smiley:

I don’t know - it sounds like an interesting direction to experiment in!

1 Like

Hi, I’ve been pondering how to combine image generators and pre-trained image models. Intuitively, it seems like you should somehow be able to use a pre-trained image model like Resnet or VGG to jump start a generator. As in, the pre-trained model already knows about edges, curves, and more abstract features. So shouldn’t we be able to take advantage of that?
Though when I get down to actually considering the implementation, things kinda fall apart. Going from a low-dimensional space (the output class, or output dimensions) to a higher dimensional space is tough. You can’t just “undo” a linear layer, especially with something like ReLU. And same with convolutions. There are many many input configurations that could have generated the outputs at any given layer of a model. So it seems like you’d explode into some insanely huge possible input space as you went backward from layer to layer. But still, the question remains… that it seems there should be some kind of way to extract the information known from the pre-trained model.
So I guess my question is… are there any ways to extract an approximation of the input space that, when given a set of transformations (the neural net), could have been used to generate a given output? If you could, I would think that you could sample from the known possible input space at each layer, and then go from there… any thoughts on this?
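
One speculative way to sidestep the inversion problem (just a sketch of my own, assuming torchvision’s ResNet-34): keep the pretrained convolutional body as a frozen encoder and only learn a decoder that maps its features back up to pixels, so the generator at least starts from features that already know about edges and textures. The decoder shape below is arbitrary:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

# Pretrained body as a fixed feature extractor (drop avgpool and fc).
body = nn.Sequential(*list(resnet34(pretrained=True).children())[:-2])
for p in body.parameters():
    p.requires_grad = False

# Arbitrary decoder: the only part that gets trained.
decoder = nn.Sequential(
    nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
)

x = torch.randn(1, 3, 224, 224)   # dummy image batch
features = body(x)                 # (1, 512, 7, 7) for 224x224 input
out = decoder(features)            # back up to (1, 3, 224, 224), in [-1, 1]
```

It doesn’t answer the “invert the layers” question directly, but it is one way to reuse what the pretrained model already knows.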

From experimenting, I found that Adam with WGANs doesn’t just work worse - it completely fails to train a meaningful generator.

From the WGAN paper:

Finally, as a negative result, we report that WGAN training becomes unstable at times when one uses a momentum based optimizer such as Adam [8] (with β1 > 0) on the critic, or when one uses high learning rates. Since the loss for the critic is nonstationary, momentum based methods seemed to perform worse. We identified momentum as a potential cause because, as the loss blew up and samples got worse, the cosine between the Adam step and the gradient usually turned negative. The only places where this cosine was negative was in these situations of instability. We therefore switched to RMSProp [21] which is known to perform well even on very nonstationary problems.
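
In PyTorch terms that just means swapping the optimizer; a minimal sketch with dummy networks standing in for the real critic and generator (the 5e-5 learning rate is the one from the paper):

```python
import torch.nn as nn
import torch.optim as optim

# Dummy stand-ins; replace with your actual WGAN critic and generator.
critic = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
generator = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())

# RMSProp with a small learning rate, as the WGAN paper recommends for the
# nonstationary critic loss, instead of a momentum-based optimizer like Adam.
opt_critic = optim.RMSprop(critic.parameters(), lr=5e-5)
opt_gen = optim.RMSprop(generator.parameters(), lr=5e-5)

# If you still want to try Adam, setting beta1 = 0 removes the momentum term
# the paper points to as a likely cause of instability:
# opt_critic = optim.Adam(critic.parameters(), lr=1e-4, betas=(0.0, 0.9))
```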

2 Likes

I’m hoping there’ll be a chance to cover this on Monday since we didn’t get to it last week. This is one of the key missing pieces for me in terms of scaling training.

1 Like

Yes we will. Sorry about the delay!

1 Like

No worries! I’m always amazed by how much we cover, and that there’s the chance to ask questions and interact along the way. I’m also very grateful for all of the help you’ve provided. I’m sad the course is ending, although I’ve got a page worth of possible projects that should keep me busy until next year.

1 Like

Did you make any headway? I’d be very interested in an update :slight_smile:

There was a paper at ICLR which talked about GAN convergence, and said that if you normalize the weights by dividing by the spectral norm, it converged a lot better.
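
That’s presumably “Spectral Normalization for Generative Adversarial Networks” (Miyato et al., ICLR 2018). Recent PyTorch versions ship a built-in wrapper, torch.nn.utils.spectral_norm, so it’s cheap to try; a minimal sketch applied to a made-up DCGAN-style discriminator:

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Hypothetical DCGAN-style discriminator for 32x32 RGB images; spectral_norm
# divides each wrapped layer's weight by its (estimated) spectral norm.
discriminator = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 64, 4, stride=2, padding=1)),    # 32 -> 16
    nn.LeakyReLU(0.2),
    spectral_norm(nn.Conv2d(64, 128, 4, stride=2, padding=1)),  # 16 -> 8
    nn.LeakyReLU(0.2),
    spectral_norm(nn.Conv2d(128, 256, 4, stride=2, padding=1)), # 8 -> 4
    nn.LeakyReLU(0.2),
    spectral_norm(nn.Conv2d(256, 1, 4)),                        # 4 -> 1
)
```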

3 Likes

I’ve tried this with 0.4 - installed using conda install - and it still fails. Jeremy must have been compiling from source when he was experimenting with 0.4, so is it possible that the in-place operation worked for him because of that - that the code was specifically compiled for his GPU?

Interesting. I still haven’t gotten to updating to 0.4, so thanks for the heads up.

I’ll definitely share the update as soon as I’m done. I’m currently stretched pretty thin at work and not finding much time. Sorry for the delay.

In the meantime, I can share a new thing I came across.

It can help us learn and create at the same time. You can play around and share something new you come up with.

Yeah… I’ve seen that.

It looks really nice, visually.

Unfortunately it’s only for images and raw vector data, so you’d still need Python to preprocess your data to begin with. Everything else aside, I’m not convinced that losing PyTorch, in addition to not being able to use your own hardware, would be worth it.

Looks great for beginners wanting to do deep learning for images though!

Yep, which also means we have to be careful to denormalize our generated data accordingly.
I also tried to find the reason for using tanh instead of sigmoid, but no success yet (the DCGAN paper you refer to just mentions that bounded activations are better, but that would also be true of a sigmoid, which is bounded between 0 and 1).
The only reason I can think of is that it works better experimentally on some datasets, maybe because the activations are close to zero, which means that the sigmoid will get saturated, which won’t be the case for tanh as it is defined between -1 and 1.
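
Either way, the normalize/denormalize bookkeeping with a tanh output is just a linear rescale between [0, 1] and [-1, 1]; a small sketch:

```python
import torch

def normalize(img):
    """Map images from [0, 1] to [-1, 1] to match a tanh-output generator."""
    return img * 2 - 1

def denormalize(fake):
    """Map generated samples from [-1, 1] back to [0, 1] for viewing/saving."""
    return (fake + 1) / 2

# A fake batch straight out of a tanh generator lives in [-1, 1]...
fake = torch.tanh(torch.randn(4, 3, 64, 64))
imgs = denormalize(fake)   # ...and comes back to [0, 1] here
assert imgs.min() >= 0 and imgs.max() <= 1
```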