Lesson 10 Discussion


(Jeremy Howard) #61

Checking a validation image works fine for me on the SR model trained on the whole training set - in general, poorer results on the validation set mean either you’ve overfit, or the validation set has data in a different form to the train set (e.g. black borders vs none). Here’s an example 288x288 image:

And after SR-net:


(rajendra koppula) #62

Thanks for testing! I will monitor validation accuracy to check for overfitting. I couldn’t do it last time because I used train_on_batch().


(rajendra koppula) #63

yeah, seems similar. Looks like you trained on imagenet for two epochs. Can’t imagine it’s overfitting already. Could be underfitting too. :slight_smile:


(Jeremy Howard) #64

That wouldn’t really explain the good training set performance.


(rajendra koppula) #65

ahh got it.


(David Woo) #66

Wondering if anyone has experimented with TensorBoard? I was thinking of using it to visualize the architecture of a network, hoping that being able to see the network would make it easier to debug or tune.


(Suresh ) #67

I’ve tried using it for visualizing embeddings. It’s pretty straightforward; you can see it in the gist I posted on this forum (I’m on my phone now). I can help with that part if you have any issues. I’d like to visualize network topologies too. Please share if you figure it out; I haven’t tried that part yet.
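For graph visualization, a minimal sketch might look like the following (assuming Keras with the TensorFlow backend; `model`, `X_train`, and `y_train` are hypothetical names, and the log directory is arbitrary):

```python
from keras.callbacks import TensorBoard

# write_graph=True dumps the network topology so it shows up
# under the "Graphs" tab in TensorBoard
tb = TensorBoard(log_dir='./logs', write_graph=True)

# Hypothetical model/data - substitute your own
model.fit(X_train, y_train, nb_epoch=2, callbacks=[tb])

# Then, from a shell:  tensorboard --logdir=./logs
```

This is just the callback wiring; the actual graph rendering happens in the TensorBoard web UI.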


(Constantin) #68

I was finally able to figure it out myself. So, here is what you should do:
Generator: mse (though I had more success with crossentropy, don’t know why)
Discriminator: Wasserstein loss
Full model: Wasserstein loss

With the Wasserstein loss being:

```python
import keras.backend as K

def wasserstein_loss(y_true, y_pred):
    return K.mean(y_true * y_pred)
```
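As a quick sanity check on that formulation (a NumPy sketch of my own, not from the notebook): with targets of ±1 the loss is just the signed mean of the critic’s scores, so minimizing it pushes real and fake scores apart:

```python
import numpy as np

def wasserstein_loss(y_true, y_pred):
    # NumPy analogue of K.mean(y_true * y_pred)
    return np.mean(y_true * y_pred)

# Made-up critic scores for a batch of real and fake images
real_scores = np.array([0.9, 1.1, 0.8])
fake_scores = np.array([-0.7, -1.2, -0.9])

# One common convention: target -1 for real, +1 for fake, so training the
# critic drives real scores up and fake scores down
critic_loss = (wasserstein_loss(-np.ones(3), real_scores)
               + wasserstein_loss(np.ones(3), fake_scores))
```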

After 5000 training iterations of the generator this will give you something like:
[image: grid of generated MNIST digits after 5000 iterations]

OK, it is just MNIST, but it's a start.

EDIT: I want to point out that this is part of what group7 is working on. In particular, I'd like to acknowledge @samwit for suggesting the project and @rodgzilla for debugging and helpful links.

(Jeremy Howard) #69

Nice job. How did you handle the weight clipping?


(Constantin) #70

I uploaded the notebook to Bitbucket, but it is a private repo. I’ll be happy to give anyone taking the course access to it. @jeremy, this is also a general question: at what point is it OK to put code on public repos, as long as part 2 is not open sourced yet? The code is originally based on your implementation, though I made numerous additions and changes. I tried to understand your WGAN implementation in PyTorch by “backporting” it to Keras. Not that I don’t like PyTorch (I see its potential), but I am not productive with it yet, and I thought it might be a nice exercise to understand how your implementation works.
As for weight clipping: @rodgzilla pointed out this repo, where Thibault de Boissiere has implemented different GANs and other cool stuff in Keras. Btw - DenseNet is among them.
I used his weight clipping strategy:

```python
import numpy as np

def clip_weights(net, clipvalue):
    for l in net.layers:
        weights = l.get_weights()
        weights = [np.clip(w, -clipvalue, clipvalue) for w in weights]
        l.set_weights(weights)
```
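To see what that does in isolation, here is a toy sketch with stand-in classes (`FakeLayer` and `FakeNet` are my own names, not Keras): after the call, every weight array is clamped to [-clipvalue, clipvalue] while values already inside the range are untouched.

```python
import numpy as np

class FakeLayer:
    # Minimal stand-in for a Keras layer, just to exercise the clipping logic
    def __init__(self, weights):
        self._w = weights
    def get_weights(self):
        return self._w
    def set_weights(self, w):
        self._w = w

class FakeNet:
    def __init__(self, layers):
        self.layers = layers

def clip_weights(net, clipvalue):
    # Same strategy as above: clamp every weight tensor to [-clipvalue, clipvalue]
    for l in net.layers:
        weights = [np.clip(w, -clipvalue, clipvalue) for w in l.get_weights()]
        l.set_weights(weights)

net = FakeNet([FakeLayer([np.array([-0.5, 0.002, 0.5])])])
clip_weights(net, 0.01)
```

In a WGAN training loop this is typically called on the discriminator (critic) after each of its weight updates.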

Sorry for the long post. @jeremy, with your permission I would make the repo public to make it easier for other students to look at it.


(Jeremy Howard) #71

Please go ahead - and thanks for checking. Just don’t publicise it outside of the course.


(Jeremy Howard) #72

I think that’s a terrific idea. Porting to another library or language is a terrific way to test and deepen your understanding of a topic.


(Constantin) #73

Did it. It is publicly accessible now.


(Sam Witteveen) #74

@iNLyze This looks good, but I wonder if you still have mode collapse, as the network seems to be clearly favoring 0 and 8 based on this plot. The generator network seems to have learned that it can trick the discriminator by producing mostly 0s and 8s.

What happens as you run the network longer than 5k iterations?


(Sam Witteveen) #75

@jeremy I have just gotten back from travel, so still have to go through the GAN lecture properly, but I noticed in your code you don’t seem to use any special weight initialization. Is this something you have looked into?

I haven’t had a chance to code up WGAN yet, but using an original GAN with MLP I have found that one big factor in preventing mode collapse too quickly was how you init the weights at the start of the network.

I have found using Xavier Initialization or He Init to give a much more stable network and better output results.
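The scaling rules behind those two schemes are simple enough to write out (a NumPy sketch of my own; in Keras they correspond to the `glorot_uniform` and `he_normal` initializers):

```python
import numpy as np

def xavier_init(fan_in, fan_out, rng=np.random):
    # Glorot/Xavier: variance scaled by the average of fan-in and fan-out,
    # drawn uniformly from [-limit, limit]
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_init(fan_in, fan_out, rng=np.random):
    # He: variance scaled by fan-in only, suited to ReLU activations
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W = xavier_init(256, 128)
H = he_init(256, 128)
```

Either way, the idea is to keep activation variance roughly constant from layer to layer, which plausibly matters even more in the delicate generator/discriminator balance of a GAN.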

Curious to hear your thoughts on different types of initializations.


(Constantin) #76

I did a few more predict() calls and could identify most digits. I haven’t seen any 4s yet, and there might be a bias towards 8s and 0s, but that might be because I was only looking at a small sample.
I’ll let it run again overnight to see more than 5k iterations.


(Brendan Fortuner) #77

I didn’t see the speedups with my image-resize code, so it looks like I need to find a better example. I ran the pillow-simd performance benchmark suite and confirmed that I AM seeing the speedups they mention in their docs.


(David Woo) #78

Wondering how to think about the different GAN architectures, e.g. DCGAN vs. WGAN. Is one always better than the other, or does it depend on the problem/data you are working on?


(Constantin) #79

Actually, more iterations are not needed. In my implementation about 2000 iterations are sufficient for D and G to reach equilibrium.


(Jeremy Howard) #80

No I didn’t - frankly, I didn’t spend much time on pre-Wasserstein GANs since they’re obsolete now.