Lesson 10 Discussion


(Jeremy Howard) #61

Checking a validation image works fine for me on the SR model trained on the whole training set - in general, poorer results on the validation set mean either you’ve overfit, or the validation set has data in a different form to the train set (e.g. black borders vs none). Here’s an example 288x288 image:

And after SR-net:


(rajendra koppula) #62

Thanks for testing! I will monitor validation accuracy to check for overfitting. I couldn’t do it last time because I used train_on_batch().


(rajendra koppula) #63

yeah, seems similar. Looks like you trained on imagenet for two epochs. Can’t imagine it’s overfitting already. Could be underfitting too. :slight_smile:


(Jeremy Howard) #64

That wouldn’t really explain the good training set performance.


(rajendra koppula) #65

ahh got it.


(David Woo) #66

Wondering if anyone has experimented with TensorBoard? I was thinking of using it to visualize the architecture of a network, hoping that being able to see the network would make it easier to debug or tune.


(Suresh ) #67

I’ve tried using it for visualizing embeddings. It’s pretty straightforward; you can see it in the gist I posted on this forum (I’m on my phone now). I can help with that part if you have any issues. I’d like to visualize network topologies too. Please share if you figure it out; I haven’t tried that part yet.
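For graph visualization, a minimal sketch might look like the following (assuming Keras with the TensorFlow backend; `model`, `X_train`, and `y_train` are hypothetical names, and the log directory is arbitrary):

```python
from keras.callbacks import TensorBoard

# write_graph=True dumps the network topology so it shows up
# under the "Graphs" tab in TensorBoard
tb = TensorBoard(log_dir='./logs', write_graph=True)

# Hypothetical model/data - substitute your own
model.fit(X_train, y_train, nb_epoch=2, callbacks=[tb])

# Then, from a shell:  tensorboard --logdir=./logs
```

This is just the callback wiring; the actual graph rendering happens in the TensorBoard web UI.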


(Constantin) #68

I was finally able to figure it out myself. So, here is what you should do:
Generator: mse (though I had more success with crossentropy, don’t know why)
Discriminator: Wasserstein loss
Full model: Wasserstein loss

With the Wasserstein loss being:

```python
import keras.backend as K

def wasserstein_loss(y_true, y_pred):
    return K.mean(y_true * y_pred)
```
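As a quick sanity check on that formulation (a NumPy sketch of my own, not from the notebook): with targets of ±1 the loss is just the signed mean of the critic’s scores, so minimizing it pushes real and fake scores apart:

```python
import numpy as np

def wasserstein_loss(y_true, y_pred):
    # NumPy analogue of K.mean(y_true * y_pred)
    return np.mean(y_true * y_pred)

# Made-up critic scores for a batch of real and fake images
real_scores = np.array([0.9, 1.1, 0.8])
fake_scores = np.array([-0.7, -1.2, -0.9])

# One common convention: target -1 for real, +1 for fake, so training the
# critic drives real scores up and fake scores down
critic_loss = (wasserstein_loss(-np.ones(3), real_scores)
               + wasserstein_loss(np.ones(3), fake_scores))
```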

After 5000 training iterations of the generator this will give you something like:
[image: grid of generated MNIST digits after 5000 iterations]

OK, it is just MNIST, but it's a start.

EDIT: I want to point out that this is part of what group7 is working on. In particular, I'd like to acknowledge @samwit for suggesting the project and @rodgzilla for debugging and helpful links.

(Jeremy Howard) #69

Nice job. How did you handle the weight clipping?


(Constantin) #70

I uploaded the notebook to Bitbucket, but it is a private repo. I’ll be happy to give anyone taking the course access to it. @jeremy, this is also a general question: at what point is it OK to put code on public repos, as long as part 2 is not open sourced yet? The code is originally based on your implementation, though I made numerous additions and changes. I tried to understand your WGAN implementation in PyTorch by “backporting” it to Keras. Not that I don’t like PyTorch (I see its potential), but I am not productive with it yet, and I thought it might be a nice exercise to understand how your implementation works.
As for weight clipping: @rodgzilla pointed out this repo, where Thibault de Boissiere has implemented different GANs and other cool stuff in Keras. Btw - DenseNet is among them.
I used his weight clipping strategy:

```python
import numpy as np

def clip_weights(net, clipvalue):
    for l in net.layers:
        weights = l.get_weights()
        weights = [np.clip(w, -clipvalue, clipvalue) for w in weights]
        l.set_weights(weights)
```
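To see what that does in isolation, here is a toy sketch with stand-in classes (`FakeLayer` and `FakeNet` are my own names, not Keras): after the call, every weight array is clamped to [-clipvalue, clipvalue] while values already inside the range are untouched.

```python
import numpy as np

class FakeLayer:
    # Minimal stand-in for a Keras layer, just to exercise the clipping logic
    def __init__(self, weights):
        self._w = weights
    def get_weights(self):
        return self._w
    def set_weights(self, w):
        self._w = w

class FakeNet:
    def __init__(self, layers):
        self.layers = layers

def clip_weights(net, clipvalue):
    # Same strategy as above: clamp every weight tensor to [-clipvalue, clipvalue]
    for l in net.layers:
        weights = [np.clip(w, -clipvalue, clipvalue) for w in l.get_weights()]
        l.set_weights(weights)

net = FakeNet([FakeLayer([np.array([-0.5, 0.002, 0.5])])])
clip_weights(net, 0.01)
```

In a WGAN training loop this is typically called on the discriminator (critic) after each of its weight updates.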

Sorry for the long post. @jeremy, with your permission I would make the repo public to make it easier for other students to look at it.


(Jeremy Howard) #71

Please go ahead - and thanks for checking. Just don’t publicise it outside of the course.


(Jeremy Howard) #72

I think that’s a terrific idea. Porting to another library or language is a terrific way to test and deepen your understanding of a topic.


(Constantin) #73

Did it. It is publicly accessible now.


(Sam Witteveen) #74

@iNLyze This looks good, but I wonder if you still have mode collapse, as the network seems to be clearly favoring 0 and 8 based on this plot. The generator network seems to have learned that it can trick the discriminator by producing mostly 0s and 8s.

What happens as you run the network longer than 5k iterations?


(Sam Witteveen) #75

@jeremy I have just gotten back from travel, so still have to go through the GAN lecture properly, but I noticed in your code you don’t seem to use any special weight initialization. Is this something you have looked into?

I haven’t had a chance to code up WGAN yet, but using an original GAN with MLP I have found that one big factor in preventing mode collapse too quickly was how you init the weights at the start of the network.

I have found using Xavier Initialization or He Init to give a much more stable network and better output results.
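The scaling rules behind those two schemes are simple enough to write out (a NumPy sketch of my own; in Keras they correspond to the `glorot_uniform` and `he_normal` initializers):

```python
import numpy as np

def xavier_init(fan_in, fan_out, rng=np.random):
    # Glorot/Xavier: variance scaled by the average of fan-in and fan-out,
    # drawn uniformly from [-limit, limit]
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_init(fan_in, fan_out, rng=np.random):
    # He: variance scaled by fan-in only, suited to ReLU activations
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W = xavier_init(256, 128)
H = he_init(256, 128)
```

Either way, the idea is to keep activation variance roughly constant from layer to layer, which plausibly matters even more in the delicate generator/discriminator balance of a GAN.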

Curious to hear your thoughts on different types of initializations.


(Constantin) #76

I did a few more predict() calls and could identify most digits. I haven’t seen any 4s yet, and there might be a bias towards 8s and 0s, but that might be because I was only looking at a small sample.
I’ll let it run again overnight to see more than 5k iterations.


(Brendan Fortuner) #77

I didn’t see the speedups with my image-resize code, so it looks like I need to find a better example. I ran the pillow-simd performance benchmark suite and confirmed that I AM seeing the speedups they mention in their docs.


(David Woo) #78

Wondering how to think about the different GAN architectures, e.g. DCGAN vs. WGAN. Is one always better than the other, or does it depend on the problem/data you are working on?


(Constantin) #79

Actually, more iterations are not needed. In my implementation about 2000 iterations are sufficient for D and G to reach equilibrium.


(Jeremy Howard) #80

No I didn’t - frankly, I didn’t spend much time on pre-Wasserstein GANs since they’re obsolete now.