I’m hoping there’ll be a chance to cover this on Monday since we didn’t get to it last week. This is one of the key missing pieces for me in terms of scaling training.
Yes we will. Sorry about the delay!
No worries! I’m always amazed by how much we cover, and that there’s the chance to ask questions and interact along the way. I’m also very grateful for all of the help you’ve provided. I’m sad the course is ending, although I’ve got a page worth of possible projects that should keep me busy until next year.
Did you make any headway? I’d be very interested in an update
I’ve tried this with 0.4 - installed using conda install, and it still fails. Jeremy must have been compiling from source when he was experimenting with 0.4, so is it possible that the in-place worked for him because of that - that the code was specifically compiled for his GPU?
Interesting. I still haven’t gotten to updating to 0.4, so thanks for the heads up.
I’ll definitely share the update as soon as I’m done. I’m currently stretching a lot at work and not finding much time. Sorry for the delay.
In the meantime I can share a new thing I came across
It can help us learn and create at the same time. You can play around and share something new you come up with.
Yeah… I’ve seen that.
It looks really nice, visually.
Unfortunately it’s only for images and raw vector data, so you’d need python to do preprocessing on data to begin with and I’m not convinced losing pytorch, in addition to not being able to use your own hardware would be worth it putting everything else aside.
Looks great for beginners wanting to do deep learning for images though!
Yep, which also means we have to be careful to denormalize our generated data accordingly.
I also tried to find the reason why using tanh instead of sigmoid, but no success yet (the DCGAN paper you refer to just mention that bounded activations are better, but it’s also the case if we use the sigmoid which is bounded between 0 and 1).
The only reason I can think of is that it works better experimentally on some dataset, maybe because the activations are close to zero, which means that the sigmoid will get saturated, which won’t be the case for tanh as it is define between -1 and 1.
Is there any particular reason why dropout is not used in the Darknet notebook? The training loss is significantly lower than the validation loss at the end, so I would expect dropout to help…
More generally, are there cases where dropout shouldn’t be used? I thought that if we take a network without dropout, make each layer x% wider, then add 1/(1+x%) dropout probability, we should get better results. Is this true?
My intuition would be, because batchnorm is used, and it requires some additional work to make it work together.
But people had some success using dropout with batchnorm on Cifar10
This is a paper discusing it:
Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift
As @jeremy said, there is a room for experimentation in deep learning
I am trying to understand cycle gan, and reading the source code. I dont understand why their generator model has so many of these layers (see bottom) compare to the one we got in wgan. I also couldn’t find things about this on their paper. Does anyone know??
Input size is one factor from the loop, but the layers are more than I would expect.
ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) )
Is this generator model code more similar to super resolution model we see in lesson 13 as input and output of the model is also an image.