Part 2 Lesson 12 wiki

(Chris Palmer) #234

It’s a long time since I was in this code and I cannot remember precisely what I found out in the end, sorry… Can you point me to the precise point where you are having difficulties and I will see if I have made any notes in my version of the notebook.

(Eddie) #235

For Lesson 12, in the CIFAR10-Darknet notebook, every line runs fine until I go to fit the custom built architecture which then yields the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

(Chris Palmer) #236

Thanks @airborneinf82. That helped refresh my memory: I now remember that I never found a resolution, apart from not using the inplace operation, as discussed in Ken's post.
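For anyone landing here with the same RuntimeError, here is a minimal sketch of what "not using the inplace operation" can look like. It assumes the in-place op is a residual add of the form `x.add_(...)`; the class names `InplaceRes` and `OutOfPlaceRes` and the helper `backward_ok` are hypothetical and made up for illustration, not taken from the notebook. The convolution keeps its input around for the backward pass, so overwriting that input in place trips the check.

```python
import torch
import torch.nn as nn


class InplaceRes(nn.Module):
    """Hypothetical residual block that adds the skip connection in place."""

    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, kernel_size=3, padding=1)

    def forward(self, x):
        # self.conv saves x for its backward pass; add_ then overwrites x.
        return x.add_(self.conv(x))


class OutOfPlaceRes(InplaceRes):
    """Same block, but the add allocates a new tensor instead."""

    def forward(self, x):
        return x + self.conv(x)


def backward_ok(block):
    """Return True if a forward and backward pass through the block succeeds."""
    stem = nn.Conv2d(3, 8, kernel_size=3, padding=1)  # stand-in for earlier layers
    x = torch.randn(2, 3, 8, 8)
    try:
        block(stem(x)).sum().backward()
        return True
    except RuntimeError:
        return False


print(backward_ok(InplaceRes(8)))     # in-place add
print(backward_ok(OutOfPlaceRes(8)))  # out-of-place add
```

The out-of-place version uses a bit more memory, which is the trade-off for keeping the saved tensor intact.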

(Eddie) #237

Ahh, thank you! I totally overlooked that post! I will give that a try in a bit.

(Hugues) #238

Hello all and @jeremy

I’m kind of stuck on my project. I’m reaching 80% accuracy but I think I can do better. My data is unbalanced, so I would really like to try a GAN to augment it, and I’m almost there. I have followed Lesson 12 but I have 3 blocking points, described in these posts of mine:

If someone could guide me by replying in the specific thread, that would be great. Thanks a lot for your help.

(Eddie) #239

That was the trick! Fantastic, thanks. Can’t believe I overlooked that post!

(Chris Palmer) #240

Great - glad to have been helpful! :wink:

(Bilal) #241

To save memory; he answers this in the video.

(Vijay Kumar) #242

I am unable to run the cifar10-darknet .ipynb notebook fully on my local eGPU (Titan Xp, macOS 10.13.6).

After 2-3 rounds of training, my GPU heated up and shut down.
I have reduced the batch size (to 64) as well, but that did not help either.
Since the image size is only 32*32, I thought it should work, but I am not able to run it to completion.
Can someone help me with this?
It looks like the only option left is to move to AWS or Google.
Any suggestions or help, please.

(Vishnu Kumar Kailash Kumar) #243

@Even Well said. Your words are practical and motivating. Thank you.

(魏璎珞) #244

I was tinkering with the wgan notebook and decided to try not training the discriminator more times (5X and occasionally 100X) than the generator. So I changed the following in train(niter, first=True):

def train(niter, first=True):

                #d_iters = 100 if (first and (gen_iterations < 25) or (gen_iterations % 500 == 0)) else 5 
                d_iters = 1 # training ratio of discriminator: generator is 1:1

        print(f'Loss_D {to_np(lossD)}; Loss_G {to_np(lossG)}; '
              f'D_real {to_np(real_loss)}; Loss_D_fake {to_np(fake_loss)}')

It seems to train the WGAN faster. In the first 5 iterations one can get quite respectable fake images.

train(5, False)
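To make the difference concrete, here is a rough sketch that counts discriminator updates under the commented-out schedule versus the 1:1 ratio. It assumes gen_iterations starts at 0 and ignores the notebook's cap on d_iters at the end of each epoch; the function names below are hypothetical helpers, not part of the notebook. Note that the original condition parses as (first and gen_iterations < 25) or (gen_iterations % 500 == 0).

```python
def d_iters_original(gen_iterations, first):
    # Same expression as the commented-out line in train().
    return 100 if (first and (gen_iterations < 25) or (gen_iterations % 500 == 0)) else 5


def d_iters_one_to_one(gen_iterations, first):
    # The modified schedule: one discriminator step per generator step.
    return 1


def total_critic_updates(schedule, n_gen_steps, first):
    """Sum discriminator updates over generator steps 0 .. n_gen_steps - 1."""
    return sum(schedule(g, first) for g in range(n_gen_steps))


print(total_critic_updates(d_iters_original, 30, True))     # 25 * 100 + 5 * 5 = 2525
print(total_critic_updates(d_iters_original, 30, False))    # 100 + 29 * 5 = 245
print(total_critic_updates(d_iters_one_to_one, 30, False))  # 30
```

So over the same number of generator steps, the 1:1 setting does far fewer discriminator updates, which would explain why it runs faster per iteration.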

Does anyone know whether doing this will lead to worse mode collapse, memorization, or any of the other GAN problems that Ian Goodfellow warns about?

Edit: unfortunately, on the celebA dataset the deficiency shows up quite starkly at 10 iterations.

train(5, False) 
set_trainable(netD, True)
set_trainable(netG, True)
optimizerD = optim.RMSprop(netD.parameters(), lr = 1e-5)
optimizerG = optim.RMSprop(netG.parameters(), lr = 1e-5)
train(5, False)

(Ben) #245

Thanks for that, but I am now getting a new error:

TypeError: No loop matching the specified signature and casting was found for ufunc add

Then I set learn.metrics = []
and everything works fine.

I also saw someone say that using PyTorch 0.4 works too.

(Tomoaki Ando) #246

In the cyclegan notebook, I got stuck at the optimization step with the following error:

RuntimeError: cuda runtime error (2) : out of memory at c:\anaconda2\conda-bld\pytorch_1519501749874\work\torch\lib\thc\generic/

When I hit this error before, I followed the advice to decrease the batch size, and it worked.
However, bs is already 1 in this case, and I still get the same error.
Is there anything I can do to make it work?