Part 2 Lesson 12 wiki

Do CycleGANs strictly need two discriminators? Could you build a CycleGAN using a pretrained classifier as your discriminator?


Has anyone successfully applied the GAN code to grayscale images (single channel)?

@jeremy I have a question about the mean and std in a couple of lines of code. You said not to use loops, but what about list comprehensions? They are still loops to me :slight_smile:

# load every training image, convert BGR -> RGB, scale to [0, 1]
train_imgs = np.array([cv2.imread(str(fn), -1)[:, :, [2, 1, 0]] / 255
                       for cls_n in (PATH/'train').iterdir()
                       for fn in cls_n.iterdir()])
# per-channel statistics across all images and pixels
train_imgs.mean(axis=(0, 1, 2)), train_imgs.std(axis=(0, 1, 2))
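For what it's worth, the loops Jeremy warns against are per-pixel Python loops; a comprehension over *files* is fine. If the dataset doesn't fit in memory as one big array, you can get the same per-channel statistics with running sums. A minimal sketch (the random "images" here are just stand-ins for the cv2-loaded ones in the notebook):

```python
import numpy as np

# Stand-in data: in the real notebook these would be images loaded
# from (PATH/'train') with cv2.imread; here we fabricate a few so
# the sketch is self-contained.
rng = np.random.default_rng(0)
imgs = [rng.random((32, 32, 3)) for _ in range(10)]

# Accumulate per-channel sums instead of stacking every image in memory.
n_pix = 0
chan_sum = np.zeros(3)
chan_sq_sum = np.zeros(3)
for img in imgs:                          # loop over files, not over pixels
    n_pix += img.shape[0] * img.shape[1]
    chan_sum += img.sum(axis=(0, 1))
    chan_sq_sum += (img ** 2).sum(axis=(0, 1))

mean = chan_sum / n_pix
std = np.sqrt(chan_sq_sum / n_pix - mean ** 2)   # Var[x] = E[x^2] - E[x]^2

# Matches the all-in-memory computation:
stacked = np.stack(imgs)
assert np.allclose(mean, stacked.mean(axis=(0, 1, 2)))
assert np.allclose(std, stacked.std(axis=(0, 1, 2)))
```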

I’ve been trying to implement the WGAN on the celeb faces dataset, but I’m running into some speed issues.

With a batch size of 128, one epoch takes 3:49; with a batch size of 1028, it takes 3:32. Something is getting bottlenecked somewhere, but I can’t tell what.

The speed issue seems to be related to loading the data: the model runs 8 iterations really fast (8 being the number of workers), then hangs, then does another 8 really fast. Changing the number of workers in the dataloader doesn’t affect epoch time, but it does change how many fast iterations happen before the next hang.

I’ve attached a typical %prun output. If anyone has ideas on how to speed things up, they’d be appreciated. From what I’ve read, around 2000 epochs on this dataset are needed for good results, which at current speeds would be almost 5 continuous days of training.
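One cheap way to confirm the loader (and not the model) is the bottleneck is to time the gap before each batch arrives: a pattern of several fast gaps followed by one long stall matches workers refilling their queue. A generic sketch, not the fastai `DataLoader` — `fake_loader` below just simulates the stall pattern described above:

```python
import time

def iteration_gaps(iterable, n=20):
    """Record the wall-clock gap before each item arrives.

    Several fast gaps followed by one long one suggests the data
    pipeline, not the model, is the bottleneck.
    """
    gaps = []
    t0 = time.perf_counter()
    for i, _ in enumerate(iterable):
        t1 = time.perf_counter()
        gaps.append(t1 - t0)
        t0 = t1
        if i + 1 >= n:
            break
    return gaps

# Simulated loader: every 4th batch stalls, like workers catching up.
def fake_loader():
    for i in range(100):
        if i % 4 == 3:
            time.sleep(0.02)   # stall while "workers" refill the queue
        yield i

gaps = iteration_gaps(fake_loader(), n=12)
slow = [g for g in gaps if g > 0.01]
print(f'{len(slow)} slow batches out of {len(gaps)}')   # -> 3 slow batches out of 12
```

Pointing this at your real dataloader (and varying `num_workers`) makes the stall pattern you described measurable rather than eyeballed.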

@Chris_Palmer Did you ever figure this out? I’m having the same issue…

It’s a long time since I was in this code, and I cannot remember precisely what I found out in the end, sorry. Can you point me to the precise place where you’re having difficulty, and I’ll see whether I made any notes in my version of the notebook.

For Lesson 12, in the CIFAR10-Darknet notebook, every line runs fine until I fit the custom-built architecture, which then yields the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

Thanks @airborneinf82. That helped refresh my memory, and now I remember that I never did get a resolution, apart from not using the inplace operation, as discussed in Ken’s post.
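For anyone hitting this later: the error is easy to reproduce outside the notebook. Any op whose backward pass needs its saved *output* (`exp`, `sigmoid`, and reportedly the `add_` in the notebook’s ResLayer) will complain if that output is later modified in place. A minimal sketch of the failure and the fix:

```python
import torch

# Failing case: exp saves its output for backward; the in-place add
# bumps the tensor's version counter and backward refuses to run.
x = torch.ones(3, requires_grad=True)
y = torch.exp(x)
y += 1                  # in-place edit clobbers the saved output
try:
    y.sum().backward()
except RuntimeError as e:
    print('inplace error:', e)

# Fix: use an out-of-place op, so the saved exp output stays intact.
x = torch.ones(3, requires_grad=True)
y = torch.exp(x)
y = y + 1               # allocates a new tensor instead
y.sum().backward()
print(x.grad)           # d/dx exp(x) = exp(x), i.e. e at x = 1
```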

Ahh, thank you! I totally overlooked that post! I’ll give it a try in a bit.

Hello all and @jeremy

I’m kind of stuck on my project. I’m reaching 80% accuracy but think I can do better; my data is unbalanced, so I would really like to try a GAN to augment it, and I’m almost there. I’ve followed Lesson 12, but I have 3 blocking points, one in each of these posts of mine:

If someone could guide me by replying in the specific thread, thanks a lot for your help.

That was the trick! Fantastic, thanks. Can’t believe I overlooked that post!

Great - glad to have been helpful! :wink:

To save memory; he answers this in the video.

I am unable to run the cifar10-darknet.ipynb fully on my local eGPU (Titan Xp, macOS 10.13.6).

After 2-3 rounds of training, my GPU heats up and shuts down.
I reduced the batch size to 64 as well, but that didn’t help either.
Since the image size is 32×32, I thought it should work, but I’m not able to train it fully.
Can someone help me with this?
It looks like the only option left is to go to AWS or Google.
Any suggestions or help, please.

@Even Well said. Your words are practical and motivating. Thank you.


I was tinkering with the WGAN notebook and decided to try *not* training the discriminator more times than the generator (normally 5×, occasionally 100×). So I changed the following in train(niter, first=True):

def train(niter, first=True):
    ...
    # d_iters = 100 if (first and (gen_iterations < 25) or (gen_iterations % 500 == 0)) else 5
    d_iters = 1  # discriminator : generator training ratio is now 1:1
    ...
    print(f'Loss_D {to_np(lossD)}; Loss_G {to_np(lossG)}; '
          f'D_real {to_np(real_loss)}; Loss_D_fake {to_np(fake_loss)}')

It seems to train the WGAN faster; within the first 5 iterations one can get quite respectable fake images.

train(5, False)

Does anyone know whether doing this will lead to worse mode collapse, memorization, or the other GAN problems Ian Goodfellow warns about?
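For reference, the commented-out line implements the schedule from the original WGAN code: 100 critic ("discriminator") steps while warming up and every 500 generator steps, otherwise 5. A pure-Python sketch of what that line produces, so it's clear what the 1:1 change replaced:

```python
def critic_schedule(n_gen_iters, first=True):
    """How many critic steps run before each generator step,
    following the commented-out line from the notebook."""
    plan = []
    for gen_iterations in range(n_gen_iters):
        # precedence note: (first and gen_iterations < 25) or (gen_iterations % 500 == 0)
        d_iters = 100 if (first and (gen_iterations < 25) or (gen_iterations % 500 == 0)) else 5
        plan.append(d_iters)
    return plan

plan = critic_schedule(30)
print(plan[0], plan[24], plan[25])   # -> 100 100 5
```

So with `first=True`, the first 25 generator steps each pay for 100 critic steps, which is most of the cost you removed by setting `d_iters = 1`.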

Edit: unfortunately, on the celebA dataset the glaring deficiency shows up quite starkly by 10 iterations.

train(5, False)
# make both nets trainable again and rebuild the optimizers before continuing
set_trainable(netD, True)
set_trainable(netG, True)
optimizerD = optim.RMSprop(netD.parameters(), lr=1e-5)
optimizerG = optim.RMSprop(netG.parameters(), lr=1e-5)
train(5, False)

Thanks for that, but now I’m hitting a new error:

TypeError: No loop matching the specified signature and casting was found for ufunc add

Then I set learn.metrics = [] and everything works fine.

I also saw someone say that using PyTorch 0.4 works too.

In the cyclegan notebook, I got stuck at the optimization step with the following error:

RuntimeError: cuda runtime error (2) : out of memory at c:\anaconda2\conda-bld\pytorch_1519501749874\work\torch\lib\thc\generic/

When I’ve met this error before, I followed the usual advice of decreasing the batch size and it worked.
However, bs is already 1 in this case, and I still hit the same error.
Is there anything else I can do to make it work?

After setting learn.metrics = [], the issue was solved for me too, but can you explain why having metrics defined caused that particular issue? I can’t find the reason.

That solves the error, but what if we want to see the accuracy? That isn’t possible if we define metrics as an empty list, and if we can’t see the accuracy, how can we tell the model has reached its best?