My understanding is that the 32x32 images would have been downscaled from some larger original size, so any more complicated augmentations would lead to a loss of image context.
I don’t think it is about the location of objects in the image, since this is not an object detection or localisation problem.
If you have good 32x32 images, maybe we can apply more transforms. It’s about what we might lose by applying transforms. My reasoning rests entirely on the problem not being a detection/localisation one, and hence we can apply rotations/flips on good images of 32x32 size.
I try to spend enough time to ‘understand’ and then implement each new thing, but my implementations are never really deep enough, because writing and debugging good code takes days or weeks, not hours. Last week, for example, I spent Mon-Fri on the translate notebook and only Sat-Mon on DeViSE. When it was suggested to use ‘beam search’ on translate, I ran out of time before I could explore it, so I’m working on that today. So far I’m spending on average 45-50 hours per week and don’t feel that I’m keeping up. This course is a firehose of information. I understand that trying to cover all this stuff in 7 weeks is at best difficult and at worst impossible, so @jeremy is doing the best he can with limited time and resources. But the good news is you can always refer to the videos and the forums for help.
In case you want to download Jeremy’s 20% sample LSUN dataset from Kaggle on Google Colab and are wondering how to use the Kaggle API on Colab, this is a nice post to refer to.
Hopefully there’s enough material to keep you all amused until next year’s course. I don’t expect anyone to master all the material in these 7 weeks - just try to pick up the key bits each week, and hopefully find some time to dig a bit more into one or two key areas that interest you now.
You might want to make that clear in your intro so that people who don’t have a study group don’t get discouraged and think they aren’t cut out for this.
I’ve been playing with GANs in applications to Internet traffic generation. But, I have a more general question. Is there any known approach to build a binary classifier to detect if an entry (e.g., audio, video, image) is real or if it comes from a GAN (fake)? For instance, how to detect if the bedroom image came from the DCGAN? Or if a certain audio is really Obama’s voice?
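One common framing of this question is to treat fake detection as plain binary classification: collect labeled real and GAN-generated samples and train a small CNN on them - structurally the same job a discriminator does. A hedged sketch (the architecture and sizes below are illustrative assumptions, not from the post):

```python
# Sketch: a real-vs-fake detector as an ordinary binary classifier.
import torch
import torch.nn as nn

class FakeDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),   # 64x64 -> 32x32
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 32x32 -> 16x16
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),                    # global average pool
        )
        self.classifier = nn.Linear(64, 1)  # single logit: real vs. fake

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.classifier(h)

model = FakeDetector()
batch = torch.randn(8, 3, 64, 64)  # stand-in for a batch of real/GAN images
logits = model(batch)              # train with BCEWithLogitsLoss on labels
print(logits.shape)  # torch.Size([8, 1])
```

The caveat is that such a detector tends to overfit to the artifacts of the specific GAN it was trained against and may not generalize to other generators.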
I got the answer to yesterday’s tanh question about the last activation of the DCGAN generator.
As usual, our images have been normalized, so their pixel values no longer lie between 0 and 1. This is why we want the generator to output values from -1 to 1; otherwise we wouldn’t be giving a correct input to the discriminator.
If they’re normalized to have mean zero and std one, then tanh won’t be ideal, because it can’t create outputs with absolute value greater than 1. I wonder if we need to look at this again…
In the DCGAN paper they say they normalize their images to have pixel values ranging from -1 to 1. I’ll try that sometime this week to see if it yields better results.
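The DCGAN convention being described can be sketched in a couple of lines: rescale real images into [-1, 1] so they share the same range as the generator’s tanh output (a minimal illustration, assuming pixels start in [0, 1]):

```python
# Scale real images to [-1, 1] so they match the range of a tanh generator.
import torch

imgs = torch.rand(4, 3, 32, 32)  # real images with pixels in [0, 1]
imgs_scaled = imgs * 2 - 1       # now in [-1, 1]

# A generator ending in tanh produces the same range by construction.
fake = torch.tanh(torch.randn(4, 3, 32, 32))

print(float(imgs_scaled.min()) >= -1.0, float(imgs_scaled.max()) <= 1.0)
print(float(fake.min()) >= -1.0, float(fake.max()) <= 1.0)
```

With this scaling, real and generated inputs to the discriminator are directly comparable, which is the point raised above about mean-zero/std-one normalization clashing with tanh.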
Also, I believe there’s a small bug in the WGAN notebook where we define the ConvBlock and the DeconvBlock: the parameter bn is never used to test whether we should add batchnorm or not, yet bn=False is passed in twice (at the places indicated in the DCGAN article).
I don’t know if I should do a PR or if you plan to change this notebook soon, Jeremy.
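For clarity, here is a hedged sketch of what honoring the bn flag might look like - the name ConvBlock comes from the post, but the exact notebook code may differ, so treat the signature and layer choices as assumptions:

```python
# Sketch of a ConvBlock whose `bn` flag actually controls batchnorm.
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    def __init__(self, ni, no, ks, stride, bn=True):
        super().__init__()
        layers = [nn.Conv2d(ni, no, ks, stride, padding=ks // 2, bias=not bn)]
        if bn:  # the reported bug: this check was missing, so bn=False did nothing
            layers.append(nn.BatchNorm2d(no))
        layers.append(nn.LeakyReLU(0.2, inplace=True))
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)

with_bn = ConvBlock(3, 16, 4, 2, bn=True)
without_bn = ConvBlock(3, 16, 4, 2, bn=False)
print(any(isinstance(m, nn.BatchNorm2d) for m in with_bn.modules()))     # True
print(any(isinstance(m, nn.BatchNorm2d) for m in without_bn.modules()))  # False
```

Skipping batchnorm on the first discriminator layer and the last generator layer is the DCGAN recommendation the bn=False calls were presumably meant to implement.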
I’m so biased because GANs and creative AI are like my favorite topics, even though all things CV (which the MOOC lectures lean heavily on) were like a gateway drug for me to get more into AI (like all of CS231n and image segmentation by Andrej Karpathy).
Hey, has anyone downloaded the Kaggle dataset on their cloud machine? I’m trying to use kaggle-cli but I’m coming up empty for some reason. Specifically, I’ve run kg dataset -o jhoward -d lsun_bedroom, which, according to the docs, should work in this case. I’m logged in through kg’s global config, but when I run the above command, it finishes almost immediately and nothing happens. No message, no files in my folder. I’m not really sure what to do. I could try to curl it, I suppose, and add authentication through curl, which is kind of a hassle, but how have other people done this?