Advice on New Project

I built a dog breed classifier that assigns a picture of a dog to one of the breeds in the Stanford Dogs Dataset.
I want to add a feature that uses GANs to generate a picture of a dog of the predicted breed.
Any ideas on how to approach this, or do you know of a similar implementation?


GANs are potent tools for generative modelling, especially for a simple dataset like Stanford Dogs. The StyleGAN family is the de facto standard for typical generative tasks, and it is one of your best bets. However, GANs are notoriously difficult to train and suffer from a wide range of issues such as mode collapse, vanishing gradients, and general failure to converge. Remedies such as training with the Wasserstein distance have been proposed, but they are not always effective. You will probably reach your desired results eventually, but it may take some hyperparameter tuning and patience.
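
To make the Wasserstein remedy a bit more concrete, here is a minimal sketch of the losses used in a Wasserstein GAN, written in plain Python on made-up critic scores. The function names and the example scores are my own for illustration; the clipping value 0.01 is the default from the original WGAN paper. In practice you would compute these on tensors in your deep learning framework.

```python
from statistics import fmean


def critic_loss(real_scores, fake_scores):
    """WGAN critic loss: minimizing it pushes real scores up and fake scores down."""
    return fmean(fake_scores) - fmean(real_scores)


def generator_loss(fake_scores):
    """WGAN generator loss: the generator wants the critic to score its fakes highly."""
    return -fmean(fake_scores)


def clip_weights(weights, c=0.01):
    """Weight clipping, a crude way to keep the critic roughly Lipschitz."""
    return [max(-c, min(c, w)) for w in weights]


# Made-up critic outputs for a batch of real and generated images.
real_scores = [0.9, 1.1, 1.0]
fake_scores = [-0.5, -0.3, -0.4]

print("critic loss:", critic_loss(real_scores, fake_scores))
print("generator loss:", generator_loss(fake_scores))
print("clipped weights:", clip_weights([0.5, -0.5, 0.005]))
```

Unlike the standard GAN loss, there is no sigmoid or log here: the critic outputs an unbounded score, which is part of why the gradients tend to vanish less often.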

There are also variational autoencoders (VAEs), which have enjoyed a renaissance recently. Previously, their results were not as photorealistic as those of GANs, although they were more stable to train and produced more diverse samples, so they were seen as a more stable yet lower-fidelity alternative. Nowadays, though, with better architectures like VQ-VAE-2, they are on par with GANs in output quality and are vastly easier to deal with. Personally, I prefer VAEs over GANs, but the two can be combined for further gains, e.g., VQ-GAN.

Additionally, there are diffusion models, one of the trendiest topics in deep learning. They are stable and generate diverse, high-quality images; however, their chief drawback is sampling speed. Specifically, inference works differently from training and is much slower: generating a sample takes many sequential denoising steps, which makes them impractical for applications where samples must be produced quickly on the fly. Nonetheless, there is encouraging research on faster sampling.
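
To show where the slowdown comes from, here is a toy sketch of DDPM-style training versus sampling on a single number instead of an image. The 1000 steps and the linear noise schedule follow the original DDPM paper; `CountingDenoiser` is a hypothetical stand-in for the trained network that only counts how often it is called.

```python
import math
import random

T = 1000  # number of diffusion steps
# Linear noise schedule from 1e-4 to 0.02.
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []
running = 1.0
for a in alphas:
    running *= a
    alpha_bars.append(running)


class CountingDenoiser:
    """Hypothetical placeholder for the noise-prediction network."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x_t, t):
        self.calls += 1
        return 0.0  # a real model would return its estimate of the noise in x_t


def training_step(x0, model, rng):
    """Training jumps straight to one random timestep in closed form."""
    t = rng.randrange(T)
    eps = rng.gauss(0.0, 1.0)
    x_t = math.sqrt(alpha_bars[t]) * x0 + math.sqrt(1.0 - alpha_bars[t]) * eps
    return (model(x_t, t) - eps) ** 2  # noise-prediction loss


def sample(model, rng):
    """Ancestral sampling walks through all T steps, one network call each."""
    x = rng.gauss(0.0, 1.0)
    for t in reversed(range(T)):
        eps_hat = model(x, t)
        mean = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps_hat) / math.sqrt(alphas[t])
        noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0  # no noise on the final step
        x = mean + math.sqrt(betas[t]) * noise
    return x


rng = random.Random(0)
model = CountingDenoiser()

training_step(1.0, model, rng)
train_calls = model.calls

model.calls = 0
sample(model, rng)
sample_calls = model.calls

print("network calls per training step:", train_calls)
print("network calls per generated sample:", sample_calls)
```

One training step needs a single forward pass, while one generated sample needs T of them in sequence. The faster samplers mentioned above mostly work by cutting down the number of steps.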

In short, pick diffusion models if sampling speed is not of consequence. This GitHub repository would be an excellent starting point since it is straightforward and flexible. Otherwise, you can try StyleGAN3, but diligence is essential to get it to work; if it proves too cumbersome, switch to VQ-VAE-2.
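
Whichever model you pick, generating a dog of the predicted breed means conditioning the generator on the class label. A common way to do that, as in a conditional GAN, is to feed the label to the generator alongside the noise. Below is a minimal sketch of building that input in plain Python. The latent size of 100, the breed index 42, and the helper names are assumptions for illustration; the 120 classes match Stanford Dogs.

```python
import random

NUM_BREEDS = 120  # Stanford Dogs has 120 breed classes
LATENT_DIM = 100  # assumed size of the noise vector


def one_hot(index, size):
    """Encode a class index as a one-hot vector."""
    if not 0 <= index < size:
        raise ValueError(f"class index {index} out of range")
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def generator_input(breed_index, rng):
    """Concatenate random noise with the one-hot breed label."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    return noise + one_hot(breed_index, NUM_BREEDS)


rng = random.Random(0)
predicted_breed = 42  # hypothetical class index returned by the classifier
z = generator_input(predicted_breed, rng)
print(len(z))
```

The discriminator would get the same label next to the image, so it can penalize samples that look like the wrong breed. StyleGAN-style models usually pass the label through a learned embedding rather than a raw one-hot vector, but the idea is the same.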

Is that helpful?


Hey, sorry for the late reply.
And yes, this is really helpful.
I am considering starting with GANs and then migrating to VAEs and diffusion models.

I am new to deep learning and am currently taking some courses. What I learned from the first few is that image classification is not as hard to develop as I had thought. Here is an idea that came to mind. I love Google Photos for photo storage, and mostly for its automatic classification of images by scenes, people, etc. So I want to create my own application like Google Photos, but for local storage instead of cloud storage, since that would be better for privacy. My question is: is there any existing software that matches my project idea and addresses these privacy concerns, for example by being open source or by running without an internet connection?

You might like this project for image search.