One course content question: Will there be some time devoted to discussing performance optimizations?
It looks like various techniques have been used to reduce the required VRAM for training DreamBooth from 24 GB to under 8 GB! Techniques like these, along with e.g. the gradient accumulation discussed in part one of the course, could make the difference for running on consumer hardware.
If time permits, I would also be very interested in learning more about how to apply sequence models to video (or CT scan slices)… maybe going from ResNet activations on individual images/frames to an LSTM (or transformer) for overall sequence classification, etc., in a “graceful” way using fastai (if that’s the best approach).
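For anyone unfamiliar with the gradient accumulation trick mentioned above, here is a minimal sketch of the idea: simulate a large batch on limited VRAM by accumulating gradients over several small batches before taking an optimizer step. The function and argument names here are just placeholders for illustration, not anything from the course notebooks:

```python
import torch

def train_with_accumulation(model, loss_fn, batches, opt, accum_steps=4):
    """Run one pass over `batches`, stepping the optimizer every
    `accum_steps` mini-batches so gradients accumulate in between."""
    opt.zero_grad()
    for i, (x, y) in enumerate(batches):
        # Scale the loss so the accumulated gradient is an average,
        # matching what one big batch of accum_steps * batch_size would give
        loss = loss_fn(model(x), y) / accum_steps
        loss.backward()  # gradients add up in .grad across calls
        if (i + 1) % accum_steps == 0:
            opt.step()       # one update per accum_steps mini-batches
            opt.zero_grad()
```

The memory saving comes from only ever materializing activations for one small batch at a time, at the cost of `accum_steps` forward/backward passes per update.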
We hear you loud and clear, Jeremy
That sure looks like a lot of fun!
For those playing around with the stable_diffusion.ipynb notebook in the diffusion-nbs repo and running into

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

when running `pipei2i`: you need a diffusers version higher than 0.4.1, because this patch is needed for fp16 to work.
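Assuming a standard pip environment (e.g. Colab), the upgrade is a one-liner; restart the notebook kernel afterwards so the new version is picked up:

```shell
# Upgrade diffusers past 0.4.1 so the fp16 img2img pipeline works
pip install -U "diffusers>0.4.1"

# Sanity-check which version is now installed
python -c "import diffusers; print(diffusers.__version__)"
```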
Why do we try to predict the noise rather than go straight to drawing the digit itself? Our end goal in this case is to draw the digits, so I’m not quite understanding why the model outputs the noise rather than the digit directly.
I was wondering whether it would be possible to use this result on zero-shot latent stitching to speed up the CLIP part of the model…
I keep getting the below error while running the notebook on colab (free version) with GPU runtime. Any suggestion on how to resolve this?
OSError: There was a specific connection error when trying to load CompVis/stable-diffusion-v1-4:
<class 'requests.exceptions.HTTPError'> (Request ID: o52_DNplzfZM55fVguDXA)
Thanks in advance,
You need to log into a Hugging Face account and accept the licence terms before you can download Stable Diffusion (it has a special licence).
Any specific URL I need to visit? I have logged in and generated a token which was passed to the notebook_login() code.
Thanks for this. Although that alone did not work for me. I had to update Transformers as well.
Click the link for the model in the notebook.
Yes, I searched for the model on hugging face search bar and accepted their license on this page - CompVis/stable-diffusion-v1-4 · Hugging Face.
Either works. Predicting the noise (or some scaled version) is convenient for some mathy formulations, and some people hand-wave about it being easier to get NNs to output zero-mean gaussians (but I’m sceptical about that justification ;). If you have the noise then it tells you the ‘direction’ you need to edit your noisy image which is what we want for the sampling step, so that tends to be the convention, at least for now.
The suggested explanation is that we are trying to learn a NN as an optimizer, that is, a function that can estimate the gradients from the noisy images (just like Jeremy showed in today’s lesson with the handwritten digit motivation).
The actual explanation is: it seems to work a lot better. NNs that are big enough can learn pretty much anything, so it could be that predicting the noise is just a better regularization method that helps these networks learn better latent representations and not overfit as much as GANs or other generative methods.
Although, to be clear, as Jeremy pointed out at the end of the lesson, developing the view of learning a NN as an optimizer seems to be paying off with incredible results in recent papers, so it’s important to keep it in mind.
That’s how I understand things so far, at least. I’m open to corrections!
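To make the noise-prediction convention concrete, here is a minimal sketch of a DDPM-style training objective: corrupt a clean image with Gaussian noise at a random timestep, then train the network to predict the noise that was added (not the clean image). `model` and the `alphas_cumprod` schedule are placeholders, not the actual course/diffusers code:

```python
import torch
import torch.nn.functional as F

def ddpm_loss(model, x0, alphas_cumprod):
    """One training step's loss: MSE between true and predicted noise."""
    b = x0.shape[0]
    # Random timestep per example, and random Gaussian noise
    t = torch.randint(0, len(alphas_cumprod), (b,))
    noise = torch.randn_like(x0)
    # Broadcast the cumulative-product schedule over image dims
    a_bar = alphas_cumprod[t].view(b, *([1] * (x0.dim() - 1)))
    # Forward process: mix clean image and noise according to the schedule
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    # The network's target is the noise itself
    noise_pred = model(x_t, t)
    return F.mse_loss(noise_pred, noise)
```

Note that `x0` could be recovered algebraically from `x_t` and the predicted noise, which is why "predict the noise" and "predict the image" are, in principle, interchangeable parameterizations — the empirical finding is just that the noise target trains better.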
Thanks again for a great lecture. Looking forward to the ‘from scratch’ in coming weeks.
@jeremy can you please let us know when we should expect the other lecture support material covering the math, apart from the in-depth video from @johnowhitaker? Also, will there be help/support available if something in the math bit is not clear?
This was an awesome lesson. It really broke down the individual components of stable diffusion and what we are aiming to understand. Really made stable diffusion uncool, fast.ai style
According to the authors of “Denoising Diffusion Probabilistic Models”, the reason is “Ultimately, our model design is justified by simplicity and empirical results”, so it’s likely that this just works better (as of yesterday, anyway)