One course content question: Will there be some time devoted to discussing performance optimizations?
It looks like various techniques have been used to reduce the VRAM required for training DreamBooth from 24 GB to under 8 GB! Techniques like these, together with e.g. the gradient accumulation discussed in part one of the course, could make the difference when running on consumer hardware.
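For anyone who hasn't seen it: the core idea of gradient accumulation is to run several small micro-batches, average their gradients, and only take an optimizer step afterwards, so you get the effect of a large batch without holding it all in memory at once. Here's a minimal pure-Python sketch with a toy 1-D linear model (the model, data, and names like `accum_steps` are just illustrative):

```python
# Toy 1-D linear model y = w*x; loss = mean((w*x - y)^2) over a batch.
data = [(x, 2.0 * x) for x in range(1, 9)]  # ground truth w = 2

def grad(w, batch):
    # d/dw of mean((w*x - y)^2) = mean(2*(w*x - y)*x)
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

lr = 0.01
accum_steps = 4
micro = [data[i::accum_steps] for i in range(accum_steps)]  # 4 micro-batches of 2

# Gradient accumulation: average micro-batch grads, then one optimizer step.
w = 0.0
g = sum(grad(w, mb) for mb in micro) / accum_steps
w_accum = w - lr * g

# Reference full-batch step (needs the whole batch in memory at once).
w_full = w - lr * grad(w, data)

print(abs(w_accum - w_full) < 1e-12)  # True: same update, less peak memory
```

With equal-size micro-batches the accumulated update is mathematically identical to the full-batch one; in a real NN you'd keep calling `loss.backward()` to sum gradients and call `optimizer.step()` every `accum_steps` batches.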
If time permits, I'd also be very interested in learning more about how to apply sequence models to video (or CT scan slices)… maybe going from ResNet activations on individual images/frames to an LSTM (or transformer) for overall sequence classification, etc., in a “graceful” way using fastai (if that’s the best approach).
Why do we try to draw the noise rather than going straight to drawing the digit itself? Our end goal is to draw the digits in this case, so I’m not quite understanding why the model’s output is the noise rather than the digit directly.
I keep getting the below error while running the notebook on colab (free version) with GPU runtime. Any suggestion on how to resolve this?
OSError: There was a specific connection error when trying to load CompVis/stable-diffusion-v1-4:
<class 'requests.exceptions.HTTPError'> (Request ID: o52_DNplzfZM55fVguDXA)
Either works. Predicting the noise (or some scaled version) is convenient for some mathy formulations, and some people hand-wave about it being easier to get NNs to output zero-mean gaussians (but I’m sceptical about that justification ;). If you have the noise then it tells you the ‘direction’ you need to edit your noisy image which is what we want for the sampling step, so that tends to be the convention, at least for now.
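To make the ‘direction’ point concrete, here’s a minimal NumPy sketch of the standard DDPM forward process (the `alpha_bar`/`eps` notation is the usual one; the “perfect prediction” at the end is just for illustration, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "image" and the standard DDPM forward process:
#   x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps
x0 = rng.uniform(-1, 1, size=(4, 4))   # clean sample
eps = rng.normal(size=(4, 4))          # the noise the model is trained to predict
alpha_bar = 0.5                        # cumulative noise schedule at some step t

xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * eps

# If the network predicts eps, the noise is exactly the 'direction' we need
# to move x_t back toward the clean image:
eps_pred = eps                         # pretend-perfect prediction for illustration
x0_hat = (xt - np.sqrt(1 - alpha_bar) * eps_pred) / np.sqrt(alpha_bar)

print(np.allclose(x0_hat, x0))  # True: knowing the noise gives the edit direction
```

So predicting `eps` and predicting `x_0` carry the same information at a given step; samplers just happen to be conventionally written in terms of the predicted noise.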
The suggested explanation is that we are trying to learn an NN as an optimizer, that is, a function that can estimate the gradients from the noisy images (just like Jeremy showed in today’s lesson in the handwritten-digit motivation).
The actual explanation is: it seems to work a lot better. NNs that are big enough can learn pretty much anything, so it could be that predicting the noise is just a better regularization method that helps these networks learn better latent representations and not overfit as much as GANs or other generative methods.
Although, to be clear, as Jeremy pointed out at the end of the lesson, developing the view of learning an NN as an optimizer seems to be paying off with incredible results in recent papers, so it’s important to keep it in mind.
That’s how I understand things so far, at least. I’m open to corrections!
@jeremy can you please let us know when we should expect the other lecture support material covering the math, apart from watching the in-depth video from @johnowhitaker? Also, will there be help/support if something isn’t clear from the math bit?
This was an awesome lesson. It really broke down the individual components of stable diffusion and what we are aiming to understand. Really made stable diffusion uncool, fast.ai style!
According to the authors of “Denoising Diffusion Probabilistic Models”, the reason is: “Ultimately, our model design is justified by simplicity and empirical results”, so it’s likely that this approach simply works better (as of yesterday, anyway).