Diffusers 'fp16' revision

I’m loading the CompVis/stable-diffusion-v1-4 model using the Diffusers library like this:

pipeline = StableDiffusionPipeline.from_pretrained(model_name, revision="fp16", torch_dtype=torch.float16).to("cuda")

The memory footprint of the model on the GPU is something like 3.3GB. Then, I load the same model with the following code:

pipeline = StableDiffusionPipeline.from_pretrained(model_name).to("cuda")

As expected, the memory footprint is now double the size (~6.5GB). However, when I load the model using the code below, I still get a memory footprint of around 3.3GB and no errors in the image generation process:

pipeline = StableDiffusionPipeline.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")
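For what it's worth, here is a rough way to confirm what actually ends up in GPU memory in this case (the exact figure varies with the GPU and diffusers version):

import torch
from diffusers import StableDiffusionPipeline

model_name = "CompVis/stable-diffusion-v1-4"
pipeline = StableDiffusionPipeline.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")

# The weights are already cast to half precision in memory, even without revision="fp16"
print(next(pipeline.unet.parameters()).dtype)                      # torch.float16
print(f"{torch.cuda.memory_allocated() / 1e9:.1f} GB allocated")   # roughly 3.3 GB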

My question is, why do we need to download the model from the fp16 revision?


This model stores its weights with half of the usual 32-bit precision, i.e. 16-bit floating-point precision. Thus it fits in less GPU memory.

Yes, I know. It will also run a bit faster. My question is, why do I need to set the revision to fp16 instead of just setting the torch_dtype attribute to torch.float16, like I do in the last code snippet? What's special about the fp16 revision?

The revision controls which weights you download from the Hub. If you do not specify the fp16 revision, it downloads the model with fp32 precision.
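If you want to see the difference on disk, here is a rough sketch using huggingface_hub.snapshot_download (the folder_size_gb helper is just for illustration, and running this pulls both copies of the repo into your local cache):

import os
from huggingface_hub import snapshot_download

def folder_size_gb(path):
    # Sum the size of every file under the snapshot directory
    total = sum(
        os.path.getsize(os.path.join(root, name))
        for root, _, names in os.walk(path)
        for name in names
    )
    return total / 1e9

fp32_path = snapshot_download("CompVis/stable-diffusion-v1-4")                   # main branch, fp32 weights
fp16_path = snapshot_download("CompVis/stable-diffusion-v1-4", revision="fp16")  # fp16 branch, half-size weights

print(f"main revision on disk: {folder_size_gb(fp32_path):.1f} GB")
print(f"fp16 revision on disk: {folder_size_gb(fp16_path):.1f} GB")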

Sure, but if I then specify the torch_dtype attribute as torch.float16, I'm overriding the default dtype of the model to 16-bit precision floats. At least this is what the docs say:

torch_dtype (str or torch.dtype, optional) — Override the default torch.dtype and load the model under this dtype. If "auto" is passed the dtype will be automatically derived from the model’s weights.

If you do that, you are still downloading weights that are double the size you need; the cast to float16 only happens after the full fp32 files have been downloaded. With models that can be several GB, this makes a difference.
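So the two ways of getting a half-precision pipeline look like this (a sketch; the exact download sizes depend on the checkpoint):

import torch
from diffusers import StableDiffusionPipeline

model_name = "CompVis/stable-diffusion-v1-4"

# Downloads the full fp32 weights, then casts them to float16 while loading
pipe_fp32_download = StableDiffusionPipeline.from_pretrained(
    model_name, torch_dtype=torch.float16
).to("cuda")

# Downloads the half-size fp16 weights directly; same GPU footprint, half the download
pipe_fp16_download = StableDiffusionPipeline.from_pretrained(
    model_name, revision="fp16", torch_dtype=torch.float16
).to("cuda")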
