Diffusers 'fp16' revision

I’m loading the CompVis/stable-diffusion-v1-4 model using the Diffusers library like this:

pipeline = StableDiffusionPipeline.from_pretrained(model_name, revision="fp16", torch_dtype=torch.float16).to("cuda")

The memory footprint of the model on the GPU is something like 3.3GB. Then, I load the same model with the following code:

pipeline = StableDiffusionPipeline.from_pretrained(model_name).to("cuda")

As expected, the memory footprint is now double the size (~6.5GB). However, when I load the model using the code below, I still get a memory footprint of around 3.3GB and no errors in the image generation process:

pipeline = StableDiffusionPipeline.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")
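For what it's worth, here is a rough way to confirm what actually ends up in GPU memory in this case (the exact figure varies with the GPU and diffusers version):

import torch
from diffusers import StableDiffusionPipeline

model_name = "CompVis/stable-diffusion-v1-4"
pipeline = StableDiffusionPipeline.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")

# The weights are already cast to half precision in memory, even without revision="fp16"
print(next(pipeline.unet.parameters()).dtype)                      # torch.float16
print(f"{torch.cuda.memory_allocated() / 1e9:.1f} GB allocated")   # roughly 3.3 GB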

My question is, why do we need to download the model from the fp16 revision?


This model stores its weights with half of the usual 32-bit precision, i.e. 16-bit floating-point precision. Thus it fits in less GPU memory.

Yes, I know. It will also run a bit faster. My question is, why do I need to set the revision to fp16 instead of just setting the torch_dtype attribute to torch.float16, like I do in the last code snippet? What's special about the fp16 revision?

The revision controls which weights you download from the Hub. If you do not specify the fp16 revision, it downloads the model with fp32 precision.
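If you want to see the difference on disk, here is a rough sketch using huggingface_hub.snapshot_download (the folder_size_gb helper is just for illustration, and running this pulls both copies of the repo into your local cache):

import os
from huggingface_hub import snapshot_download

def folder_size_gb(path):
    # Sum the size of every file under the snapshot directory
    total = sum(
        os.path.getsize(os.path.join(root, name))
        for root, _, names in os.walk(path)
        for name in names
    )
    return total / 1e9

fp32_path = snapshot_download("CompVis/stable-diffusion-v1-4")                   # main branch, fp32 weights
fp16_path = snapshot_download("CompVis/stable-diffusion-v1-4", revision="fp16")  # fp16 branch, half-size weights

print(f"main revision on disk: {folder_size_gb(fp32_path):.1f} GB")
print(f"fp16 revision on disk: {folder_size_gb(fp16_path):.1f} GB")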

Sure, but if I then specify the torch_dtype attribute as torch.float16, I'm overriding the default dtype of the model to 16-bit precision floats. At least this is what the docs say:

torch_dtype (str or torch.dtype, optional) — Override the default torch.dtype and load the model under this dtype. If "auto" is passed the dtype will be automatically derived from the model’s weights.

If you do that, you are still downloading weights that are double the size you need; the cast to float16 only happens after the full fp32 files have been downloaded. With models that can be several GB, this makes a difference.
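So the two ways of getting a half-precision pipeline look like this (a sketch; the exact download sizes depend on the checkpoint):

import torch
from diffusers import StableDiffusionPipeline

model_name = "CompVis/stable-diffusion-v1-4"

# Downloads the full fp32 weights, then casts them to float16 while loading
pipe_fp32_download = StableDiffusionPipeline.from_pretrained(
    model_name, torch_dtype=torch.float16
).to("cuda")

# Downloads the half-size fp16 weights directly; same GPU footprint, half the download
pipe_fp16_download = StableDiffusionPipeline.from_pretrained(
    model_name, revision="fp16", torch_dtype=torch.float16
).to("cuda")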
