Stable Diffusion v2

Everyone at Stability AI is happy to share the version 2 release of Stable Diffusion!
It comes with new:

  • text-to-image models
  • upscaler model
  • depth-to-image model
  • inpainting model

Learn more:


Well there goes today’s approx. 5 hours of finetuning v1.5 :sweat_smile:. Looking forward to trying the new models after Thanksgiving!


I think depth2img is really impressive and clever!

I believe they actually used SDEdit under the hood which is cool to see…
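For anyone unfamiliar, SDEdit's core idea is simple: add noise to the input image part-way up the noise schedule, then run the reverse denoising process from that intermediate point, so the output keeps the input's coarse structure while the prompt shapes the details. A toy numpy illustration of that loop (the linear schedule and the `denoise_step` callback are placeholders for the real model and sampler, not actual SD code):

```python
import numpy as np

def sdedit(image, denoise_step, strength=0.6, num_steps=50, seed=0):
    """Toy SDEdit: noise the input up to `strength`, then denoise back down."""
    rng = np.random.default_rng(seed)
    start = int(num_steps * strength)        # jump this far into the noise schedule
    sigma = start / num_steps                # toy linear noise level at that point
    x = image + sigma * rng.standard_normal(image.shape)
    for t in range(start, 0, -1):
        x = denoise_step(x, t / num_steps)   # one reverse step toward the data
    return x
```

With a real diffusion model, `denoise_step` would be one sampler step conditioned on the text prompt; a higher `strength` means more added noise and a larger departure from the input image.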

I tested out text2img with SD v2 and noticed that prompting is very different from before, probably due to OpenCLIP. It will be interesting to see how users explore the new possibilities!


The HF diffusers library has been updated to work with SD 2.0 as of version 0.9.0. The documentation is here: Stable diffusion 2, and the repo with examples is here: GitHub - huggingface/diffusers: 🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch

I’m excited to try this out! Has anyone figured out how to implement the safety checker for the 2.0 model? It seems to be ‘off’ by default, and I have not been able to find any documentation on how to add it back. I do see there is now a Safe Stable Diffusion pipeline, but it looks like it is based on a different base model. I have a Twitter bot that produces Stable Diffusion images, and I feel it’s important to use the safety checker model for that.


I thought that v2 is inherently SFW for legal reasons and that you would need to explicitly finetune the model on new data to change that?

I saw in the blog post that they used LAION’s NSFW filter on the training set, which probably helps, but I have not seen anything saying a safety checker is no longer needed. In fact, the diffusers library issues a warning if you don’t use one. The language is a bit confusing, as it says ‘you have disabled it’, but the library’s default is to not include one, and it wasn’t trivial to figure out how to add one to the SD 2.0 model.

I did some testing, and it is trivial to produce NSFW content.

I was able to figure out how to get the safety checker from the SD 1.5 model. It feels a bit hacky, but I’m happy to have it for my Twitter bot.

from diffusers import StableDiffusionPipeline
from diffusers.pipelines.stable_diffusion.safety_checker import StableDiffusionSafetyChecker
from transformers import CLIPFeatureExtractor

safety_checker = StableDiffusionSafetyChecker.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="safety_checker"
).half()

# The feature extractor doesn't load properly like this:
# safety_feature_extractor = CLIPFeatureExtractor.from_pretrained(
#     "runwayml/stable-diffusion-v1-5", subfolder="feature_extractor")
# I needed to manually point to the preprocessor_config.json file.
safety_feature_extractor = CLIPFeatureExtractor.from_json_file('/root/.cache/huggingface/diffusers/models--runwayml--stable-diffusion-v1-5/snapshots/3beed0bcb34a3d281ce27bd8a6a1efbb68eada38/feature_extractor/preprocessor_config.json')

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",  # model id assumed; the original post cuts off here
    safety_checker=safety_checker,
    feature_extractor=safety_feature_extractor,
)
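For context on what that checker actually does: as I understand it, the safety checker compares the CLIP embedding of each generated image against a fixed set of learned “concept” embeddings and flags the image when any cosine similarity exceeds a per-concept threshold. A toy numpy re-implementation of that idea (the embeddings and thresholds here are illustrative stand-ins, not the real weights):

```python
import numpy as np

def flags_nsfw(image_embedding, concept_embeddings, thresholds):
    """Toy cosine-similarity check in the spirit of the safety checker."""
    img = image_embedding / np.linalg.norm(image_embedding)
    for concept, threshold in zip(concept_embeddings, thresholds):
        c = concept / np.linalg.norm(concept)
        if float(img @ c) > threshold:
            return True  # similar enough to a flagged concept
    return False
```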

Good to know. I’m still away from my desktop for Thanksgiving and have not had a chance to test the new base models yet.

Edit: “Good to know” as in I’m glad to know I still need to be careful with public-facing models, not that it’s good that attempts to make the base models SFW were not fully successful.


Hopefully, finetuning the depth2img model isn’t much more involved than finetuning the regular text2img one, because that’s the one I especially want to try.

Base image:

Prompt: “tiny cute 3D felt fiber woman on flower field, made from Felt fibers, a 3D render, trending on cgsociety, rendered in maya, rendered in cinema4d, made of yarn”



For those like me who would like to customize the sampling loop of SD2, I’ve made a “boilerplate” notebook.

As an example, I’ve tested the update with rescaled norm discussed on this forum at the time of v1, and it still seems to improve the results.