After going through @ababino’s excellent set of questionnaires I decided to create a wiki post with the answers to the questions.
It’s currently incomplete; please feel free to edit it to add more answers.
Part 1:
- Lesson 9 is the continuation of the fast.ai Part 1 course, which has 8 lessons.
- strmr.com offers a service to fine-tune diffusion models for subject generation based on DreamBooth.
- @ababino @muellerzr @ilovescience @init_27 and many more…
- Computing services:
- Lambda Labs
- Paperspace Gradient
- Jarvis Labs
- vast.ai
- It’s a repository containing notebooks to help you get started with Stable Diffusion.
- It contains notebooks and tools created by the AI art community to check out as a starting point.
- The `stable_diffusion.ipynb` notebook uses the fantastic diffusers library by the good folks at Hugging Face.
- A HuggingFace pipeline is an end-to-end inference pipeline that allows you to get started with just a few lines of code.
- We use the `from_pretrained` method to download the pre-trained weights.
- Paperspace and Lambda Labs have persistent storage, so there is no need to reinstall any libraries/dependencies every time you start your Notebook, as with Colab.
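A rough sketch of loading the pipeline with `from_pretrained` (the model id and dtype here are my assumptions, not necessarily the notebook's exact settings):

```python
import torch
from diffusers import StableDiffusionPipeline

# Download the pre-trained weights from the Hugging Face Hub (cached after the first run).
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # assumed checkpoint; any Stable Diffusion checkpoint works
    torch_dtype=torch.float16,        # half precision to fit comfortably on consumer GPUs
).to("cuda")
```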
- We can use a `StableDiffusionPipeline` to produce images from a prompt.
- We can set the random seed using the torch method `torch.manual_seed(seed)`.
- We should set a random seed manually for the reproducibility of our results.
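For example, a minimal prompt-to-image call with a fixed seed (the prompt and filename are just examples, and `pipe` is the pipeline loaded in the sketch above):

```python
import torch

torch.manual_seed(42)  # fix the RNG so re-running the cell reproduces the same image

prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt).images[0]
image.save("astronaut.png")
```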
- Stable Diffusion is based on a progressive denoising algorithm; we start with pure random noise and remove some noise incrementally with each step to produce a convincing image.
- Currently the model doesn’t do a very good job with only a few denoising steps. The number of denoising steps is not fixed; image quality generally improves as the number of steps increases.
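The step count is exposed as the `num_inference_steps` parameter of the pipeline call; a quick comparison (the step values are illustrative):

```python
# Few steps: fast but usually blurry/incoherent; more steps: slower but sharper.
rough = pipe(prompt, num_inference_steps=4).images[0]
better = pipe(prompt, num_inference_steps=50).images[0]  # 50 is the library default
```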
- The `image_grid` function takes a set of images and displays them in a grid (see the sketch below).
- The adherence of the generated images to the prompt increases as the value of the guidance scale increases.
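The `image_grid` helper can be implemented roughly as follows (a sketch using PIL; the notebook's exact version may differ):

```python
from PIL import Image

def image_grid(imgs, rows, cols):
    """Paste a list of equally-sized PIL images into a single rows x cols grid."""
    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid
```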
- For each prompt, two images are created, one conditioned on the prompt and one with no prompt (some random image), and then a weighted average of the two is taken as dictated by the `guidance_scale` parameter.
- Negative prompting refers to using another prompt to generate an image and subtracting it from the image generated by the original prompt.
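Both knobs are parameters of the pipeline call; conceptually, classifier-free guidance combines the two predictions as `uncond + guidance_scale * (cond - uncond)`. A sketch (the prompt strings are just examples):

```python
image = pipe(
    "a watercolor painting of a lighthouse",
    guidance_scale=7.5,                      # higher values follow the prompt more closely
    negative_prompt="blurry, low contrast",  # steer the generation away from these concepts
).images[0]
```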
- The Image2Image pipeline starts with a noisy version of an initial image instead of pure noise and gradually denoises it to match the prompt.
- The `strength` parameter specifies to what degree to follow the original image.
- We could take the Image2Image pipeline’s output image and feed it back into the pipeline with a different prompt to produce even better images.
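A sketch of the Image2Image flow (the init image path and prompt are assumptions; note that older diffusers versions named the image argument `init_image`):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

init = Image.open("sketch.png").convert("RGB")  # hypothetical starting image
out = img2img(
    prompt="an oil painting of a mountain lake at sunset",
    image=init,
    strength=0.75,  # near 0.0 stays close to the original image, 1.0 ignores it entirely
).images[0]
```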
- The Stable Diffusion model is fine-tuned on a dataset of Pokémon images with corresponding captions.
- “textual inversion” is the idea of creating a new token for a particular concept and fine-tuning a single embedding to refer to a particular concept using example images.
- “dreambooth” refers to taking an existing token that is rarely used and fine-tuning the model to associate that token with the example images we provide.
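At inference time, a learned textual-inversion embedding can be loaded into a diffusers pipeline. A sketch using the public `sd-concepts-library/cat-toy` concept (newer diffusers versions expose `load_textual_inversion`; `pipe` is the pipeline from the earlier sketches):

```python
# Load a learned embedding that adds a new token (here "<cat-toy>") to the text encoder.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")
image = pipe("a photo of a <cat-toy> riding a skateboard").images[0]
```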