Ah okay. Thanks!
Precision, along with other issues like bleed, sequences, and text rendering, is one of the areas where generative AI struggles. Some of those have to do with what we call system-2 processes. (Of course, things are improving all the time. I expected something like Phenaki to appear way in the future and it’s already here!!)
In any case, consider that you are locating a point in latent space and decoding it to produce the final image. That point in latent space will include a table but also other things related to your prompt. So it’s pretty difficult to tightly control one thing without affecting others unless you do something like inpainting or outpainting. Basically, you create a transparent image with your study table on it, then use inpainting or outpainting to generate more content around the table without affecting the table itself. That would be a way to make it work.
There’s some noise in the image, so I think it could have been Stable Diffusion.
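To make the transparent-image idea a bit more concrete, here is a minimal sketch of turning the alpha channel into an inpainting mask. It assumes the common convention (as I understand it, the one the diffusers inpainting pipeline uses) that white mask pixels get repainted and black ones are kept. The function name `mask_from_alpha` and the nested-list input are just for illustration.

```python
def mask_from_alpha(alpha_rows, threshold=0):
    """Build an inpainting mask from per-pixel alpha values (0-255).

    Assumed convention: 255 (white) = repaint, 0 (black) = keep.
    Pixels whose alpha is at or below `threshold` count as empty
    and are marked for repainting; everything else (the table) is kept.
    """
    return [
        [255 if alpha <= threshold else 0 for alpha in row]
        for row in alpha_rows
    ]


# Tiny 2x3 example: the table occupies the opaque pixels (alpha 255).
alpha = [
    [0, 255, 255],
    [0, 0, 255],
]
mask = mask_from_alpha(alpha)
# With Pillow, the alpha values could come from img.getchannel("A").
```

The mask and the original image would then go to whatever inpainting tool you use, so only the empty area around the table gets generated.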
I do have them as far as I can tell (diffusers ‘0.4.1’, transformers ‘4.23.0’ and pytorch ‘1.12.1.post201’), and yet I am hitting
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same On the
Edit: I should add that I started from a fresh conda environment
I created a PR to suggest a requirements.txt file that installs the dependencies you need to run the notebooks locally. You can find the PR here: Add requirements.txt file by dpoulopoulos · Pull Request #1 · fastai/diffusion-nbs · GitHub
I had the same issue with StableDiffusionImg2ImgPipeline. So I installed the diffusers library from the latest commit and the issue was solved for me.
pip install git+https://github.com/huggingface/diffusers.git@e895952816dc4a9a04d1d82fb5f8bdee59341c06
(If you are working on Colab, restart the runtime after you run the above command.)
Hope this solves your issue too.
Update: Diffusers version 0.4.2 has the fix for this issue. Install it by using the command
pip install diffusers==0.4.2
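If you are not sure which version ended up in your environment, something like the sketch below can check it. The helper names (`version_tuple`, `has_fix`, `installed_diffusers_version`) are made up for this example, and the comparison is deliberately simple: it only looks at the leading numeric parts of the version string.

```python
from importlib import metadata


def version_tuple(version):
    """Turn '0.4.2' into (0, 4, 2); stops at the first non-numeric piece,
    so '1.12.1.post201' becomes (1, 12, 1)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_fix(installed, minimum="0.4.2"):
    """True if the installed version is at least `minimum`.
    Pre-release suffixes are ignored, so treat this as a rough check."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_diffusers_version():
    """Return the installed diffusers version string, or None if absent."""
    try:
        return metadata.version("diffusers")
    except metadata.PackageNotFoundError:
        return None


found = installed_diffusers_version()
if found is None:
    print("diffusers is not installed in this environment")
else:
    print(found, "has the fix" if has_fix(found) else "is older than 0.4.2")
```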
This was very helpful and came just in time as I was setting up here on a Windows machine. Thank you!
One thing to clarify for others: You’ll need to install fastai and jupyter (conda install jupyter) as well to run these as jupyter notebooks locally.
Yep, that did it. Thank you.
I don’t understand the pipe signature in:
num_rows,num_cols = 4,4
prompts = [prompt] * num_cols
images = concat(pipe(prompts, guidance_scale=g).images for g in [1.1,3,7,14])
pipe accepts a list of prompts but only one guidance_scale? In the documentation, I found how to create a StableDiffusionPipeline but couldn’t find its
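On the single guidance_scale: as far as I understand it, the scale is one number used in the classifier-free guidance step for the whole batch, which would explain why the snippet loops over g and calls pipe once per value. A rough sketch of that combination step, using plain Python lists in place of tensors (`apply_guidance` is a name made up for this example, not a diffusers function):

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: start from the unconditional noise
    prediction and move toward the text-conditioned one, scaled by a
    single number shared by every element in the batch."""
    return [
        uncond + guidance_scale * (text - uncond)
        for uncond, text in zip(noise_uncond, noise_text)
    ]


uncond = [0.0, 1.0]   # stand-in for the unconditional prediction
text = [1.0, 3.0]     # stand-in for the prompt-conditioned prediction

same_as_text = apply_guidance(uncond, text, 1.0)   # scale 1: just the text prediction
pushed_further = apply_guidance(uncond, text, 2.0)  # scale 2: overshoots past it
```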
That will show you the source code. Generally for bleeding edge stuff the docs won’t always have the latest info.
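Outside a notebook, Python’s inspect module is another way to look at a signature or the source. A small sketch follows; `fake_call` is a hypothetical stand-in so the example runs without loading a model, and `keyword_defaults` is a helper name invented here.

```python
import inspect


def keyword_defaults(fn):
    """Map each parameter that has a default value to that default."""
    signature = inspect.signature(fn)
    return {
        name: param.default
        for name, param in signature.parameters.items()
        if param.default is not inspect.Parameter.empty
    }


# Hypothetical stand-in for a pipeline's call method.
def fake_call(prompt, guidance_scale=7.5, num_inference_steps=50):
    return prompt


defaults = keyword_defaults(fake_call)
print(defaults)

# With a real pipeline object, the same idea would be:
#   inspect.signature(pipe.__call__)
#   print(inspect.getsource(type(pipe).__call__))
```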
Lesson 9A was excellent @johnowhitaker - I took your colourful dance and added to my woodpecker picture - interesting transformations with the Image2Image section.
Thanks @brismith! What a trippy result
@dpoulopoulos’s requirements.txt looks great! Takes a while to get the right combinations of cuda/torch/etc.
If anyone is working with Conda, I’m using this environment.yml file to get the notebook from Lesson 9 working (I think conda manages dependencies a bit better than pip)
An interesting thing to dive deeper into is that for steps 1 or 2 the resulting image is black. From my limited understanding I was expecting all noise …
The ‘why predict the noise rather than the image’ discussion yesterday is one that is happening in the literature too. The ‘v objective’ here is getting popular as it seems to help stabilize training, and additional framings are being explored too. (This screenshot is from the ‘progressive distillation’ paper which Jeremy showed as the new work bringing down the required number of sampling steps - explainer video here if you want to hear me waffle through an attempted summary).
Give it a few days or weeks and it’s a safe bet the diffusers library will have something like a ‘model_prediction_type’ parameter for
xor … to handle these different parametrizations.
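For anyone curious how the ‘v objective’ relates to predicting the noise or the image, here is a small scalar sketch of the relations as I read them in the progressive distillation paper, assuming a variance-preserving schedule where alpha² + sigma² = 1. The function names are made up for this example and real code would work on tensors.

```python
import math


def diffuse(x, eps, alpha, sigma):
    """Noisy sample z = alpha * x + sigma * eps."""
    return alpha * x + sigma * eps


def v_target(x, eps, alpha, sigma):
    """The 'v' target: v = alpha * eps - sigma * x."""
    return alpha * eps - sigma * x


def x_from_v(z, v, alpha, sigma):
    """Recover the clean sample from z and v (needs alpha^2 + sigma^2 = 1)."""
    return alpha * z - sigma * v


def eps_from_v(z, v, alpha, sigma):
    """Recover the noise from z and v (needs alpha^2 + sigma^2 = 1)."""
    return sigma * z + alpha * v


# Pick any angle; cos/sin guarantee alpha^2 + sigma^2 = 1.
phi = 0.3
alpha, sigma = math.cos(phi), math.sin(phi)
x, eps = 0.8, -1.2

z = diffuse(x, eps, alpha, sigma)
v = v_target(x, eps, alpha, sigma)
x_back = x_from_v(z, v, alpha, sigma)
eps_back = eps_from_v(z, v, alpha, sigma)
```

So a model that predicts v gives you both the image and the noise estimate, which is part of why these parametrizations are interchangeable at sampling time.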
I was wondering if Jeremy kept the OneNote he used throughout the lecture. If so, it might be nice to share it? It would act as a good aide-mémoire.
If I remember correctly, the HF Stable Diffusion pipeline has an NSFW/disturbing-images filter turned on. If you use a prompt that the filter thinks might be inappropriate for any reason, it will return a black image (in the original SD it would rickroll you). Keep in mind that, like any filter for this kind of thing, it can be overly aggressive, so it is not always obvious what is going on (e.g. last time I ran into this it was filtering out anything related to “dead”, so a cat leaving a dead mouse on the door was blacked out).
I used prompt = “a photograph of an astronaut riding a horse”
Steps 1 and 2 give a black image, step 3 is a noisy image…
Pretty sure that’s SFW. Then I have no idea.
The safety checker in the diffusers library has a lot of false positives for NSFW detection.
You can disable the safety filter if you want.
pipe.safety_checker = lambda images, **kwargs: (images, False)
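If you would rather confirm that the filter is what produced the black images before turning it off, the pipeline output should carry a per-image flag. The sketch below assumes the output object has an `nsfw_content_detected` attribute (that is my recollection of the diffusers output class, so double-check on your version); `flagged_indices` is a helper name made up here, and the SimpleNamespace is just a stand-in for a real pipeline result.

```python
from types import SimpleNamespace


def flagged_indices(output):
    """Return the positions of images the safety checker flagged.

    Assumes `output.nsfw_content_detected` is a list of booleans, one per
    image. It may also be False or None (e.g. with the checker replaced),
    in which case nothing is reported.
    """
    flags = output.nsfw_content_detected or []
    return [index for index, flagged in enumerate(flags) if flagged]


# Stand-in for the object returned by pipe(...); no model is loaded here.
fake_output = SimpleNamespace(
    images=["img0", "img1", "img2"],
    nsfw_content_detected=[False, True, False],
)
print(flagged_indices(fake_output))
```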