Lesson 11 official topic

I tried this (CPU only again) with the matrix multiplication for the 10,000 validation images. For me, there is no difference between the first and the subsequent Numba runs, but a large difference compared to the non-compiled version.


Fascinating. Thank you for sharing those. No idea how to explain the discrepancy except maybe just different hardware.

For anyone interested in semantic in-painting: ClipSeg + the Stable Diffusion in-painting model shared by @johnowhitaker earlier today work pretty well (for basic cases at least).
ClipSeg automatically creates the segmentation mask from a source image and a prompt (e.g. horse).
Then you simply pass the source image, the mask image and an insertion prompt (e.g. zebra) to the in-painting model.

Here are some examples:

Here’s a notebook if you want to try it out.


Thanks @johnowhitaker for pointing to the Stable Diffusion inpainting pipeline and @tommyc for sharing your code.

I tried two approaches -

  1. Approach 1 - I built on my last shared notebook: I improved the masking by adding an OpenCV trick and then generated a new image using the img2img pipeline. Then I mixed the original image with the new image using the mask, like below -

  2. Approach 2 - Notebook. I used the mask generated by DiffEdit+OpenCV and then used the inpaint pipeline.
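
The mixing step in approach 1 can be sketched roughly as below. This is only a minimal illustration, not the code from the notebook: the function name `blend_with_mask` and the toy arrays are made up for the example, and it assumes the mask is 1 where the generated image should show through and 0 where the original should be kept.

```python
import numpy as np

def blend_with_mask(orig, generated, mask):
    """Keep `orig` where mask is 0 and `generated` where mask is 1.

    orig, generated: uint8 arrays of shape (H, W, 3)
    mask: array of shape (H, W) with values in [0, 1] (assumed convention)
    """
    mask = mask.astype(np.float32)
    if mask.ndim == 2:
        mask = mask[..., None]  # add a channel axis so it broadcasts over RGB
    out = mask * generated.astype(np.float32) + (1.0 - mask) * orig.astype(np.float32)
    return out.round().clip(0, 255).astype(np.uint8)

# Toy stand-ins for the real images (hypothetical data)
orig = np.zeros((4, 4, 3), dtype=np.uint8)
generated = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1  # only the centre 2x2 block comes from the generated image

blended = blend_with_mask(orig, generated, mask)
```

A soft (blurred) mask with values between 0 and 1 would work with the same function and gives a smoother seam between the two images.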

Here are the comparative results. I think the inpaint pipeline results look better.


Thanks for sharing the code:
orig_noisy-target_noisy gives the following image:

and swapping that around to - target_noisy-orig_noisy yields:

We see they are kind of complementary in the “horse” region,
so I took a max: np.maximum(mask1.mean(axis=-1), mask2.mean(axis=-1)) yields a better mask.
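
For anyone who wants to play with this, here is a small self-contained sketch of the idea. The random arrays are placeholders for the real noised images, and it assumes `mask1` and `mask2` are the raw (unclipped) differences in a channels-last layout; the post above does not say how they were actually computed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data standing in for the two noised images, shape (H, W, C)
orig_noisy = rng.random((8, 8, 3)).astype(np.float32)
target_noisy = rng.random((8, 8, 3)).astype(np.float32)

# The two one-sided differences (assumed to be raw, unclipped differences)
mask1 = orig_noisy - target_noisy
mask2 = target_noisy - orig_noisy

# Average over the channel axis, then take the elementwise maximum
combined = np.maximum(mask1.mean(axis=-1), mask2.mean(axis=-1))
```

Under that assumption, `mask2` is just `-mask1`, so the maximum is the same as the absolute value of the channel-averaged difference. If the differences were clipped to a displayable range before combining, the result would differ.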


You’re welcome, John :slight_smile: And if you’re interested, there’s a separate thread where we (at least try to) collaborate on the DiffEdit paper. We are hoping to collaborate on future papers too so that everybody can learn from each other’s efforts …


That looks good! I got distracted from DiffEdit by switching over to the 1.5 model, but I hope to get back to it today. I was going to do exactly what you did for approach 1 until @johnowhitaker mentioned the inpainting pipeline yesterday and I was like “Doh, how could I have forgotten that?”, since I spent a lot of time creating an easy masking method for a GUI so that the image could be used with the inpainting pipeline. I guess tunnel vision can really get you :stuck_out_tongue:

But I do want to try out both approaches. If I get to it, I will post the results.


Hey everyone!! Been staying off this forum for a few days to make sure some kickass implementation doesn’t discourage me from making my own implementation of the DiffEdit paper :sweat_smile:
I’ve written up a Jupyter notebook that documents my attempts at implementing step one - making a mask based on the query text (I’ll allow you to be the judge!). I thought I’d put it on GitHub and render it on nbviewer, but it’s 93MB in size and GitHub has a 25MB size limit. If anyone has any ideas on how I can show my work without having to cut it down, please let me know :slight_smile:


Git LFS? If it’s just hitting the limit around large files, that’d be one way. Or can you just host the file somewhere in cloud storage and have people download it to their own machines?
What is the part that is taking up all that space? Images?


I believe @johnrobinsn uploaded a PDF version of his notebook too since his notebook file was too large and GitHub wouldn’t display it. So that might be one way to go? Also, since I assume the only issue is that you can’t open the notebook from GitHub directly, given that people can still download the file and open it locally (and/or on Colab) it shouldn’t be an issue, right?

Yes, I could host it on the cloud, but I wanted it to be easily readable on a webpage (i.e. blog-like). And yes, it’s not a ton of code; it’s the image outputs that have made the file so big. Right now I’m trying to use nbdev to convert it into a GitHub Pages blog post… I think that should be possible. If not, I’ll check out Git LFS or another cloud storage option. Thanks for your suggestions :slight_smile:


@Fahim A PDF version is one way to go, yep! The issue isn’t that GitHub doesn’t render the notebook; it’s that I can’t even upload the notebook to my repo, because there seems to be a 25MB upload size limit :sweat_smile:

You probably don’t even need nbdev - just use quarto.


In the video, at about 59:33, we are multiplying matrices m1 and m2. m1 has the shape 5x784, which makes sense to me because we are grabbing the first 5 images from the MNIST set. What I don’t understand is why we made m2 (the weights) 784x10. Why wouldn’t we have just used 784x5 for m2?

If this was asked already or I missed it in the video, I apologize. I couldn’t find anything searching through the forums.

There are 10 possible outputs - i.e. we need 10 probabilities returned, one for each digit.
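
A quick shape check may make this concrete. The sketch below uses random values rather than the real MNIST images and weights, so only the shapes are meaningful: the 5 in m1 is the number of images, the 784 is the number of pixels per image, and the 10 in m2 is the number of digit classes.

```python
import numpy as np

rng = np.random.default_rng(0)

m1 = rng.random((5, 784))   # 5 images, 784 pixels each (placeholder values)
m2 = rng.random((784, 10))  # one column of weights per digit class

out = m1 @ m2               # inner dimensions (784) must match
print(out.shape)            # (5, 10): one score per digit for each image

# With a 784x5 weight matrix the product is still valid, but each image
# would only get 5 outputs, which is not enough for the 10 digits.
wrong = m1 @ rng.random((784, 5))
print(wrong.shape)          # (5, 5)
```

The number of images (rows of m1) and the number of outputs (columns of m2) are independent of each other, so changing the batch size does not change the shape of the weights.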

Thanks for the quick reply. Looking over the video again, I just noticed that this is covered at about 1:01:01. :man_facepalming: Thanks for your time, Jeremy.


Hi everyone - This is my first time posting. Please let me share something I have been working on.
I would appreciate any feedback:

Noise diff for classification interpretability

I was curious as to whether diffusion models could provide better human-interpretable explanations for classification results than existing methods such as Grad-CAM or SHAP.
Using noise prediction differences similar to those in DiffEdit, we can get some interesting results.

Problem: assume I have trained a horse/zebra classifier and want to understand why a given image is being classified as a zebra (and not a horse). This method attempts to highlight the important parts of the input image.

Input image:

Output image:

[heatmap version, sum of noise from steps 10-20]

As expected, the stripes seem to be the most important feature for explaining the prediction.

Rat vs Mouse:



[Non-heatmap version, sum of noise from steps 30-40]

The ears, nose and eyes appear to be important; the main part of the body is not.

Sports car vs taxi



[Heatmap version, sum of noise from steps 20-30]

The spoiler on the back of the car seems to have been highlighted as important, as well as the lights and possibly the wheel trims.
Note the “negative” highlighted box on the top of the car. Could this be indicating that the lack of the taxi “roof light” is a reason why the image is not classified as a taxi?
Indeed, if we look at the comparison with the empty/unguided prompt (see step 2 in Method below), this box disappears:

[Sports car vs “empty prompt”]


Method:

  1. Take the input image and add noise to it according to the diffusion model schedule (resulting in an array of e.g. 50 noised images)

  2. For the labels to be compared, perform noise prediction on each of the noised images from step 1, using the classifier labels as prompts (e.g. “zebra” and “horse” prompts). Alternatively, use the target label and the empty prompt.

  3. Keep track of the noise predictions for each noised image (but do not apply the noise predictions to the images themselves as in the normal diffusion process). The end result is two arrays of noise predictions, one for each prompt.

  4. Create the noise difference array by subtracting one array of noise predictions from step 3 from the other. Sum the values of this array over different step ranges (e.g. 0-50 or 10-20).

  5. Visualise the noise differences
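
Steps 3 to 5 can be sketched in NumPy as follows. This is an illustrative sketch rather than the code from the notebook: the function name `noise_difference_map` is hypothetical, the random arrays stand in for real noise predictions, and averaging over the latent channels to get a 2D map is an assumption about how the result is visualised.

```python
import numpy as np

def noise_difference_map(noise_a, noise_b, start, stop):
    """Sum the per-step noise differences over steps [start, stop).

    noise_a, noise_b: arrays of shape (steps, channels, H, W), one noise
    prediction per noised image for each of the two prompts.
    Returns an (H, W) map (channels are averaged; an assumed choice).
    """
    diff = noise_a - noise_b              # step 4: noise difference array
    summed = diff[start:stop].sum(axis=0) # sum over the chosen step range
    return summed.mean(axis=0)            # collapse channels for plotting

rng = np.random.default_rng(0)
# Placeholder predictions: 50 steps of 4x64x64 latents per prompt
noise_zebra = rng.standard_normal((50, 4, 64, 64)).astype(np.float32)
noise_horse = rng.standard_normal((50, 4, 64, 64)).astype(np.float32)

heatmap = noise_difference_map(noise_zebra, noise_horse, start=10, stop=20)
```

The resulting map can then be shown with any image plotting function; a diverging colour map keeps the sign, so positive and negative regions stay distinguishable.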

Still to do:
Instead of using the pretrained Stable Diffusion model, train a diffusion model on the specific dataset that the classifier was trained on. Compare the results with other methods (e.g. Grad-CAM, SHAP).

Notebook available here

This is heavily based on the lesson 9 Stable Diffusion Deep Dive notebook.


That’s rather weird … I mean the 25MB upload limit, since there are repos with files which are several gigs in size … I have a notebook which is 32MB on a repo, like this one:

I don’t know why you’re having issues with a 25MB upload, but something to try (if this is the initial commit/push) would be to start with a small file, then replace it with the 25MB+ file and see if that works.


@charlie, the results you shared on Twitter from implementing the DiffEdit paper are amazing. I would love to see what your masks looked like and how you achieved that.

I stuck to using the LMSDiscreteScheduler and only things introduced in the lesson 9 deep dive notebook. I am having the most trouble getting a good mask. In all my masks there are patches missing. I am going to try some of the other tricks mentioned here by others too. I am already using some of them.

Here is another example to show that my masks are not great:


I managed to get the masks to be slightly better using some of the methods here, but then they became a bit too broad and not so well defined. Jeremy suggested that OpenCV might be the way to go, and I noticed that a few others had used OpenCV too. Possibly some research in that direction might help? That’s what I intend to do today if I can find the time …
