Share your work here ✅ (Part 2 2022)

A version of this thread for Part 2 of the course.

Show us what you’ve made using the things you’re learning in the course! It could be a blog post, a Jupyter notebook, a picture, a GitHub repo, a web app, or anything else.


I guess I’ll start us out.

For the past 8 weeks I have been hosting a Diffusion Reading Group in the EleutherAI Discord server, digging into many of the seminal diffusion model papers (DDPM, DDIM, guided diffusion, etc.). All the recordings are available in a YouTube playlist, and all the slides I have presented are available in this GitHub repo. It is admittedly more math-intensive, but I hope the paper walk-throughs are still useful. Some fastai folks have been attending too!

Feel free to join the EleutherAI Discord server under the reading groups channel every Saturday at 1pm PDT. The Diffusion Reading Group thread in the server is also a great place for general diffusion model discussion.


I was just watching the 9A lecture and tried mixing multiple art styles (cyberpunk with oil painting) and different prompts. I like how the background changes as you change the prompt from the streets of New York to Tokyo.


Here is an example of morphing between two images, adapted from the deep dive notebook. We take a weighted average of two token embeddings (kitten and puppy): start with all the weight on kitten, progressively shift it to puppy, and then reverse.

First image: 50 inference steps.

Second image: 100 inference steps.
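If anyone wants to try the same trick, here is a minimal sketch of the weighted-average idea, using NumPy arrays as stand-ins for the CLIP text embeddings (the embedding shape and the frame count are illustrative assumptions on my part, not values from the notebook):

```python
import numpy as np

def interpolation_weights(n_frames):
    """Weights that ramp from 0 to 1 and back again
    (all kitten -> all puppy -> all kitten)."""
    up = np.linspace(0.0, 1.0, n_frames // 2 + 1)
    return np.concatenate([up, up[-2::-1]])

def blend_embeddings(emb_a, emb_b, w):
    """Weighted average of two text embeddings."""
    return (1.0 - w) * emb_a + w * emb_b

# Toy arrays standing in for the CLIP text embeddings of "kitten"
# and "puppy" (a real Stable Diffusion text embedding is [77, 768]):
emb_kitten = np.zeros((77, 768))
emb_puppy = np.ones((77, 768))

frames = [blend_embeddings(emb_kitten, emb_puppy, w)
          for w in interpolation_weights(16)]
```

Each blended embedding would then be fed to the pipeline in place of the normal text embedding, one image per frame.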



Since Jeremy mentioned perceptual loss in lesson 9, I thought I’d share this notebook where I reproduced the Johnson, Alahi, and Fei-Fei (2016) paper using fastai. I did this in 2020 and haven’t updated it since, so I can’t guarantee it will work out of the box, but I hope someone will find it helpful.

I did this as part of a larger project where I reproduced a few papers using nbdev and fastai. Of course, I called it fastpapers :smile: Now that the new version of nbdev uses Quarto, which supports so many new features (like references), I’m eager to update this library!

Here are some images from the style transfer notebook:


I was playing around with the various resources Jeremy listed and wanted to compare the deep dive notebook’s output to Lexica’s results, using the same prompt and guidance scale. Both made me laugh!



Thought I’d have an attempt at training a DDPM on my own dataset. Not the greatest result, but the fun is in the trying. :slight_smile:

I look forward to improving it as I understand more.


It is definitely on the right track, especially compared to the older approaches!


I decided to see if I could make a picture of my daughter’s dog look like a unicorn. The initial dog is shown here. I used the img2img pipeline with the prompt “A unicorn”, then looked at different strength values. A very low strength (say 0.2) left the dog largely unchanged, whereas a high value of 0.7 lost the dog altogether. I found a nice result with a value of 0.3; interestingly, above this the head starts to get multiple horns! An interesting exercise: I can already see the need to learn to craft the right prompts and to tune the parameters. To explain further, the strength values in the rows are [0.2, 0.3, 0.4, 0.7], and I generated three images per prompt.
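As a side note on what strength is doing here: in the diffusers img2img pipeline, the input image is noised to a timestep proportional to strength and denoised from there, so roughly int(num_inference_steps * strength) denoising steps actually run. The helper below is my own sketch of that behaviour, not pipeline code:

```python
def img2img_steps(num_inference_steps, strength):
    """Approximate number of denoising steps the img2img pipeline
    actually runs: the input image is noised to step
    int(num_inference_steps * strength) and denoised back from there."""
    return min(int(num_inference_steps * strength), num_inference_steps)

# The strength values used in the grid above, with 50 inference steps:
for s in [0.2, 0.3, 0.4, 0.7]:
    print(f"strength={s}: ~{img2img_steps(50, s)} of 50 steps")
```

This is why strength 0.2 barely changes the dog (only a handful of denoising steps run on a lightly noised image) while 0.7 mostly overwrites it.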


I tried the interpolation example notebook shared by @puru for a dinosaur-to-chicken evolution.


dino-chicken intermediate by the model :slight_smile:

Original post


I used the lesson 9 notebook to generate a Banksy sketch of a robot. I needed a cool image for the first page of a PowerPoint deck I was creating for another course’s project presentation.


What was your daughter’s reaction?


looks amazing! :grin:


I think she prefers the dog; our granddaughter, on the other hand, likes the unicorn, but then she is into My Little Pony 😀! We could have lots of fun with this.


I tried generating images using the same prompt but in different languages (all translated from English using DeepL). The first one was this prompt:

A picture of a town hall in a historical quarter of a city

It seems the model aligned text written in a given language with the most common pictures from that region. Except for Greek, which collapsed. I guess the language isn’t well represented in the dataset(?). Also, an interesting interpretation of an Estonian town hall. Too many castle pictures associated with this language?

The second one I tried is this:

A crowded street in a big city on a winter morning

Again, the Greek one is rather odd. Is it a kind of “averaged” tourist’s selfie? I’m not sure Estonian landscapes really have that kind of mountains… On the other hand, English, German, and French are captured pretty accurately. Perhaps those are the most frequent language/image pairs in the dataset?
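For anyone who wants to repeat the experiment, here is a minimal sketch of the setup: the same prompt in several languages fed to one pipeline with a fixed seed, so the only thing that varies is the language. The translations below are my own illustrative ones (the post used DeepL), and `pipe` is assumed to be a diffusers StableDiffusionPipeline on a CUDA device:

```python
# Same prompt in several languages (illustrative translations, not DeepL's):
prompts = {
    "English": "A crowded street in a big city on a winter morning",
    "German": "Eine überfüllte Straße in einer Großstadt an einem Wintermorgen",
    "French": "Une rue bondée dans une grande ville par un matin d'hiver",
}

def generate_per_language(pipe, prompts, seed=42):
    """Generate one image per language with a fixed seed, so results
    differ only by the prompt's language."""
    import torch
    images = {}
    for lang, prompt in prompts.items():
        generator = torch.Generator("cuda").manual_seed(seed)
        images[lang] = pipe(prompt, generator=generator).images[0]
    return images
```

Fixing the seed matters here: without it, differences between languages would be confounded with differences in the initial noise.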


I wrote up some of my thoughts and learnings from lesson 9 in a blog post (including a glossary of terms). It’s not fully updated with things from 9A or 9B yet, but I’ll probably write subsequent posts alongside lesson 10 this week.


I don’t know anything about Kamon designs, but the results are not too shabby IMO.

Glad to see my notebook helped! :slight_smile:


I created a video where I walk through using a devcontainer with fastai. I think this is a really nice way to create a stable environment: Fastai .devcontainer Environment Creation - YouTube

I also created a few images. This was my favorite (prompt was “m. c. escher cityscape van gogh”; special thanks to @hiromi for the prompt help!):


Fascinating. I too have been thinking about how well this would work for languages other than English.

The Greek images do suggest there may be biases in the training images!

Thanks for sharing!


Great idea, and thank you for sharing! I would also guess that these images are the result of languages other than English being under-represented in the datasets. The question is, which dataset matters? The one used for CLIP training (not public) or the one for Stable Diffusion (LAION)? Both? I haven’t wrapped my head around these models yet :sweat_smile:

According to the model card, Stable Diffusion was trained on LAION-2B-en or subsets thereof, which consist primarily of English descriptions. So any other language should be represented pretty poorly; I’m surprised by the decent results in German and French.

The black image for Greek in the first prompt is probably not a model failure; I suspect it was blocked by the overly aggressive NSFW filter.
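One quick way to check that hypothesis: when the safety checker in diffusers flags an output, it replaces the image with a solid black frame, so an all-black result is a strong hint the filter fired rather than the model collapsing. Here is a small PIL-based heuristic of my own (the commented-out line shows the `safety_checker=None` loading pattern for disabling the checker entirely):

```python
from PIL import Image

def probably_nsfw_filtered(image):
    """Heuristic: the safety checker replaces flagged outputs with a
    solid black image, and PIL's getbbox() returns None when every
    pixel is zero (i.e. the whole image is black)."""
    return image.getbbox() is None

# To turn the checker off entirely (at your own risk):
# pipe = StableDiffusionPipeline.from_pretrained(model_id, safety_checker=None)
```

The pipeline output also carries an `nsfw_content_detected` flag per image, which is the more direct signal when you have access to the full result object.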