If anybody else is interested, feel free to share questions or insights on here. I am planning on implementing this and will post here as I figure things out!
I saw that too and it looked really interesting, but I wasn’t sure how to even get started on that one, since it talks about using multiple text encoders and mixing their outputs to guide the diffusion model, or something like that?
But if you have an idea of how to get started, I’d be interested in participating …
I’m still going to try implementing a version of this, but I’m not sure what it will look like yet. The massive amount of hardware really doesn’t seem to be the important concept coming out of this paper, though.
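My rough mental model of the text-encoder mixing part is just: run the prompt through two (or more) pretrained encoders and hand the combined embeddings to the denoiser as cross-attention context. Here’s a minimal sketch of that idea, assuming Hugging Face CLIP and T5 encoders; the projection layers, the concatenation, and the model names are my own guesses, not the paper’s actual recipe:

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5Tokenizer

# Two off-the-shelf text encoders (these checkpoints are just what I'd try first).
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
t5_tok = T5Tokenizer.from_pretrained("google/t5-v1_1-base")
t5_enc = T5EncoderModel.from_pretrained("google/t5-v1_1-base")

# Project both embedding spaces to one width so the denoiser's cross-attention
# can treat them as a single token sequence (768 is an arbitrary choice here).
clip_proj = torch.nn.Linear(clip_enc.config.hidden_size, 768)
t5_proj = torch.nn.Linear(t5_enc.config.hidden_size, 768)

@torch.no_grad()
def encode_prompt(prompt: str) -> torch.Tensor:
    clip_ids = clip_tok(prompt, padding="max_length", truncation=True,
                        return_tensors="pt").input_ids
    t5_ids = t5_tok(prompt, padding="max_length", max_length=77, truncation=True,
                    return_tensors="pt").input_ids

    clip_states = clip_enc(clip_ids).last_hidden_state  # (1, 77, hidden)
    t5_states = t5_enc(t5_ids).last_hidden_state        # (1, 77, hidden)

    # "Mixing" here is just concatenation along the sequence dimension; the
    # denoiser's cross-attention then attends over both encoders' tokens.
    return torch.cat([clip_proj(clip_states), t5_proj(t5_states)], dim=1)

context = encode_prompt("a watercolor painting of a fox in the snow")
print(context.shape)  # this would be passed to the UNet as cross-attention context
```

The untrained projection layers obviously only make sense once the whole thing is trained end to end; I’m just trying to show where the two encoders’ outputs would meet.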
Ah, I didn’t notice that part at all. I generally skim through papers to see if I can implement something, and if it doesn’t make a lot of sense to me how I could get started, then I stop …
Something with similarly good output, but possibly easier to implement, is UPainting. I considered trying to start a collaboration on that one, but again I didn’t know how to get started, because the addition there is an image-text matching component and I wasn’t sure what that looked like …
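If I had to guess, the image-text matching piece is some CLIP-style similarity score between the generated image and the prompt that gets used to steer things. Here’s roughly how I’d picture it, as a gradient-based nudge during sampling; to be clear, the CLIP checkpoint and the gradient trick are my assumptions, not what the paper actually describes:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def matching_grad(image: torch.Tensor, prompt: str) -> torch.Tensor:
    """image: (1, 3, 224, 224), already resized and CLIP-normalized upstream."""
    image = image.clone().requires_grad_(True)
    text_inputs = processor(text=[prompt], return_tensors="pt", padding=True)

    image_emb = clip.get_image_features(pixel_values=image)
    text_emb = clip.get_text_features(**text_inputs)

    # Cosine similarity acts as the image-text matching score we want to increase.
    score = torch.nn.functional.cosine_similarity(image_emb, text_emb).sum()
    score.backward()
    # A scaled version of this gradient could be added to the sampler's update
    # to pull the image toward the prompt.
    return image.grad
```

Again, just a sketch of the general "match the image to the text and push in that direction" idea so we have something concrete to argue about.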
I believe there’s an implementation of Imagen around … If I recall correctly it’s by LucidRains(?), though I’m going off memory here. You have to train it on your own data, and that was something I tried to get going on Apple MPS way back when … Let me see if I can find the repo; it might be helpful for your research?