Token Merging - Faster, Less Memory Intensive Stable Diffusion?

An interesting paper released a couple of days ago seems to indicate that it might be possible to make Stable Diffusion faster and less memory-hungry. The authors mention an upcoming Stable Diffusion implementation (no code for it yet) and say it can generate 3840x2176 images in under 2 minutes on an NVIDIA 4090 GPU!

Paper here: [2210.09461] Token Merging: Your ViT But Faster

GitHub repo here: facebookresearch/ToMe (a method to increase the speed and lower the memory footprint of existing vision transformers)