The new DeepSpeed MII library from Microsoft looks very interesting for speeding up inference, not just for Stable Diffusion but for a bunch of other models it supports too. The GitHub repo and details are here:
I haven’t been able to test Stable Diffusion speeds myself, since their default code appears to be hardcoded for CUDA and I’d have to boot up my PC with the NVIDIA card to run it. First, though, I’m going to see if I can actually get it to work on MPS.
They do provide example scripts, and it looks as if you should be able to get Stable Diffusion working fairly easily as long as you have CUDA support. If you do try it and it works, I’d be interested to hear numbers in terms of speed gains while I try to make it work on MPS.
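
For anyone else poking at the CUDA vs. MPS question, here is a rough sketch of the PyTorch-side check for which backend is actually available on a machine. The function names `pick_device` and `detect_device` are my own, not anything from MII, and this says nothing about whether MII itself will accept a non-CUDA device; it only covers the availability check that would come first.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then plain CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends exist; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is absent on older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = mps is not None and mps.is_available()
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(detect_device())
```

The order (CUDA first, then MPS, then CPU) is just my assumption about what you would want by default; swap it around if you have both and want to compare.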