Hi everyone. I’ve been studying diffusion models for the past couple of months, and using a U-Net for the diffusion/generation process seems to be the done thing in the literature. However, the U-Nets used in SOTA diffusion models are huge, and training them from scratch is incredibly costly. This got me wondering whether there’s a smart, fastai-y way to train a U-Net-based diffusion model on a budget, similar to how a pre-trained backbone and “NoGAN” training were used to train GANs in the course. Would transfer learning for the U-Net backbone make sense in a diffusion framework?
Your concern about the computational cost of diffusion models is justified; unfortunately, cutting-edge papers from top labs such as OpenAI are nearly impossible to replicate on lower-end machines, even with small datasets.
It depends on the task at hand. For example, a pre-trained backbone would be beneficial for image-to-image diffusion models (e.g., Palette) because the network is conditioned on images, and prior discriminative knowledge should intuitively help. Indeed, I’ve adopted this strategy in a few projects and obtained excellent results. However, for unconditional generation, I don’t see why pre-training would help since the neural net’s input is pure noise, though I have not tested this hypothesis and might be wrong.
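To make the transfer-learning idea concrete, here is a minimal sketch of a U-Net whose encoder is a pretrained backbone that gets frozen, fastai-style, so only the decoder trains initially. Everything here is illustrative: `TinyBackbone` is a stand-in for a real pretrained classifier (e.g., a ResNet from torchvision or timm), and the layer names, channel sizes, and the `pretrained.pt` checkpoint path are all made up for the example.

```python
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Stand-in for a real pretrained classifier backbone (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, 16, 3, padding=1)
        self.stage1 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        s0 = self.stem(x)      # full resolution, 16 channels
        s1 = self.stage1(s0)   # 1/2 resolution, 32 channels
        s2 = self.stage2(s1)   # 1/4 resolution, 64 channels
        return s0, s1, s2      # feature maps reused as skip connections

class TransferUNet(nn.Module):
    """U-Net that reuses a (frozen) pretrained backbone as its encoder."""
    def __init__(self, backbone):
        super().__init__()
        self.encoder = backbone
        for p in self.encoder.parameters():
            p.requires_grad = False  # freeze; optionally unfreeze later for fine-tuning
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.up2 = nn.ConvTranspose2d(64, 16, 2, stride=2)  # 32 up + 32 skip in
        self.head = nn.Conv2d(32, 3, 3, padding=1)          # 16 up + 16 skip in

    def forward(self, x):
        s0, s1, s2 = self.encoder(x)
        d1 = torch.cat([self.up1(s2), s1], dim=1)
        d2 = torch.cat([self.up2(d1), s0], dim=1)
        return self.head(d2)

backbone = TinyBackbone()
# backbone.load_state_dict(torch.load("pretrained.pt"))  # hypothetical checkpoint
model = TransferUNet(backbone)
out = model(torch.randn(2, 3, 32, 32))
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```

In a real diffusion setup, the forward pass would also take a timestep embedding (and any conditioning), but the mechanics of freezing the encoder and training only the decoder are the same.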
One technique that has helped mitigate diffusion models’ inefficiency for me is the Efficient U-Net introduced by Imagen, which is faster and occupies less memory. Additionally, in a resource-constrained environment, removing attention from the model is a prudent choice, although it unsurprisingly costs some sample quality. Finally, if inference speed is an issue, there is a rich body of literature on faster sampling (DDIM being a common starting point) you could refer to.
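On the faster-sampling point, here is a rough sketch of the idea behind DDIM-style samplers: instead of denoising through all training timesteps, you stride over a small subset (say 10–50 steps) and take deterministic jumps. The `eps_model` below is a dummy placeholder for a trained noise-prediction U-Net, and the naming (`alpha_bar`, `ddim_sample`) is my own, not from any particular library.

```python
import torch

def ddim_sample(eps_model, shape, alpha_bar, num_steps=20):
    """Deterministic DDIM-style sampling over a strided timestep schedule."""
    T = alpha_bar.shape[0]
    ts = torch.linspace(T - 1, 0, num_steps).long()  # e.g. 20 steps instead of 1000
    x = torch.randn(shape)  # start from pure Gaussian noise
    for i, t in enumerate(ts):
        ab_t = alpha_bar[t]
        ab_prev = alpha_bar[ts[i + 1]] if i + 1 < num_steps else torch.tensor(1.0)
        eps = eps_model(x, t)
        # Predict the clean image from the current noisy one...
        x0 = (x - (1 - ab_t).sqrt() * eps) / ab_t.sqrt()
        # ...then jump deterministically to the previous (strided) timestep.
        x = ab_prev.sqrt() * x0 + (1 - ab_prev).sqrt() * eps
    return x

# Dummy noise predictor standing in for a trained U-Net (illustrative only).
eps_model = lambda x, t: torch.zeros_like(x)
betas = torch.linspace(1e-4, 0.02, 1000)            # standard linear beta schedule
alpha_bar = torch.cumprod(1 - betas, dim=0)
sample = ddim_sample(eps_model, (1, 3, 16, 16), alpha_bar, num_steps=10)
```

Each network call is the expensive part, so cutting 1000 steps down to 10–50 translates almost directly into a 20–100x inference speedup; libraries like diffusers ship ready-made schedulers for this if you’d rather not roll your own.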
Don’t hesitate to reach out if you have other questions.