We have a way to cut training time by half or more for small batches: use lazy metrics. Have a look at the code for DDPM; it should work here as well. (I will post a notebook that includes the changes, along with my experiments with resnet18d.)