Using SSIM as a loss function rather than L1 or MSE | What could go wrong?

Hey folks,
I’m trying to understand whether I can use a different kind of loss function.
My goal is to train a model that reconstructs details from dark images, using their paired well-lit counterparts as targets.
Previous research used L1 and MSE loss functions.
I was thinking: why not just use SSIM as the loss function? It is also a metric that basically says how similar two pictures are to each other.
But according to Jeremy’s 4th lecture, a loss function needs to have a meaningful slope, so that the gradients tell the model how to improve. He basically says that loss functions and metrics aren’t really the same thing.
What do you think?
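
To make the question concrete, here is a rough sketch of what I mean by an SSIM loss: take `1 - SSIM`, so that identical images give a loss of 0. Everything here is an assumption for illustration only: the helper names (`global_ssim`, `ssim_loss`, `numeric_grad`) are made up, the "images" are tiny flattened lists, and SSIM is computed from whole-image statistics instead of the usual local sliding windows. The gradient is estimated with finite differences just to check whether the loss has a slope at all.

```python
def mean(values):
    return sum(values) / len(values)


def global_ssim(x, y, data_range=1.0):
    """Simplified SSIM using whole-image statistics (a single window).

    Real SSIM implementations average this quantity over local windows;
    this version only illustrates the shape of the formula.
    """
    c1 = (0.01 * data_range) ** 2  # stabiliser for the luminance term
    c2 = (0.03 * data_range) ** 2  # stabiliser for the contrast/structure term
    mu_x, mu_y = mean(x), mean(y)
    var_x = mean([(a - mu_x) ** 2 for a in x])
    var_y = mean([(b - mu_y) ** 2 for b in y])
    cov = mean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_loss(pred, target):
    # Higher SSIM means more similar, so the loss is 1 - SSIM.
    return 1.0 - global_ssim(pred, target)


def numeric_grad(loss_fn, pred, target, eps=1e-6):
    """Central finite-difference estimate of d(loss)/d(pred[i])."""
    grads = []
    for i in range(len(pred)):
        up = pred[:i] + [pred[i] + eps] + pred[i + 1:]
        down = pred[:i] + [pred[i] - eps] + pred[i + 1:]
        grads.append((loss_fn(up, target) - loss_fn(down, target)) / (2 * eps))
    return grads


target = [0.2, 0.4, 0.6, 0.8]    # toy "well-lit" image, flattened
dark = [0.05, 0.10, 0.15, 0.20]  # toy "dark" prediction

loss_before = ssim_loss(dark, target)
grads = numeric_grad(ssim_loss, dark, target)

# One plain gradient-descent step on the pixels themselves.
stepped = [p - 0.01 * g for p, g in zip(dark, grads)]
loss_after = ssim_loss(stepped, target)

print("loss before:", loss_before)
print("loss after one step:", loss_after)
```

In this toy case, at least, the loss changes smoothly when a pixel is nudged, which is different from a metric like accuracy that stays flat under small changes. It says nothing about how well a windowed SSIM loss behaves on real images, which is what I'm really asking.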

For instance, this guy used a loss function of that kind for his model training:

When you train his model, the SSIM rises from 0 to 0.95 almost linearly (over 100 epochs or so). But when I do the same, it rises to 0.17 and then plateaus (with slight fluctuations up and down). If my curve isn’t linear like his, can I conclude that something is wrong with my training?
Thanks