Mixed precision: fp16 model size vs fp32

I am trying to deploy a UNet image generation model in production, and I felt that the inference time was slow. So I tried to retrain the model using fp16, expecting it to make the model lighter, thereby reducing the model file size and the inference time.

But after training the model using fp16, the saved model file is still the same size as the normal fp32 variant. Is this expected?
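For context, here is a minimal sketch of the effect I was expecting: casting the parameters to half precision before saving should roughly halve the checkpoint size. (This uses a small stand-in module rather than my actual UNet, so the exact numbers are illustrative.)

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for the UNet; the size effect is the same.
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 256))

with tempfile.TemporaryDirectory() as d:
    fp32_path = os.path.join(d, "model_fp32.pt")
    fp16_path = os.path.join(d, "model_fp16.pt")

    # Checkpoint with the default fp32 weights (4 bytes per parameter).
    torch.save(model.state_dict(), fp32_path)

    # Cast parameters to fp16 (2 bytes per parameter) and save again.
    torch.save(model.half().state_dict(), fp16_path)

    fp32_size = os.path.getsize(fp32_path)
    fp16_size = os.path.getsize(fp16_path)
    print(f"fp32: {fp32_size} bytes, fp16: {fp16_size} bytes")
```

Note that mixed-precision *training* (e.g. `torch.cuda.amp`) typically keeps a master copy of the weights in fp32, so the checkpoint it saves is still full size; the halving above only happens if the weights are actually cast to fp16 before saving.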
