Deploying a Mixed Precision Model

Hi all, I’ve trained and deployed an image segmentation model for EM images and am trying to reduce its memory footprint and prediction time. Currently, my model is about 130MB and takes two seconds to make a prediction. One way to accomplish both goals is mixed precision training, which reduced my model to about 65MB. I was able to validate the accuracy by using:

learn.validate(data.train_dl.add_tfm(to_half))
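
For reference, the conversion itself is just a call to to_fp16 on the Learner. Here is a minimal fastai v1 sketch of my workflow; the data bunch, architecture, and epoch count are placeholders, not my exact setup:

from fastai.vision import *

# `data` is an existing ImageDataBunch (placeholder)
learn = unet_learner(data, models.resnet34)
learn = learn.to_fp16()            # switch the Learner to mixed precision
learn.fit_one_cycle(5)             # train as usual
learn.export('model_fp16.pkl')     # exported file is roughly half the fp32 size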

However, when I try to use learn.predict on a single image (which I had converted to a HalfTensor), I get this error:

RuntimeError: Expected object of scalar type Float but got scalar type Half for argument #3 'mat1' in call to _th_addmm_

I also tried extracting the underlying PyTorch model and running a forward pass of my image through it, but I get this error:

RuntimeError: "unfolded2d_copy" not implemented for 'Half'

From general research, it seems that some PyTorch modules don’t support float16 calculations, but I’m a little confused: I was able to validate my model earlier using (presumably) float16 test images in a DataLoader, so why is there a problem when I run a prediction on a single float16 test image?

Are you sure you are using the GPU? Usually those operations are not implemented on the CPU but, as you pointed out, they are implemented on the GPU.
If you need to switch your Learner back to full precision, you have the to_fp32 method: learn.to_fp32().
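
To see the split concretely, here is a minimal PyTorch sketch (the layer and tensor sizes are arbitrary; on CPU, older PyTorch releases raise exactly the convolution error you saw, while newer releases implement more Half ops):

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, 3).half()
x = torch.randn(1, 3, 64, 64).half()

# On CPU (older releases) this raises:
#   RuntimeError: "unfolded2d_copy" not implemented for 'Half'
try:
    conv(x)
except RuntimeError as e:
    print(e)

# The same forward pass runs fine on a CUDA GPU:
if torch.cuda.is_available():
    out = conv.cuda()(x.cuda())
    print(out.dtype)  # torch.float16

So for CPU inference, call learn.to_fp32() first and then learn.predict as usual.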


Ah, thank you, that’s where the problem lies. I use a cloud GPU when training and validating, but for deployment purposes I keep everything on the CPU (since I assumed a GPU is only really needed for training). That explains it: my validation ran on the GPU, but single-image prediction runs on the CPU. I’m currently deploying my model in an app hosted on Heroku, so I guess a mixed precision model is unfortunately not a viable solution for reducing memory and run-time there.
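
One partial workaround I may try (a sketch, untested on my side) is to keep the weights in fp16 on disk to halve the file size, then cast them back to fp32 at load time so inference still runs on the CPU. In plain PyTorch, with `model` and the file name as placeholders:

import torch

# Save: cast floating-point tensors to fp16 (leave integer buffers alone)
fp16_state = {k: (v.half() if v.is_floating_point() else v)
              for k, v in model.state_dict().items()}
torch.save(fp16_state, 'weights_fp16.pth')

# Load: cast back to fp32 for CPU inference
state = torch.load('weights_fp16.pth', map_location='cpu')
model.load_state_dict({k: (v.float() if v.is_floating_point() else v)
                       for k, v in state.items()})
model.eval()

Note this only halves the on-disk size; run-time memory and speed are unchanged since inference still happens in fp32.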

Hi @sgugger,
Is there any plan to support these operators on the CPU?
I normally deploy on CPU, so I really need that feature.
Thanks!

Mixed precision is only supported on GPUs, not CPUs (this is a hardware limitation, not a software one).

https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html
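
In practice that means gating the conversion on the device, e.g. (a sketch, assuming a fastai Learner named learn):

import torch

# Hypothetical guard: only enable mixed precision when a CUDA GPU is present,
# so a CPU-only deployment keeps a full-precision model
if torch.cuda.is_available():
    learn = learn.to_fp16()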


Got it. Thanks for the useful information!

Solved my problem!