Mixed precision training


#1

Has anyone been able to try mixed precision training? I am using a V100 on GCP. When I train with fp16() it speeds up my epochs by ~25%, but later on I am getting a bunch of NaNs.

Is there something I am missing?
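For reference, enabling mixed precision in fastai v1 (the version this thread is about) usually looks roughly like the sketch below. The dataset and architecture are just placeholders, not the setup from this post; the relevant part is the .to_fp16() call on the Learner.

```python
from fastai.vision import *

# Minimal sketch, fastai v1 era; dataset/architecture are placeholders.
path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)

# Converting the Learner to mixed precision is a one-liner with defaults.
learn = cnn_learner(data, models.resnet18, metrics=accuracy).to_fp16()
learn.fit_one_cycle(1)
```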


(RobG) #2

I’ve been getting this too, sometimes, on long training runs. I thought it might be my new GPU (2080ti) and a funky CUDA/PyTorch/driver combination.


#3

Do you just restart your kernel and start over when you get this, or do you do something else? I wonder if this is due to some parameter setting I am missing. I am keeping all the defaults on fp16 for now.


(RobG) #4

Not sure what the issue is. I have reported it before, but it would be good to hear of others’ experience. Do you have any transforms() in your training? It could be something in a transform going awry.


#5

Yeah, I do. What kind of benefit are you observing when using fp16()? Is it close to the ~25% that I reported?


(RobG) #6

Yes, about 25% on average.


#7

Looks like we are on the right path. It will be good to know what others are experiencing with mixed precision training.


(Karl) #8

I don’t have much experience with fp16 training, but one known issue is that gradients or other small values can become zero due to the lower numerical precision. One way around this is to scale up the loss value (e.g. by a factor of 1000), which by the chain rule also scales the gradients. This might be a solution to the problem you’re seeing.
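A rough sketch of what this static loss scaling looks like in plain PyTorch (illustrative only, not the fastai internals; the toy model and the 1000.0 factor are made up for the example):

```python
import torch
import torch.nn as nn

# Toy FP16 model and batch; requires a CUDA GPU.
model = nn.Linear(10, 1).cuda().half()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x = torch.randn(8, 10, device='cuda', dtype=torch.half)
y = torch.randn(8, 1, device='cuda', dtype=torch.half)

loss_scale = 1000.0                      # assumed factor, as suggested above

loss = nn.functional.mse_loss(model(x), y)
(loss * loss_scale).backward()           # scaling the loss scales the gradients too (chain rule)

for p in model.parameters():
    if p.grad is not None:
        p.grad.div_(loss_scale)          # undo the scaling before the weight update
opt.step()
opt.zero_grad()
```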


#9

Thanks Karl. I might have to try that.


(Andrea de Luca) #10

The point of mixed precision training is to address such shortcomings of pure FP16. The library should take care of this: the parts that are sensitive to truncation and/or rounding are handled in FP32.
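A rough sketch of the “FP32 master weights” idea behind this, in plain PyTorch (again illustrative, not fastai’s actual implementation): the forward and backward passes run in FP16, while the optimizer updates an FP32 copy of the parameters.

```python
import torch
import torch.nn as nn

# FP16 model for the forward/backward pass; FP32 master copy for the update.
model_fp16 = nn.Linear(10, 1).cuda().half()
master = [p.detach().clone().float().requires_grad_(True) for p in model_fp16.parameters()]
opt = torch.optim.SGD(master, lr=1e-2)

x = torch.randn(8, 10, device='cuda', dtype=torch.half)
y = torch.randn(8, 1, device='cuda', dtype=torch.half)

loss = nn.functional.mse_loss(model_fp16(x), y)   # FP16 forward
loss.backward()                                   # FP16 backward

# Copy FP16 gradients into the FP32 master params, step in FP32,
# then write the updated weights back into the FP16 model.
for p16, p32 in zip(model_fp16.parameters(), master):
    p32.grad = p16.grad.float()
opt.step()
with torch.no_grad():
    for p16, p32 in zip(model_fp16.parameters(), master):
        p16.copy_(p32.half())
opt.zero_grad()
```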


(Andrea de Luca) #11

I’d like to see a comparison of memory usage for the same training cycle in FP32 vs. mixed precision. Thanks.
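One way to get rough numbers yourself: a sketch in plain PyTorch with a torchvision model as a stand-in. It compares pure FP32 vs. pure FP16 for a single forward/backward pass as a proxy; actual mixed precision also keeps FP32 master weights, so its footprint sits slightly above the FP16 number. (reset_peak_memory_stats needs a reasonably recent PyTorch.)

```python
import torch
from torchvision.models import resnet18

def peak_memory_mb(dtype, batch_size=64):
    """Run one forward/backward pass and report peak GPU memory in MB."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    model = resnet18().cuda().to(dtype)
    x = torch.randn(batch_size, 3, 224, 224, device='cuda', dtype=dtype)
    model(x).float().mean().backward()
    return torch.cuda.max_memory_allocated() / 1024 ** 2

print('FP32 peak:', peak_memory_mb(torch.float32), 'MB')
print('FP16 peak:', peak_memory_mb(torch.float16), 'MB')
```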


(Michael) #12

I have also been playing with mixed precision training and have not observed any NaN losses so far.

There also seems to be a callback to stop training when the loss becomes NaN: https://github.com/fastai/fastai_docs/blob/master/dev_nb/new_callbacks.ipynb (However, I haven’t tested it so far.)
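The idea behind such a callback is simple enough to sketch as a plain training-loop check (this is just the gist, not the fastai callback itself):

```python
import math
import torch

def loss_is_bad(loss):
    """Return True if the loss has gone NaN/inf, so the loop can stop early."""
    value = loss.item() if torch.is_tensor(loss) else float(loss)
    return math.isnan(value) or math.isinf(value)

# Inside a training loop:
#     if loss_is_bad(loss):
#         print('NaN/inf loss encountered, stopping training.')
#         break
```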

Here is the super explanation from Sylvain: Mixed precision training

PS: Maybe also of interest: learn.TTA(is_test=True) not supporting half precision models?


(Keyur Paralkar) #13

I am experiencing the same issue with mixed precision training. The validation loss reaches NaN at about 25-30% of the training process. I will try increasing the loss_scale factor to 1000 and let this thread know the result.


(Keyur Paralkar) #14

Even after changing the loss_scale factor in to_fp16() to 1000.0, I get NaN values even earlier, at around 3% of training. Is this problem specific to a particular kind of data?


(Keyur Paralkar) #15

Could this be related to the vanishing gradient problem?
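One quick check for that: see whether the FP16 gradients are underflowing to zero (the issue Karl described), rather than a classic vanishing-gradient problem. A sketch with a toy model (substitute your own model and batch):

```python
import torch
import torch.nn as nn

# Toy FP16 model; requires a CUDA GPU.
model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 1)).cuda().half()
x = torch.randn(8, 10, device='cuda', dtype=torch.half)
model(x).mean().backward()

# A large zero fraction in the gradients hints at FP16 underflow.
for name, p in model.named_parameters():
    g = p.grad
    zero_frac = (g == 0).float().mean().item()
    print(f'{name}: grad norm {g.float().norm().item():.3e}, zero fraction {zero_frac:.1%}')
```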


#16

So has anyone actually managed to make to_fp16() work in the sense that it has shortened training?
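For anyone wanting to measure this themselves, here is a rough timing sketch in plain PyTorch (a torchvision model as a stand-in; a V100 or 2080ti with tensor cores is where FP16 should pay off):

```python
import time
import torch
from torchvision.models import resnet18

def seconds_per_step(dtype, steps=20, batch_size=64):
    """Average time of one forward/backward/update step at the given precision."""
    model = resnet18().cuda().to(dtype)
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    x = torch.randn(batch_size, 3, 224, 224, device='cuda', dtype=dtype)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(steps):
        model(x).float().mean().backward()
        opt.step()
        opt.zero_grad()
    torch.cuda.synchronize()
    # Note: the first measured run includes CUDA/cuDNN warm-up, so treat
    # the numbers as a rough comparison, not a benchmark.
    return (time.time() - start) / steps

print('FP32:', seconds_per_step(torch.float32), 's/step')
print('FP16:', seconds_per_step(torch.float16), 's/step')
```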